
The Innate Mind

The Innate Mind: Volume 2: Culture and Cognition


Carruthers, Peter (Editor), Professor of Philosophy, University of Maryland
Laurence, Stephen (Editor), Philosophy, University of Sheffield
Stich, Stephen (Editor), Philosophy and Cognitive Science, Rutgers University
Print publication date: 2007











EVOLUTION AND COGNITION
General Editor, Stephen Stich, Rutgers University
Published in the series
Simple Heuristics That Make Us Smart
Gerd Gigerenzer, Peter Todd, and the ABC Research Group
Natural Selection and Social Theory:
Selected Papers of Robert Trivers
Robert Trivers
Adaptive Thinking:
Rationality in the Real World
Gerd Gigerenzer
In Gods We Trust:
The Evolutionary Landscape of Religion
Scott Atran
The Origin and Evolution of Cultures
Robert Boyd and Peter J. Richerson
The Innate Mind:
Structure and Contents
Peter Carruthers, Stephen Laurence, and Stephen Stich
The Innate Mind:
Volume 2: Culture and Cognition
Peter Carruthers, Stephen Laurence, and Stephen Stich




The Innate Mind
Volume 2: Culture and Cognition

2006

Oxford University Press, Inc., publishes works that further
Oxford University's objective of excellence
in research, scholarship, and education.

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam

Copyright 2006 by Oxford University Press, Inc.

Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016
www.oup.com

Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
The innate mind : structure and contents / edited by
Peter Carruthers, Stephen Laurence, Stephen Stich.
p. cm.
Includes bibliographical references and index.
ISBN-13 978-0-19-517967-5; 978-0-19-517999-4 (pbk.)
ISBN 0-19-517967-6; ISBN 0-19-517999-4 (pbk.)
Volume 2:
ISBN-13 978-0-19-531013-9; 978-0-19-531014-4 (pbk.)
ISBN 0-19-531013-6; 0-19-531014-6 (pbk.)
1. Cognitive science. 2. Philosophy of mind. 3. Nativism (Psychology)
I. Carruthers, Peter, 1952– II. Laurence, Stephen. III. Stich, Stephen P.
BD418.3.I56 2005
153–dc22 2004056813


2 4 6 8 9 7 5 3 1


Printed in the United States of America
on acid-free paper

Preface
This volume is the second in a projected series of three volumes on the innate mind.
(The others are The Innate Mind: Structure and Contents (2005) and The Innate Mind:
Foundations and the Future (forthcoming).) It represents the main products of the
second year of the three-year interdisciplinary Innateness and the Structure of the
Mind project, funded by the United Kingdom's Arts and Humanities Research Board, with
support from the Hang Seng Centre for Cognitive Studies at the University of Sheffield,
the Evolution and Higher Cognition Research Group at Rutgers University, and the
Cognitive Studies Group at the University of Maryland. We are grateful to all these bodies
for their support.
During the academic year 2002–3, four preparatory workshops were held, one at
Rutgers, one at Maryland, and two at Sheffield; and the concluding conference was held
in Sheffield in July 2003. We have selected the best, most original, most cohesive essays
from those presented at these venues, as well as commissioning chapters from some
whose research became known to us in the course of the year. These chapters were all
displayed on a closed project website for the other participants to read and take account
of. The result, we believe, is an original, cutting-edge volume on the topic of innateness
and culture that will shape research in this area for many years to come.
We are grateful to all those who participated in the preparatory workshops and
concluding conference, whose comments and contributions to discussions have helped to
make this volume better. Special thanks go to those who presented their work at one or
other of the year's meetings but who for one reason or another don't have a chapter
included in this volume (some will have a chapter in volume 3). They are: Rita Astuti,
Robert Aunger, Brian Butterworth, Susan Carey, Keith Frankish, Patricia Greenspan,
Jonathan Haidt, Joseph Henrich, Stefan Krauss, Stephen Laurence, Jesse Prinz, Edmund
Rolls, David Sally, Richard Shweder, and Elizabeth Spelke.

Finally, we would like to thank Tom Simpson, the project Research Associate, for all his
assistance, particularly in helping to ensure that the Sheffield workshops and concluding
conference ran smoothly. We would also like to thank Simon Fitzpatrick for his work in
constructing a common reference list, preparing the texts of the chapters in a common
format, and constructing the index.


Contents
Contributors ix
1 Introduction: Culture and the Innate Mind 3

Tom Simpson, Stephen Stich, Peter Carruthers, and Stephen Laurence
PART I: LEARNING, CULTURE, AND EVOLUTION
2 Culture, Adaptation, and Innateness 23

Robert Boyd and Peter J. Richerson
3 About 17 (± 2) Potential Principles about Links between the Innate Mind and Culture: Preadaptation, Predispositions, Preferences, Pathways, and Domains 39

Paul Rozin
4 Steps toward an Evolutionary Psychology of a Culture- Dependent Species 61

Daniel M. T. Fessler
5 Human Groups as Adaptive Units: Toward a Permanent Consensus 78

David Sloan Wilson
6 The Baldwin Effect and Genetic Assimilation: Contrasting Explanatory Foci and
Gene Concepts in Two Approaches to an Evolutionary Process 91

Paul E. Griffiths
7 The Baldwin Effect and Genetic Assimilation: Reply to Griffiths 102

David Papineau

8 Mental Number Lines 112

Marcus Giaquinto
PART II: MODULARITY AND COGNITIVE ARCHITECTURE
9 Modularity in Language and Theory of Mind: What Is the Evidence? 133

Michael Siegal and Luca Surian
10 Culture and Modularity 149

Dan Sperber and Lawrence Hirschfeld
11 Shaping Social Environments with Simple Recognition Heuristics 165

Peter M. Todd and Annerieke Heuvelink
12 Simple Heuristics Meet Massive Modularity 181

Peter Carruthers
13 Modularity and Design Reincarnation 199

H. Clark Barrett
14 Cognitive Load and Human Decision, or, Three Ways of Rolling the Rock Uphill 218

Kim Sterelny
PART III: MORALITY, NORMS, AND RELIGION
15 How Good Is the Linguistic Analogy? 237

Susan Dwyer
16 Is Human Morality Innate? 257

Richard Joyce
17 A Framework for the Psychology of Norms 280

Chandra Sekhar Sripada and Stephen Stich
18 Religion's Innate Origins and Evolutionary Background 302

Scott Atran
References 319
Index 351



Contributors
Scott Atran, Centre National de la Recherche Scientifique, Paris, and Department
of Psychology, University of Michigan
H. Clark Barrett, Department of Anthropology, University of California at Los
Angeles
Robert Boyd, Department of Anthropology, University of California at Los Angeles
Peter Carruthers, Department of Philosophy, University of Maryland
Susan Dwyer, Department of Philosophy, University of Maryland, Baltimore County
Daniel M. T. Fessler, Department of Anthropology, University of California at Los
Angeles
Marcus Giaquinto, Department of Philosophy, University College London
Paul E. Griffiths, Biohumanities Project, University of Queensland
Annerieke Heuvelink, Department of Artificial Intelligence, Vrije Universiteit, Amsterdam
Lawrence Hirschfeld, Departments of Psychology and Anthropology, New School
for Social Research
Richard Joyce, Philosophy Program (RSSS), Australian National University
Stephen Laurence, Department of Philosophy, University of Sheffield
David Papineau, Department of Philosophy, King's College London
Peter J. Richerson, Department of Environmental Science and Policy, University of California at Davis
Paul Rozin, Department of Psychology, University of Pennsylvania
Michael Siegal, Department of Psychology, University of Sheffield
Tom Simpson, Higher Education Academy Psychology Network, University of York
Dan Sperber, Institut Jean Nicod, Centre National de la Recherche Scientifique, Paris
Chandra Sekhar Sripada, Departments of Philosophy and Psychiatry, University of Michigan
Kim Sterelny, Department of Philosophy, Victoria University of Wellington, and Philosophy Program (RSSS), Australian National University
Stephen Stich, Department of Philosophy and Center for Cognitive Science, Rutgers University
Luca Surian, Department of Psychology, University of Trieste
Peter M. Todd, Program in Cognitive Science and Department of Psychology, Indiana University at Bloomington
David Sloan Wilson, Department of Biological Sciences, Binghamton University

1 Introduction
Culture and the Innate Mind
Tom Simpson
Stephen Stich
Peter Carruthers
Stephen Laurence
Humans are cultural creatures. From before birth to beyond death our culture provides an indispensable part of who we are, what we were, and who and what we will become. Humans are also biological animals, and our biological nature provides an equally indispensable element of our past, present, and future. Recognition and reconciliation of these facts have proved no easy task, and the debate between those who defend a cultural understanding of our humanness and those who defend a biological understanding has been long and rancorous. Yet, as the twenty-first century begins to unfold, it is increasingly clear that both our cultural and our animal natures are necessary elements in any plausible account of what human beings are.
This volume is the second in a three-volume series aimed at giving a state-of-the-art overview of research in the nativist tradition. The first volume, The Innate Mind: Structure and Contents (Carruthers, Laurence, & Stich, 2005), explored what is known about the likely overall architecture of the innate mind and some of its specific features. In this volume, the focus is on the relations between culture and the innate mind. The essays that follow investigate such questions as: To what extent are mature cognitive capacities a reflection of particular cultures, and to what extent are they a product of innate elements? How do innate elements interact with culture to achieve mature cognitive capacities? How do minds generate and shape cultures? How are cultures processed by minds? How, in sum, should we understand the relations between our cultural and our biological selves? In the final section of this introduction, we have assembled brief summaries of each of the essays here. Before getting to those, however, we will set out a bit of the historical background of the research traditions represented in this volume. We will then sketch, in broad strokes, some of the ways in which researchers in these traditions have attempted to exploit features of the innate mind to explain cultural phenomena. It goes without saying that there are substantive theoretical, empirical, and methodological differences among those who might take themselves to be sympathetic with broadly nativist approaches to these issues. What follows is not intended to summarize a set of views all such theorists would endorse, but rather to set out the theoretical backdrop against which the current diversity of opinion among nativists, which is represented in this volume, has developed.
1 A Bit of History
For much of the twentieth century, the vast majority of psychologists and the vast majority of anthropologists would
have expected a book exploring the relationships between culture and the innate mind to be a very slim volume
indeed. From the 1920s until the mid-1960s, behaviorism, in one guise or another, was the dominant paradigm among
psychologists, apart from those in the Freudian tradition. And behaviorists, like the empiricist philosophers who inspired
them,


1. One of the central contrasts characterizing the divide between empiricist philosophers (Locke, Hume) and rationalist philosophers
(Descartes, Leibniz) that emerged in the seventeenth and eighteenth centuries concerned their views on the extent to which the mind
is innate. Empiricists claimed that a relatively small amount of simple, general-purpose, innate machinery would suffice: for example,
simple mechanisms of perceptual processing and general-purpose principles of association. Rationalists, on the other hand, claimed that
a relatively large amount of complex, special-purpose, innate machinery was required: for example, a large number of innate ideas,
and special-purpose processing mechanisms associated with language, mathematics, and other cognitive faculties. Importantly, in the
contemporary nativist/empiricist debate, the issue is not over whether empirical research bears on the study of the mind; both nativists
and empiricists appeal to empirical work in support of their views.
were disinclined to explain much of anything by appeal to innate properties of the mind. The reason is not that
behaviorists did not believe in minds, though their rhetoric sometimes lent itself to that interpretation. Nor was it the
case that behaviorists denied that the mind has any innate properties. Indeed, as the philosopher W. V. Quine noted,
the behaviorist is "knowingly and cheerfully up to his neck in innate mechanisms of learning-readiness" (Quine, 1969).
But those innate mechanisms are part of a general-purpose learning device designed to detect and utilize regularities
in the environment, whatever those happen to be. So for behaviorists, as for other empiricists, the mind's innate
mechanisms impose few constraints on what is learned, and contribute little or no content to the output of the learning
process. Rather, what is learned depends entirely on the environment to which the learner is exposed. Since
behaviorists, like other empiricists, deny that the mind starts out with much by way of unlearned innate content, they
must, and happily do, attribute just about all the contents of the adult mind to the environment. Thus, for
behaviorists, the innate mechanisms of the mind contribute nothing of substance to family patterns, social
relationships, language, norms, religions, decorative and artistic activities, technological traditions, or other
paradigmatic elements of culture.
In anthropology, during the middle decades of the twentieth century and beyond, the emphasis was also on the
environment. Franz Boas, one of the founders of anthropology in America, took a strong stand against the nativist (or
hereditarian), and in hindsight blatantly racist, views that loomed large in the work of nineteenth-century social
Darwinists like Herbert Spencer. Boas maintained that it was the environment, particularly the cultural environment,
rather than biology or psychology, that determined the patterns of behavior that differed across groups and societies.
In 1925, Boas sent his young student, Margaret Mead, to Samoa, where she spent nine months studying adolescence
and sexual awakening among Samoan youth. The book she produced, Coming of Age in Samoa, quickly became (and
probably still is) the most widely read anthropological study ever published. In it, Mead describes adolescence in
Samoa as a time of carefree, guilt-free, and delightful sexual experimentation, facilitated by an easy-going social
environment that is dramatically different from the one familiar to her readers in America and Europe.
The Samoan background which makes growing up so easy, so simple a matter, is the general casualness of the
whole society. For Samoa is a place where no one plays for very high stakes, no one pays very heavy prices, no
one suffers for his convictions, or fights to the death for special ends. Disagreements between parents and child
are settled by the child's moving across the street ... between a husband and his wife's seducer by a few fine
mats. ... Love and hate, jealousy and revenge, sorrow and bereavement, are all matters of weeks. (Mead,
1928/1973, p. 198)
Throughout her long and enormously influential career, as Derek Freeman has documented in great detail, Mead
insisted that the Samoans had
no conviction of sin, regarded lovemaking as the pastime par excellence, [and] made a fine art of sex. ...
Samoan society, she reported, works very smoothly as it is based on the general assumption that sex is play,
permissible in all hetero- and homosexual expressions, with any sort of variation as an artistic addition.... The
assumption that sex is play provides a cultural atmosphere in which frigidity and psychic impotence do not
occur and in which a satisfactory sex adjustment in marriage can always be established. (Freeman, 1983, pp.
91–2)
This was just the sort of powerful evidence that Boas had hoped Mead would find. If cultures can differ so radically
and in such fundamental ways, Boas, Mead, and many of their followers maintained, surely biology imposes few
interesting constraints.


2. We should note that not everyone believes that Mead ultimately endorsed the extreme form of cultural determinism that we ascribe
to her in the text; but there is no doubt that many of her followers have interpreted her that way.
"We are forced to conclude," Mead wrote a decade after her visit to Samoa, "that human nature is almost unbelievably
malleable, responding accurately and consistently to contrasting cultural conditions" (1935/1963, p. 280).


3. After almost a half century during which Mead's claims influenced everyone from Bertrand Russell to the readers of National
Geographic and the Reader's Digest, Freeman (1983) published a book in which he makes a persuasive case that just about every
major claim Mead made about Samoan culture was mistaken. The book was promptly denounced by the American Anthropological
Association.
The empiricist psychological theses that the mind is malleable and that its contents are determined by experience fit
very comfortably with the anthropological theses, urged by Boas and Mead, that cultures differ dramatically in
fundamental ways and that everything interesting about societies can be explained by the local cultural environment.
The ideas in this package are the central components of what John Tooby and Leda Cosmides (1992) have called the
Standard Social Science Model. In the years after World War II, because of the role that nativist theories about the
mind and cultures had played in propaganda designed to justify Nazi
racist and eugenic policies, the cluster of views making up the standard social science model came to have
considerable moral authority. Nativism, many people believed, is not merely false, it is evil.
During the last three decades of the twentieth century, all this began to change. Though the unraveling of the
standard social science model was a complex process that is still far from complete, for our purposes, three strands of
the story are central.


4. For a much more detailed, though hardly nonpartisan, account of the decline of the standard social science model, see Pinker (2002).
The first was the emergence of cognitivism in psychology and the decline of behaviorism. On cognitivist accounts,
which were inspired by the metaphor of the mind as computer, minds contain large sets of representational states that
are manipulated by one or more computational mechanisms. The job of the psychologist, or the cognitive scientist, is
to discover the structure of these representations and the programs or algorithms that manipulate them. Early work on
language by Chomsky and his followers, and on reasoning and problem solving by Newell and Simon, inspired a
generation of investigators to apply this approach to a wide range of phenomena, including vision, memory,
categorization, inductive reasoning, and a host of others, often with impressive results (see, for example, Pinker,
2002; for a useful history of the emergence of cognitivism, see Gardner, 1985).
At the same time, behaviorism was subjected to critical scrutiny from within, and empirical work by John Garcia, Paul
Rozin, and others argued against the view that all learning was general purpose. For example, Garcia and Koelling
(1966) gave rats saccharin-flavored water. When the rats began to drink, a sound played repeatedly and lights
flashed. After drinking, the rats in one group were exposed to an electric shock, whereas rats in a second group were
exposed to x-rays that induced illness. All these rats developed an aversion to saccharin-flavored water presented with
sounds and flashing lights. Garcia and Koelling then tested the rats under two new stimulus conditions. In one
condition, rats from both groups were given saccharin-flavored water without the lights and sounds. In the other
condition, they were given ordinary water with lights and sounds as in the original condition. Although the rats in the
two groups (shock and x-rays) had both developed an aversion to the original stimulus conditions, they behaved very
differently in the new test conditions. Rats that had been shocked drank the saccharin-flavored water, but did not drink
the ordinary water. And rats that had been exposed to x-rays (and became ill) did just the opposite; they drank the
ordinary water, but did not drink the saccharin-flavored water (Gleitman, 1991). What experiments like these suggest
is that organisms have an innate preparedness for learning certain types of connections (e.g., between tastes and
illness, and between sights and sounds and shocks). (For discussion, see Rozin, 1976.)
The second strand in the story was the rekindling of interest in nativist theories of the mind. As discussed in some
detail in our introduction to the first volume in this series, Chomsky's work on language was the spark that ignited the
fire. Beginning in the mid-1960s, Chomsky made an increasingly impressive case that the structure of natural
languages was simply too rich to be acquired by an empiricist
learning mechanism on the basis of the evidence available to the child. Given the poverty of the stimulus, Chomsky
argued, the only plausible explanation for the linguistic knowledge that the child acquires is that a very substantial
portion of that knowledge is innate. Since any normal child can learn any natural language, the innate knowledge,
which Chomsky called universal grammar (UG), must be present in all normal humans and manifest in all natural
languages.


5. These enormously influential ideas generated a great deal of controversy at the time, and continue to generate debate. For recent
contributions to this debate, see Cowie (1999), Laurence and Margolis (2001), and Crain and Pietroski (2001).
How, then, are we to account for the obvious fact that unrelated languages seem very different from one another?
Though other broadly nativist models of cultural variability exist, Chomsky's own answer to this question invoked two
ideas that have cast a large shadow on nativist-friendly explanations of culture.
The first is that while all natural languages manifest the features specified in UG, those features are not obvious to
casual inspection. Discovering the cross-linguistic regularities of UG, like discovering the regularities captured by
Newton's laws or by just about any other sophisticated science, requires careful study of the phenomena aided by a
theory that tells you what to observe or measure. The second idea is that some of the regularities are disjunctive.
There are, for example, many logically possible ways in which a language might order the components of sentences
that linguists call heads and complements. But almost all of the world's languages exhibit one of two patterns. So the
regularity here is that heads and complements are ordered in one or the other of these ways. In order to determine
which pattern prevails in the language that surrounds her, a child must, of course, be exposed to that language. But
all she needs is a bit of information that will serve as a cue or trigger enabling her to adopt either pattern A or
pattern B. She need not figure out all the complexities of those two patterns, since they are innately specified.
Linguists describe this process in which the environment provides a cue triggering the adoption of one of several
innately specified patterns as parameter setting, and many in the Chomskian tradition believe that a relatively small
number of parameters will account for most of the variation in grammars found around the world (for example, see
Baker, 2001).
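The triggering idea described above can be made concrete with a toy sketch. This is purely illustrative (it is not from the chapter, and the function and tagging scheme are invented for the example): the learner never induces word-order rules from scratch; it only uses shallow cues in the input to choose between two innately specified options.

```python
# Toy illustration of parameter setting: the learner picks between two
# innately given word-order patterns ("head-initial" vs. "head-final")
# using only a simple cue from the surrounding language. The ":head" /
# ":comp" tags are a hypothetical stand-in for whatever information lets
# the child identify heads and complements in the input.

def set_head_parameter(utterances):
    """Choose 'head-initial' or 'head-final' from cues in the input.

    Each utterance is a (first_word, second_word) pair in surface order.
    """
    head_first_votes = 0
    head_last_votes = 0
    for first, second in utterances:
        if first.endswith(":head"):
            head_first_votes += 1
        elif second.endswith(":head"):
            head_last_votes += 1
    # In principle a single clear cue suffices; counting adds robustness.
    return "head-initial" if head_first_votes >= head_last_votes else "head-final"

# English-like input: verbs (heads) precede their objects (complements).
english_like = [("eat:head", "apples:comp"), ("see:head", "birds:comp")]
# Japanese-like input: complements precede their heads.
japanese_like = [("apples:comp", "eat:head"), ("birds:comp", "see:head")]

print(set_head_parameter(english_like))   # head-initial
print(set_head_parameter(japanese_like))  # head-final
```

The point of the sketch is how little the input has to do: it acts as a switch between pre-specified patterns, not as the source of the patterns themselves.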
A third idea inspired by Chomsky's work that has had an important impact on nativist explanations of culture is that
both the psychological mechanisms underlying language processing and those underlying language acquisition are
special-purpose, innate devices that are built to do those jobs and nothing else. Chomsky often uses the term
"language organ" to stress the analogies between the mental system underlying language processing and familiar
biological organs like the kidneys or the eyes. In the early 1980s, Jerry Fodor (1983) published a very influential book in
which he proposed the term "module" for mental mechanisms like the language organ, and went on to offer a detailed
characterization of the features of modules. Central among them was that modules (1) contain a substantial body of
information relevant to the task they were designed to accomplish, where this information is inaccessible to other
components of the mind, and (2) do their work while utilizing only that proprietary
body of information, encapsulated from all other information held elsewhere in the mind. Fodor, who had earlier done
much to clarify the basic assumptions of cognitivism, also assumes that modules are computational devices that
manipulate representations in accordance with a program or algorithm.
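Informational encapsulation, the two features just listed, can be pictured with a small programming analogy. This sketch is our illustration, not Fodor's; the class and the Muller-Lyer example stand in for the general idea that a module computes over a proprietary database that the rest of the mind can neither read nor override.

```python
# Illustrative sketch of Fodorian encapsulation: a processor that consults
# ONLY its own proprietary information store, regardless of what the
# larger system believes.

class Module:
    def __init__(self, proprietary_db):
        # The module's information is fixed at construction; its
        # computations have no channel to beliefs held elsewhere.
        self._db = dict(proprietary_db)

    def process(self, input_signal):
        # Computation is a transform over the proprietary database alone.
        return self._db.get(input_signal, "no analysis")

# Central cognition "knows" the stimulus is an illusion, but that belief
# cannot penetrate the visual module: its output is unchanged.
central_beliefs = {"equal-length-lines": "this is the Muller-Lyer illusion"}
visual_module = Module({"equal-length-lines": "one line looks longer"})
print(visual_module.process("equal-length-lines"))  # one line looks longer
```

The analogy captures why encapsulation matters: a visual illusion persists even for a viewer who knows it is an illusion, because the knowledge lives outside the module's proprietary database.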
The final strand in our story of the events that led to the unraveling of the standard social science model was the
emergence of sociobiology, which had its beginnings in the research tradition of ethology stemming from the work of
Konrad Lorenz and Niko Tinbergen. This tradition provided an empirically grounded alternative to behaviorism.
Important aspects of animal behavior were seen to be the product of innate mechanisms that were evolutionary
adaptations. However, from Darwin's time onward, evolutionary theorists had found certain social behaviors in animals
to be very difficult to explain in terms of adaptations. Perhaps most puzzling were altruistic behaviors that threatened
the survival or reproductive prospects of the animal exhibiting the behavior while increasing the likelihood that some
other animal would survive and reproduce. How could animals disposed to behave like that evolve? Starting in the
mid-1960s, a group of biologists that included George Williams, W. D. Hamilton, John Maynard Smith, Robert Trivers,
and Richard Dawkins began to make major advances in answering that question. One crucial idea, proposed by
Williams and Hamilton and popularized by Dawkins in his book The Selfish Gene (1976), was that we should not focus
on the number of offspring an organism produces but rather on the number of copies of its genes that are passed on
to the next generation. That made it clear how a gene that made altruistic behavior more likely could spread through a
population, provided that the recipients of the altruism were kin who carried a copy of the gene. Theories invoking
reciprocal altruism, parental investment, sexual selection, and the idea of an evolutionarily stable strategy yielded
plausible accounts of how other behavioral dispositions might evolve. (For an overview of these ideas and more, see
Trivers, 1985). In 1975, Harvard biologist E. O. Wilson published Sociobiology: The New Synthesis, a massive survey of
the literature on animal social behavior and of attempts to explain how this behavior might have evolved. In the last
chapter of that book, Wilson turned his attention to humans. He offered hypotheses aimed at explaining how a variety
of human social behaviors and cultural phenomena might have evolved, including religion, ritual, artistic activity, male
dominance, and warfare. This was, of course, a clear challenge to the standard social science model, since if Wilson's
explanations were correct, then the behaviors in question must, to some extent at least, be influenced by genes, and
those genes must have been favored by natural selection. The reaction was fast and furious; indeed, so furious that for
some years after the publication of Sociobiology, public talks by Wilson and other sociobiologists were often met with
organized and aggressive heckling. (For a detailed history, see Segerstråle, 2000.)
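The Williams-Hamilton insight about gene copies mentioned above is standardly summarized as Hamilton's rule, which can be stated compactly. The rule itself is a standard result; the particular numbers below are only illustrative.

```python
# Hamilton's rule: a gene disposing its bearer to an altruistic act can
# spread when r * b > c, where b is the reproductive benefit to the
# recipient, c the cost to the altruist, and r the relatedness (the
# probability the recipient carries a copy of the gene).

def altruism_can_spread(r, b, c):
    """Return True if Hamilton's rule r*b > c is satisfied."""
    return r * b > c

# Helping a full sibling (r = 0.5) pays only if the benefit exceeds
# twice the cost.
print(altruism_can_spread(r=0.5, b=3.0, c=1.0))   # True
print(altruism_can_spread(r=0.5, b=1.5, c=1.0))   # False
# Kin selection alone never favors helping a non-relative (r = 0).
print(altruism_can_spread(r=0.0, b=10.0, c=1.0))  # False
```

This makes vivid how altruism toward kin can evolve: from the gene's point of view, a cost to one bearer is offset by a sufficiently large, relatedness-weighted benefit to other likely bearers.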
While sociobiology and the closely allied field of human behavioral ecology pose a clear challenge to the standard social
science model, they do not speak directly to the topic that is the central focus of this book, the links between culture
and the innate mind. The reason they don't is that both sociobiology and human behavioral ecology are largely
apsychological; they don't say much about the mind at all. Rather, they focus on behavior. Their central concern is to
explain
how a given pattern of behavior evolved, and their usual strategy is to argue that that pattern of behavior is adaptive,
that is, that it increases the chance that copies of the genes of organisms displaying the behavior will be present in
subsequent generations. All of this changed with the advent of evolutionary psychology, where we find theories that
attempt to explain cultural phenomena that clearly invoke features of the innate mind.
2 Evolutionary Psychology's Strategies for Explaining Culture
Though the terminology, like much else in this area, is contested, we will use evolutionary psychology as a label for
the work of a group of researchers who, starting in the mid-1980s, attempted to integrate the burgeoning nativist
research tradition with the evolutionary approach to culture urged by sociobiologists. While many thinkers have played
a role in developing evolutionary psychology, the most influential figures have been the anthropologist John Tooby, the
psychologist Leda Cosmides, and, more recently, the psychologist Steven Pinker. Though they are broadly sympathetic
with the sociobiologists' attempts to give evolutionary explanations of cultural phenomena, evolutionary psychologists
maintain that sociobiology's focus on behavior and its neglect of psychology are misguided. When genes influence
behavior, they argue, they do so by building brains with a bevy of specialized mental modules. Behavior is the result of
the interaction between these mental modules and the environment. During the Pleistocene, when modern humans
were evolving, natural selection shaped these mental modules to produce behavior that would be adaptive in the
Pleistocene environment. But over the roughly 10,000 years since the invention of agriculture, the environment in
which humans live has been radically altered by human activity. Thus it is a mistake to assume, as sociobiologists
typically do, that the behavior of modern humans is generally adaptive, since it is produced by minds that were
designed by natural selection to produce adaptive behavior in a very different environment.
Rather than attempting to show how contemporary social behavior and the cultural institutions that it generates are
adaptive, the research program of evolutionary psychology proposes that we learn as much as possible about the
persistent adaptive problems that our ancestors confronted during the period when modern humans were evolving.
According to evolutionary psychologists, we should then hypothesize that for most of these adaptive problems, natural
selection produced a mental module that was well designed to solve it, and that those modules persist largely
unchanged in modern minds. These hypotheses about contemporary minds can then be tested using the methods of
contemporary cognitive science. The mental modules posited by evolutionary psychologists do not share all of the
features of Fodor's modules, and there is considerable debate about which features they retain (Carruthers, 2005,
chapter 12 here). However, it is clear that evolutionary psychologists take modules to be special-purpose computational
devices, and since these devices have been shaped by natural selection, they often use the term "Darwinian
algorithms" to characterize their programs.
There is much here that is controversial, including the pivotal assumption that there will be at least a rough pairing
between adaptive problems faced by our
ancestors in the Pleistocene and mental modules designed to solve those problems (Samuels, 2000; Boyd & Richerson,
chapter 2 here). But in addition to their theoretical arguments in support of this claim, evolutionary psychologists
maintain that the assumption has been amply vindicated by contemporary cognitive science, particularly those parts of
cognitive science that have taken nativism seriously. According to Tooby and Cosmides, this research has shown that
our cognitive architecture resembles a confederation of hundreds or thousands of functionally dedicated
computers (often called "modules") designed to solve adaptive problems endemic to our hunter-gatherer
ancestors. Each of these devices has its own agenda and imposes its own exotic organization on different
fragments of the world. There are specialized systems for grammar induction, for face recognition, for dead
reckoning, for construing objects and for recognizing emotions from the face. There are mechanisms to detect
animacy, eye direction, and cheating. There is a theory of mind module and a multitude of other elegant
machines. (Tooby & Cosmides, 1995, p. xiv)
All of this cognitive architecture, evolutionary psychologists maintain, is part of our human endowment and is shared
by people in all cultures. How, then, do evolutionary psychologists propose to explain the apparently limitless variety of
cultural differences that have been described by anthropologists who followed in the footsteps of Boas and Mead?
One important theme in evolutionary psychologists' response to this question is to challenge the assumption of all-but-
limitless cultural variability. While not denying that cultures vary in many ways, evolutionary psychologists also insist
that there are many cultural universals (features all cultures share), though, like Chomsky's linguistic universals, they
are sometimes not obvious unless one has a theory that suggests where to look. Here is how Tooby and Cosmides
make the point:
Anthropological orthodoxy to the contrary, human life is full of structure that recurs from culture to culture, just
as the rest of the world is. (Or, if one prefers, there are innumerable frames of reference within which
meaningful cross-cultural uniformities appear, and many of these statistical uniformities and structural
regularities could potentially have been used to solve adaptive problems.) ... Such statistical and structural
regularities concerning humans and human social life are an immensely and indefinitely large class (D. E.
Brown, 1991): adults have children; humans have a species-typical body form; humans have characteristic
emotions; humans move through a life history cued by observable body changes; humans come in two sexes;
they eat food and are motivated to seek it when they lack it; humans are born and eventually die; they are
related through sexual reproduction and through chains of descent; they turn their eyes toward objects and
events that tend to be informative about adaptively consequential issues; they often compete, contend, or fight
over limited social or subsistence resources; they express fear and avoidance of dangers; they preferentially
associate with mates, children, and other kin; they create and maintain enduring, mutually beneficial
individuated relationships with nonrelatives; they speak; they create and participate in coalitions; they desire,
plan, deceive, love, gaze, envy, get ill, have sex, play, can be injured, are satiated; and on and on. Our
immensely elaborate species-typical physiological and psychological architectures not only constitute regularities
in themselves but they
impose within and across cultures all kinds of regularities on human life, as do the common features of the
environments we inhabit. (1992, pp. 88-89)
Tooby and Cosmides sometimes describe these universals as constituting a single human "metaculture" (p. 91).
But even if it is granted that there is a rich human metaculture that has been largely neglected by anthropologists,
there is still a great deal of cultural diversity that needs to be explained. One strategy that evolutionary psychologists
use to explain this diversity parallels Chomsky's parameter-setting strategy for explaining grammatical diversity. If our
ancestors had to solve persistent adaptive problems in several quite different environments, we should expect that
some of the Darwinian algorithms that evolved to deal with those problems would be disjunctive, with cues from the
physical or social environment serving to activate the appropriate branch of the algorithm. As in the case of
Chomskian parameters, the information required to deal with the problem at hand is innate, and the environment
serves only to trigger the appropriate package of information. Cosmides and Tooby use the term "evoked culture" for
aspects of culture that are produced in this way (Cosmides & Tooby, 1992). The food-sharing practices within modest-size
band-level groups are among the phenomena that evolutionary psychologists have attempted to explain by
appealing to the idea of evoked culture. The core idea is that some sorts of foraging depend heavily on luck, and in
those cases, band-wide food sharing serves as an insurance policy that buffers the day-to-day variance. But when skill
and individual effort rather than luck are the major determinants of success, individuals will maximize their fitness if
they are inclined to share only with kin. And these patterns have indeed been reported in a number of studies. To
explain these patterns, Cosmides and Tooby posit innate evolved mechanisms that are toggled by cues indicating the
extent to which success in a given foraging activity depends on chance:
Because foraging and sharing are complex adaptive problems with a long evolutionary history, it is difficult to
see how humans could have escaped evolving highly structured domain-specific psychological mechanisms that
are well designed for solving them. These mechanisms should be sensitive to local informational input, such as
information regarding variance in the food supply. This input can act as a switch, turning on and off different
modes of activation of the appropriate domain-specific mechanisms. The experience of high variance in foraging
success should activate rules of inference, memory retrieval cues, attentional mechanisms, and motivational
mechanisms. These should not only allow band-wide sharing to occur, but should make it seem fair and
appealing. The experience of low variance in foraging success should activate ... mechanisms that make within-
family sharing possible and appealing but that make band-wide sharing seem unattractive and unjust. (1992, p.
215)
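Cosmides and Tooby's proposal amounts to a simple conditional rule. The Python sketch below is our own toy illustration, not anything drawn from their work: the function name, the variance threshold, and all the numbers are invented for the example. An environmental cue (here, the variance in foraging returns) switches between two innately specified sharing modes.

```python
# Toy illustration of an "evoked culture" mechanism. The function name,
# threshold, and numbers are invented for this sketch; they are not taken
# from Cosmides and Tooby.

def sharing_norm(daily_returns, variance_threshold=1.0):
    """Return the sharing mode 'evoked' by local foraging statistics."""
    mean = sum(daily_returns) / len(daily_returns)
    variance = sum((r - mean) ** 2 for r in daily_returns) / len(daily_returns)
    # High variance (luck-driven foraging) evokes band-wide sharing, which
    # works as an insurance pool; low variance (skill-driven foraging)
    # evokes sharing with kin only.
    if variance > variance_threshold:
        return "band-wide sharing"
    return "kin-only sharing"

print(sharing_norm([0, 9, 0, 12, 0]))  # lucky, high-variance hunting
print(sharing_norm([5, 6, 5, 6, 5]))   # steady, skill-based gathering
```

On this picture, sharing norms differ between groups even though no new information is transmitted: the local environment merely selects which branch of the evolved algorithm is expressed.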
There is yet another way in which evolutionary psychologists invoke evolved modules to explain cultural variation. It is
a truism that cultures (or, more accurately, individuals within cultures) are great sources of locally useful information.
What plants are edible, what animals are dangerous, what paths are safe, all this and much more is conveyed from
one individual to another. When this culturally transmitted information is relevant to solving adaptive problems that
were
frequently encountered in the Pleistocene, evolutionary psychologists maintain, mental mechanisms may have evolved
that seek it out and make use of it in predetermined ways. As an example, Barrett (2005a) has argued that children
have an innate "dangerous animal" category embedded in a mental mechanism that leads them to seek out and retain
information about local animal predators, and to have appropriate emotional and behavioral responses to such animals.
3 The Epidemiological Strategy for Explaining Culture
In the previous section, we sketched the three strategies for explaining cultural phenomena that evolutionary
psychologists have most actively explored: metaculture, evoked culture, and use of culturally transmitted information
by modules designed to exploit it. All three strategies are aimed at explaining aspects of culture that are clearly
adaptive, or that were adaptive in ancestral environments. But there are many cultural phenomena, including aspects
of religion, taboos, and etiquette rules, that appear to serve no adaptive function either now or in the past. Does the
innate mind have anything to tell us about these? Researchers who have adopted the epidemiological approach to
culture pioneered by Dan Sperber (1996) argue that it does. The starting point of this approach is the observation that
while it is undoubtedly true that lots of information is transmitted from one member of a culture to another, this
information transfer is almost always mediated by a variety of innate mechanisms. In order to imitate a dance or a
hunting technique, internalize a norm, or learn a folk tale, the learner (or "cultural child," as he or she is sometimes
called) must observe more knowledgeable members of the culture ("cultural parents"), infer or reconstruct the mental
representations that underlie their behavior (including their verbal behavior), and store the reconstructed mental
representations in the appropriate place in memory. Neither the mechanisms that underlie the necessary inferences nor
those that underlie memory are perfectly accurate, however. Such learners are bound to make mistakes, and those
mistakes will often not be random. Rather, because of the way the mechanisms responsible for inference and memory
are designed, the mental representations that are reconstructed and stored are more likely to selectively retain some
features of the cultural parents' representations, to drop others, and to introduce new features that may not have been
present in the cultural parents' representations. The features that are more likely to be retained or added might be
thought of as biases or attractors in the transmission process, and over time the transmitted mental representations
found in a population will tend to move in the direction of those attractors.
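These dynamics can be made vivid with a toy simulation. The sketch below is our own illustration, not a model from the epidemiological literature: representations are reduced to numbers, "inference" nudges every copy a small fraction of the way toward an attractor, and copying adds random error. All parameter values are arbitrary.

```python
import random

# Toy simulation of biased cultural transmission (our illustration; the
# attractor value, bias strength, and noise level are arbitrary). Each
# "generation", every learner copies a numerical representation from a
# random cultural parent, but noisy inference nudges the copy toward an
# attractor. The population drifts to the attractor although no copy is exact.

random.seed(0)

ATTRACTOR = 10.0   # value favored by the (hypothetical) inference/memory biases
BIAS = 0.2         # fraction of the gap to the attractor closed per transmission
NOISE = 1.0        # standard deviation of random copying error

population = [random.uniform(0, 40) for _ in range(200)]

for generation in range(30):
    population = [
        parent + BIAS * (ATTRACTOR - parent) + random.gauss(0, NOISE)
        for parent in (random.choice(population) for _ in range(200))
    ]

mean = sum(population) / len(population)
print(round(mean, 1))  # the population mean ends up close to the attractor
```

The individual copies never converge exactly (the noise keeps them scattered), but the distribution as a whole settles around the attractor, which is the sense in which transmission biases shape culture over time.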
One influential example of research that adopts the epidemiological approach is Pascal Boyer's work on religion (Boyer,
2001). Boyer has shown that people's beliefs about supernatural beings tend to characterize those beings as having
just one, or a small number, of bizarre and unfamiliar properties, and otherwise to be pretty much the same as natural
beings in that category. Thus a supernatural person may be able to know what is happening in distant places or what
will happen in the future, but apart from this, his mind will have all the normal characteristics posited by commonsense
or folk psychology. The reason for this, Boyer argues, is that the small number of supernatural properties make the
representations of these beings
particularly memorable, while the more mundane features of the supernatural agent's mind are supplied automatically,
when people hear accounts of these beings, by the innate mental modules that are responsible for attributing mental
states to real people. Shaun Nichols's work on etiquette norms provides another illustration of the epidemiological
strategy. Nichols has shown that, while a wide variety of behavior has been governed by etiquette norms during the
last 500 years, the norms that tend to survive, once they appear, are those that prohibit behavior that evokes disgust
reactions we are innately predisposed to have. Our innate predisposition to find certain types of things disgusting,
Nichols argues, biases the transmission process in favor of norms prohibiting disgusting behavior by making those
norms more salient and more memorable (Nichols, 2004).
Although describing this approach to the explanation of cultural phenomena as "epidemiological" is a metaphor, it is in
many ways a very apt one. The mental representations that are spread by the sorts of processes that are center stage
in the epidemiological approach, like the infectious agents tracked by medical epidemiologists, rarely do their hosts
much good. Those that succeed in spreading through a population do so by exploiting features of their hosts' cognitive
systems that were designed for very different purposes. The mind-reading system that explains why supernatural
beings are believed to have familiar psychological properties did not evolve because it enabled people to create
religious myths, and the core disgust system was presumably in place long before the emergence of rules of etiquette.
Thus the epidemiological approach gives us insights into some of the quirks of culture, and some of its pathologies. It
explains how innate mental mechanisms that were designed to deal with adaptive problems can, inadvertently as it
were, give rise to an efflorescence of cultural phenomena that often contribute nothing to fitness.
4 Cumulative Cultural Evolution and Adaptive Local Culture
Thus far we have considered strategies for explaining aspects of culture that were adaptive in ancestral environments
(many of which, of course, are still adaptive) and aspects of culture that are often maladaptive. But as Robert Boyd,
Peter Richerson, and a number of other researchers have argued, these approaches cannot explain some of the most
conspicuous and important features of culture. Humans are by far the most widely distributed large animals on earth;
they survive and often flourish in the Arctic, in temperate zones and in the Tropics, in deserts and in rain forests, on
tiny atolls and in enormous cities. People can live in this staggering variety of environments because they have
sophisticated, culturally transmitted, and locally appropriate technological knowledge that enables some groups of
people to build igloos and kayaks and hunt seals, and enables other groups to build high-rise apartment towers and
grow high-yielding genetically modified crops. None of this is plausibly explained by appeal to evoked culture alone.
Boyd and Richerson (chapter 2 here) make this point vividly with a thought experiment in which a contemporary urban
academic is deposited on an Arctic beach, where, in order to survive, he needs to make a kayak out of locally available
materials. He would, of course, be a spectacular failure. The new environment would not evoke a Darwinian algorithm
for kayak building. Nor would he be able to learn the art on his own, via trial-and-error
learning, even after years of trying. The Inuit, who are masterful kayak builders, do not rely on Darwinian algorithms
or on individual learning to acquire their skills; rather, they get the relevant knowledge from other members of their
culture. But, of course, this immediately raises another question: How did this knowledge get established in the
culture? The answer, Boyd and Richerson argue, is that human cultural transmission, like genetic transmission, is
cumulative. Small changes in existing cultural knowledge introduced by individual innovators, whether they are
motivated by insight or by chance, will be adopted and passed on to subsequent generations if they are judged to
improve the product whose production the knowledge guides. Over time, this process of cumulative innovation can lead
to technologies that are exquisitely well adapted to local environments. And while it typically takes many generations
for the process to achieve these results, it can nonetheless be extraordinarily fast when compared with the pace of
cumulative biological evolution. This cumulative process of cultural evolution, Boyd and Richerson argue, is central in
explaining the extraordinary success of our species.
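The cumulative character of this process is easy to illustrate. In the toy Python model below (our own sketch, with arbitrary numbers), each generation an innovator proposes a small random modification to the current design, and the population retains it only if it improves the design; quality ratchets upward far beyond what a single lifetime of trial and error could achieve.

```python
import random

# Toy model of the cultural "ratchet" (our illustration; the step size and
# generation count are arbitrary). Each generation tries one small random
# tweak to the inherited design and keeps it only if it is an improvement.

random.seed(1)

def innovate(quality):
    """One generation: try a small random tweak, keep it only if better."""
    candidate = quality + random.gauss(0, 0.5)
    return max(quality, candidate)

quality = 0.0
one_lifetime = innovate(quality)   # what an isolated learner achieves alone

for _ in range(500):               # many generations of cultural transmission
    quality = innovate(quality)

print(round(one_lifetime, 2), round(quality, 2))
```

Because failed innovations are discarded while successful ones are transmitted, the accumulated quality after many generations dwarfs what any single innovator contributes, which is why the stranded academic in Boyd and Richerson's thought experiment cannot reinvent the kayak on his own.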
Humans are not the only species that has a system of cultural transmission (Heyes & Galef, 1996). However, only
humans exhibit the sort of massively cumulative cultural transmission that enables us to quickly adapt to a wide range
of environments. What features of our innate minds make this powerful component of culture possible? Answers to this
question are explored in several of the essays in this volume, including those by Boyd and Richerson (chapter 2),
Fessler (chapter 4), and Sterelny (chapter 14). One intriguing suggestion is that the mind-reading (or theory of
mind) system plays a central role since it enables us to understand the intentions or goals underlying other people's
behavior, and this may be crucial to successful imitation (Tomasello, 1999a). But if the mind-reading system, much of
which appears to be unique to humans, gives us the ability to imitate, one or more other components of the innate
mind must provide the motivation to imitate. Fessler suggests that some of this motivation may derive from the
mental mechanisms underlying norms, perhaps like those described by Sripada and Stich (chapter 17 here). But if
this is right, it is clearly not the full story about the motivation to imitate and to internalize local knowledge, and much
more work is needed in this area.
Some of Boyd and Richerson's most influential work has focused on the question of who to imitate. The question is an
important one, since there will often be many potential cultural parents from whom a neophyte could learn. Using
sophisticated mathematical models, Boyd and Richerson have shown that in some circumstances it will be adaptive to
adopt the most common cultural variant, while under other circumstances it will be adaptive to adopt the variant
exhibited by a high-prestige individual. This suggests that we may have evolved innate mechanisms or biases
facilitating these choices (Boyd & Richerson, 1985; Richerson & Boyd, 2005).
The mechanisms that enable cumulative cultural evolution and that lead us to adopt appropriate cultural parents
produce the spectacular results that dramatically differentiate human societies from those of even our closest primate
cousins. But, as Richerson and Boyd have stressed (2005; Boyd & Richerson, chapter 2 here) there is a dark side to
this as well. The processes vetting cultural innovations,
though they can be very effective, can also fail in a variety of ways. Prestige bias provides a clear and intuitive
example. While it is often adaptive to imitate successful and prestigious individuals, it is hard to know which aspects of
their belief systems and their preferences contribute to their success. So our inclination to imitate, while it may give us
useful knowledge and skills, may also lead us to pick up idiosyncratic beliefs and preferences that are inefficacious or
maladaptive. As Boyd and Richerson note, our propensity to adopt dangerous beliefs may be the price we pay for the
marvelous power of cultural adaptation (chapter 2 here).
5 Introduction to the Rest of the Volume
We have reviewed some of the main strategies that have been proposed for explaining aspects of culture by appeal to
innate features of the mind. But as the essays in this volume make clear, there is much more territory that needs to
be explored. Some of these essays suggest additional strategies, some propose ways strategies can be combined, and
many address the daunting task of assembling persuasive evidence in favor of (or against) proposed explanations. In
this section, we will offer brief sketches of each of the chapters that follow.
5.1 Learning, Culture, and Evolution
The chapters in part I all focus on possible relations between acquisition, learning, and culture and examine the extent
to which ideas from evolutionary theory can aid our understanding of such relations.
Boyd and Richerson (chapter 2) examine the ways in which coevolutionary phenomena have shaped our cultural and
genetic selves. In particular, Boyd and Richerson examine the costs and benefits associated with both social learning
and more rigid cognitive mechanisms, and show how trade-offs between these acquisition methods necessarily
underlie the kind of cumulative cultural evolution exhibited by the human lineage.
Rozin (chapter 3) continues the discussion of possible interactions between our cultural and genetic selves, presenting
a set of nineteen principles that he suggests may be useful in understanding links between culture and the innate
mind. Drawing on work by numerous researchers in a variety of research programs and domainsand with particular
attention to the domain of foodRozin provides a wealth of data and insight concerning the evolution and development
of human preadaptations, predispositions, and preferences.
Fessler (chapter 4) compares and contrasts human and nonhuman primates' uses of cultural information. Fessler points
out that the human capacity to acquire, employ, and elaborate on socially transmitted information is the cornerstone
of humans' evolutionary prosperity, and argues that these capacities reflect the workings of special-purpose
psychological mechanisms that evolved in order to exploit the enormous adaptive potential of socially transmitted
information. To support these claims, Fessler first reviews the principal existing approaches to this issue, and then
outlines some of the major topics he believes need to be addressed in developing an evolutionary psychology of our
uniquely culture-dependent species.
Wilson (chapter 5) provides a more methodologically focused analysis, in which he examines our understanding of
human groups in the light of evolutionary theory. Wilson argues that scientific and intellectual thought has for some
decades been dominated by a form of individualism that renders groups as nothing more than collections of self-
interested individuals. However, in recent years groups themselves have begun to be interpreted as adaptive units, and
this interpretation has much in common with a much older understanding of the individual/group relationship. Wilson
suggests that we now have to hand the ingredients for a permanent consensus on the relationship between human
groups and evolution.
The next two essays in part I both focus on the relations between genetic assimilation and the Baldwin effect, two
famous models that envisage something that is initially learned by individuals becoming, over time, innate. Griffiths
(chapter 6) takes issue with David Papineau's claim to have described a form of genetic assimilation dependent on
social learning. According to Griffiths, the Baldwin effect is a phenotypic-level selection model that is supposed to
explain how selection can cause an acquired phenotype to become innate. Conrad Waddington's genetic assimilation,
however, is a developmental model that is supposed to explain how very small genetic changes can cause acquired
traits to become innate. Papineau conflates genetic assimilation with the Baldwin effect, and this, Griffiths argues, is a
result of the way he thinks about genes. We need to think about genes in a more sophisticated way if we are to
understand how and why the development of a phenotypic trait can become independent of certain aspects of the
developmental environment.
In reply, Papineau (chapter 7) argues that Griffiths himself conflates two distinct notions: genetic canalization and
genetic assimilation. Papineau argues that Griffiths's criticisms would be both correct and well directed if the focus of
Papineau's concern was genetic canalization. However, Papineau claims, his concern is not genetic canalization but
genetic assimilation. Thus, Papineau concludes, even though Griffiths's critique may be theoretically sound, it
unfortunately misses its intended mark.
Finally in this part of the book, Giaquinto (chapter 8) discusses mental number lines, and the respective innate and
cultural contributions to their construction during development. He argues that while one might initially think of such
lines as cognitively simple objects that are routinely learned at school via associative mechanisms, in fact neither the
nature nor the origin of these number lines is at all clear. Using data from a variety of empirical studies, Giaquinto
concludes that the standard mental number line is ultimately the product of four interacting factors. Three of these are
innate faculties (our number sense, our sense of the space around us, and our visual imagery system), and one is the
culture-specific convention of a written numeral system.
5.2 Modularity and Cognitive Architecture
The essays in part II examine central elements of our cognitive architecture, and focus in particular on the nature and
role of modularity in human cognition.
Siegal and Surian (chapter 9) investigate two seemingly uniquely human cognitive capacities, language and theory of
mind, and examine the extent to which
these capacities interact during ontogenetic development. Using evidence from developmental psychology, cognitive
neuroscience, and behavioral genetics, Siegal and Surian conclude that the development of both systems is significantly
modularized and characterized by a poverty of the stimulus. Nevertheless, they point out that in typically developing
persons these systems interact substantially to support word learning and the emergence of specific cultural beliefs.
Sperber and Hirschfeld (chapter 10) address the relations between cognitive modularity and cultural diversity, and
argue that these supposedly incompatible properties can in fact be reconciled. Indeed, Sperber and Hirschfeld claim,
cognitive modularity is necessary to explain important aspects of cultural diversity that would otherwise remain
mysterious, and they therefore conclude that these two properties should be considered as complementary rather than
conflicting elements of human existence.
Todd and Heuvelink (chapter 11) examine the information-gathering and decision-making mechanisms that may
underlie the construction of culture and cultural knowledge. Todd and Heuvelink claim that such mechanisms may be
much simpler than is often supposed, and support this claim with details from simulations of the use of one class of
such simple heuristics (recognition heuristics) by a population of socially interacting agents. They argue that the
emergent behavior of these simulated agents is importantly similar to that of real world agents, and thus conclude that
such simple heuristics may well shape a great many of the social processes that occur in the real world.
Carruthers (chapter 12) continues with the theme of simple heuristics, and assesses the impact of results from the
simple heuristics research program on the notion of massive modularity prevalent in evolutionary psychology.
Carruthers begins by defusing several potential sources of conflict between these two programs, but then goes on to
show that the simple heuristics program does have the potential to undermine one of the main arguments frequently
used in support of massive modularity. This leads Carruthers to reexamine the notion of modularity as understood in
cognitive science, and in so doing he develops a characterization of modularity that can both support massive
modularity and accommodate the results from the simple heuristics program.
Barrett (chapter 13) also focuses on claims of massive modularity. More specifically, he is concerned to reconcile
modularity with development. He argues that it is a mistake to assume that modules would have to be either innate or
genetically prespecified. Rather, modules should be thought of as functionally distinct aspects of cognitive
organization that emerge in the course of normal development (where sometimes the developmental process can
include various forms of learning). And thus considered, they can still be targets of selection, and can count as
adaptations.
A more direct challenge to the simple heuristics program comes from Sterelny (chapter 14). Sterelny investigates the
decision-making processes involved in human social situations, and claims that many instances of such decision-
making involve what he terms a "high information load." Sterelny then argues that in these cases it isn't possible for
simple or fast-and-frugal heuristics to do the decision-making work. He suggests that social decision-making must
therefore require a
variety of other sorts of information-processing mechanisms, especially those that utilize social and environmental
structures that are external to individual agents. Sterelny then considers how such mechanisms and the corresponding
external structures may have coevolved.
5.3 Morality, Norms, and Religion
The essays in part III all focus on the development of cultural norms and beliefs.
Dwyer (chapter 15) focuses on the development of children's moral capacities, and examines the extent to which this
development may mirror the development of our linguistic abilities, as understood from a Chomskian perspective. She
argues that there are in fact many deep similarities between the development of our moral and linguistic
competencies, and suggests that such similarities provide good evidence for the existence of an underlying normative
competence that allows us to see the world in moral termsindeed makes us do so.
Joyce (chapter 16) seeks to clarify the claim "Human morality is innate" and asks why moral thinking may have been
adaptive for our ancestors. By putting forward a hypothesis in terms of individual selection (and reciprocal altruism in
particular), Joyce raises the question of what practical advantages accrue to the individual (as opposed to the group)
by having a tendency to categorize certain actions (including one's own) as good, prohibited, virtuous, and so on.
Although he accepts that explanations involving group selection are entirely legitimate in principle, in this instance,
Joyce argues, explanations involving only individual selection and reciprocal altruism provide a less complicated and
ultimately more successful hypothesis.
Sripada and Stich (chapter 17) develop a framework within which to investigate what they term the psychology of
norms. Broadly speaking, norms are rules or principles that govern various aspects of human behavior, and that often
do so without any explicit recognition or enforcement by social institutions. In addition, norms usually give rise to
powerful subjective feelings, and people often feel motivated to comply with such norms, irrespective of any explicit
social requirement. Norms therefore play an extremely significant role in human culture, and in our explanations of
cultural change and evolution. However, as Sripada and Stich point out, research into the development and deployment
of norms is both partial and piecemeal. They therefore present a model of the cognitive mechanisms underlying the
acquisition and use of norms, which can not only explain the existing data but serve as a focus for future more
structured research.
Finally, Atran (chapter 18) considers the evolutionary and ontogenetic origins of human religions. He argues that
religious beliefs in general (and supernatural beliefs in particular) are the by-products of various cognitive mechanisms
that originally evolved under natural selection for the purpose of performing other, more mundane adaptive tasks.
Atran claims that applying these more mundane capacities to existential rather than practical problems would produce
precisely the same kinds of solutions that we see illustrated by human religious systems, and he shows that the
scope and limits of several actual religious systems (including those of the Lowland Maya, Tamil Hindus, and Ladakhi
Buddhist transhumants) provide good evidence in favor of his claims.
6 Conclusion
These are fascinating times for multidisciplinary research into the interaction between culture and the innate mind.
Current research is producing results of unprecedented detail and scope from within and across many different
investigative domains, and these results are increasingly both influenced by, and serve to build upon, nativist models
of the mind. This volume provides further evidence of just how widespread and profitable nativist theorizing now is,
and offers significant insight into the many ways in which anthropologists, psychologists, philosophers, and other
cognitive scientists now employ and depend upon such theorizing for their own research. However, this volume also
shows how much more work is still to be done, and suggests a variety of new directions for future research. We
believe, therefore, that this book provides an important contribution to our understanding of what it is that we as
humans are, and of how we came to be that way.
Part I Learning, Culture, and Evolution
2 Culture, Adaptation, and Innateness
Robert Boyd
Peter J. Richerson
It is almost 30 years since the sociobiology controversy burst into full bloom. The modern theory of the evolution of
animal behavior was born in the mid-1960s with Bill Hamilton's seminal essays on inclusive fitness and George
Williams's book Adaptation and Natural Selection. The following decade saw an avalanche of important ideas on the
evolution of sex ratio, animal conflicts, parental investment, and reciprocity, setting off a revolution in our
understanding of animal societies, a revolution that is still going on today. By the mid-1970s, Richard Alexander, E. O.
Wilson, Napoleon Chagnon, Bill Irons, and Don Symons, among others, began applying these ideas to understand
human behavior. Humans are evolved creatures, and quite plausibly the same evolutionary forces that shaped the
behavior of other animals also molded our behavior. Moreover, the new theory of animal behavior, especially kin
selection, parental investment, and optimal foraging theory, seemed to fit the data on human societies fairly well.
The reaction from much of the social sciences was, to put it mildly, negative. While the causes of this reaction are
complex (Segerstråle, 2000), one key is that most social scientists thought about these problems in terms of nature
versus nurture. On this view, biology is about nature; culture is about nurture. Some things, like whether you have
sickle-cell anemia, are determined by genes: nature. Other things, like whether you speak English or Chinese, are
determined by the environment: nurture. Evolution shapes innate, genetically determined behaviors, but not behaviors
acquired through learning. Social scientists knew that culture plays an overwhelmingly important role in shaping human
behavior, and since culture is learned, evolutionary theory has little to contribute to the understanding of human
behavior. This conclusion, and the reasoning behind it, remains the conventional wisdom in much of social science.
It is also deeply mistaken. Traits do vary in how sensitive they are to environmental differences, and it is sensible to
ask whether differences in traits are mainly due to genetic differences or differences in the environment. However, the
answer you get to this question tells you nothing about whether the traits in question have
been shaped by natural selection. Every aspect of the phenotype of every organism results from the interaction of genetic
information stored in the developing organism and the properties of its environment. If we want to know why the
organism develops one way in one environment and a different way in a different environment, we have to find out
how natural selection and other evolutionary processes have shaped the developmental process of the organism. This
logic applies to any trait, learned or not, and has been successful when applied to understand learned behavior in a
wide range of species.
As a consequence, the evolutionary social science community by and large rejected the idea that culture makes any
fundamental difference in the way that evolutionary thinking should be applied to humans. The genes underlying the
psychological machinery that gives rise to human behavior were shaped by natural selection, so the machinery must
have led to fitness-enhancing behavior, at least in ancestral environments. If it goes wrong in modern environments it
is not culture that is the culprit, but the fact that our evolved, formerly adaptive psychology misfires these days.
Over the last 20 years, two healthy research traditions have grown up in evolutionary social science, human behavioral
ecology and evolutionary psychology, which study human behavior with little attention to the effects of culture.
In this essay we argue that both sides in this debate got it wrong. Culture profoundly alters human evolution, but not
because culture is learned. Rather, culture entails a novel evolutionary trade-off. Social learning allows human
populations to accumulate reservoirs of adaptive information over many generations, leading to the cumulative cultural
evolution of highly adaptive social institutions and technology. Because this process is much faster than genetic
evolution, it allows human populations to evolve cultural adaptations to local environments, an ability that was a
masterful adaptation to the chaotic, rapidly changing world of the Pleistocene. However, the same psychological
mechanisms that create this benefit necessarily come with a built-in cost. To get the benefits of social learning,
humans have to be credulous, for the most part accepting the ways they observe in their society as sensible and
proper. Such credulity opens up human minds to the spread of maladaptive beliefs. Tinkering with human psychology
can lessen this cost, but it cannot be eliminated without also losing the adaptive benefits of cumulative cultural evolution.
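This trade-off can be made concrete with a toy simulation. The sketch below is our own illustration, not a model from the chapter; the parameter names (`copy_loss`, `innovate_rate`) and their values are invented for the example. Social learners copy the most skilled member of the previous generation, losing a little in transmission but occasionally innovating, so skill accumulates; the same credulous copying spreads a seeded maladaptive belief to the whole population. Individual learners start from scratch each generation: they stay unskilled, but the bad belief never spreads.

```python
import random

def simulate(social, generations=200, pop=100, copy_loss=0.05,
             innovate_rate=0.1, seed=1):
    """Toy model of cumulative culture. With social=True, each agent
    copies the most skilled member of the previous generation, losing
    a little skill in transmission but credulously inheriting that
    model's beliefs, good or bad. With social=False, each agent starts
    from zero. Either way, agents occasionally innovate a small step."""
    rng = random.Random(seed)
    skills = [0.0] * pop
    bad = [True] + [False] * (pop - 1)   # one maladaptive belief seeded
    for _ in range(generations):
        best = max(range(pop), key=lambda i: skills[i])
        new_skills, new_bad = [], []
        for _ in range(pop):
            s = skills[best] * (1 - copy_loss) if social else 0.0
            if rng.random() < innovate_rate:
                s += 1.0                  # small individual improvement
            new_skills.append(s)
            new_bad.append(bad[best] if social else False)
        skills, bad = new_skills, new_bad
    return sum(skills) / pop, sum(bad) / pop

social_skill, social_bad = simulate(social=True)
solo_skill, solo_bad = simulate(social=False)
print(f"social learning:     mean skill {social_skill:5.1f}, bad-belief frequency {social_bad:.2f}")
print(f"individual learning: mean skill {solo_skill:5.1f}, bad-belief frequency {solo_bad:.2f}")
```

The point of the sketch is the package deal: the very rule that lets skill accumulate over generations (copy whoever seems most successful) is the rule that carries the maladaptive belief to fixation.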
In this essay, we begin by sketching the view of culture that is current among many in the evolutionary social science
community. Then we summarize the evidence that human adaptation depends crucially on cumulative cultural
adaptation. Next we expand on our argument that cumulative cultural adaptation entails an unavoidable trade-off. We
conclude by discussing how cumulative cultural evolution may breathe some new life into the idea of innateness.
1 Evolutionary Psychology: Culture as a Library of Works Written by Adapted Minds
In their critique of the standard social science model, John Tooby and Leda Cosmides (1992, pp. 115–16) introduced
the distinction between epidemiological and evoked culture. Epidemiological culture refers to what most people mean
by the word culture: differences between people that result from different ideas or values acquired from the people
around them. Evoked culture refers to differences that are not transmitted at all, but rather are evoked by the local
environment. Cosmides and Tooby argue that much of what social scientists call culture is evoked. They ask their
readers to imagine a jukebox with a large repertoire of records and a program that causes a certain record to be
played under particular local conditions. Then, all the jukeboxes in Brazil will play one tune, and all those in England
will play another tune, because the same program orders up different tunes in different places. Tooby and Cosmides
believe that anthropologists and historians overestimate the importance of epidemiological culture, and emphasize that
much human variation results from genetically transmitted information that is evoked by environmental cues.
They are led to this conclusion by their belief that learning requires a modular, information-rich psychology. Tooby and
Cosmides (1992) and some other evolutionary psychologists (Gallistel, 1990) argue that domain-general learning
mechanisms like classical conditioning and other forms of correlation detection are inefficient. When the environment
confronts generation after generation of individuals with the same range of adaptive problems, selection will favor
special-purpose, domain-specific cognitive modules that focus on particular environmental cues and then map these
cues onto a menu of adaptive behaviors. Evidence from developmental cognitive psychology provides support for this
picture of learning: small children seem to come equipped with a variety of preconceptions about how the physical,
biological, and social worlds work, and these preconceptions shape how they use experience to learn about their
environments (Hirschfeld & Gelman, 1994). Evolutionary psychologists (and others; see Sperber & Hirschfeld, 2004)
think the same kind of modular psychology shapes social learning. They argue that culture is not transmitted; rather,
children make inferences by observing the behavior of others, and the kinds of inferences they make are strongly
constrained by their evolved psychology. Linguist Noam Chomsky's argument that human languages are shaped by a
genetically transmitted universal grammar is the best known version of this idea, but evolutionary psychologists think
virtually all cultural domains are similarly structured.
Anthropologist Pascal Boyer's (1994) account of the nature of religious belief provides a good example. Boyer worked
among the Fang, a group in Cameroon and Gabon, who have elaborate beliefs about ghosts. For the Fang, ghosts are
malevolent beings that want to harm the living; they are invisible, they can pass through solid objects, and so on.
Boyer argues that most of what the Fang believe about ghosts is not transmitted; rather it is based on the innate,
epistemological assumptions that underlie all cognition. Once young Fang children learn that ghosts are sentient beings,
they don't need to learn that ghosts can see or that they have beliefs and desires; these components are provided by
cognitive machinery that reliably develops in every environment. Like Cosmides and Tooby, Sperber, Atran, and others,
Boyer thinks that many putatively cultural religious beliefs arise because different environmental cues evoke different
innate information. Our California neighbors believe in angels instead of ghosts because they grew up in an
environment in which people talked about angels. However, most of what they know about
angels comes from the same cognitive machinery that gives rise to Fang beliefs about ghosts, and the information that
controls the development of this machinery is stored in the genome.
To be clear, these authors do not deny that epidemiological culture plays a role in shaping human behavioral
variation. They are clear that some differences between groups are due to beliefs and values that are stored in human
minds and transmitted from person to person and thus preserved through time, and agree that models from
epidemiology and population genetics may help explain how ideas spread through populations. However, to explain the
content of such ideas, evolutionary psychologists emphasize the information processing properties of human minds. For
example, Steven Pinker writes:
The striking features of cultural products, namely their ingenuity, beauty, and truth (analogous to an organism's
complex design), comes from the mental computations that "direct" (that is, invent) the "mutations," and that
"acquire" (that is, understand) the "characteristics." ... Models of cultural transmission do offer insight on other
features of cultural change, particularly their demographics. ... They explain how ideas become popular, but not
where ideas come from. (1997, pp. 209–10)
The idea here is that complex cultural adaptations do not arise gradually, as they do in genetic evolution. New
symphonies don't appear bit by bit as a consequence of the differential spread and elaboration of slightly better and
better melodies. Rather they emerge from people's minds, and their functional complexity arises from the action of
those minds. The same goes for all aspects of culture (art, ritual, and technology), or at least so Pinker thinks. Culture
is useful and adaptive because populations of human minds store the best efforts of previous generations of minds.
On this view, transmitted culture is like a library. Libraries preserve knowledge created in the past. Librarians shape
the contents of libraries as they decide which books are bought and which are discarded. But knowing about libraries
and librarians does not help us understand the complex details of plot, character, and style that distinguish a
masterpiece from a potboiler. To understand these things, you have to understand the minds of the authors who have
written these books. In the same way, cultures store ideas and inventions, and people's decisions (often unconscious)
about which ideas to adopt and which to reject shape the content of a culture. Evolutionary theories may help explain
why, for example, traditional Fang religious beliefs are replaced by alternative beliefs like Christianity or Islam.
However, to understand the structures of complex, adaptive cultural practices, religious beliefs, tools, or institutions,
you have to understand the evolved psychology of the mind that gave rise to that complexity, and how that
psychology interacts with its environment.
Students of the history of biology will recognize this picture of cultural evolution as similar to a frequently popular, but
incorrect, theory of genetic evolution. Very few of Darwin's contemporaries accepted (or even understood) his idea that
adaptations arose through the gradual accumulation of small variations. Some of his most ardent supporters, like T. H.
Huxley, thought that new types arose in big jumps, and then natural selection determined which types spread. In the
twentieth century, Richard Goldschmidt and the late Stephen Jay Gould, among others, championed this theory of evolution. It is wrong,
because the likelihood that a complex adaptation will arise by chance is nil. Of course, this objection does not have any
force for cultural evolution, precisely because innovations are highly nonrandom, and thus it is quite plausible that
cultural evolution mainly involves the culling of innovations, innovations whose adaptive complexity can be understood
only in terms of human psychology.
2 Culture Often Evolves by the Accumulation of Small Variations
This picture is a useful antidote to the view that cultural evolution is just like genetic evolution. Cultural variation is not
transmitted in the same way as genes; ideas are not poured from one head into another. These evolutionary
psychologists are surely right that every form of learning, including social learning, requires an information-rich innate
psychology, and that some of the adaptive complexity we see in cultures around the world stems from this information.
Nor does culture evolve through the gradual accumulation of memes, gene-like particles that arise through blind
mutations and spread by natural selection. Innovations are not purely random, and our evolved psychology certainly
must influence the rate and direction of cultural adaptation. Plausibly some cultural adaptations, especially relatively
simple ones, are invented in one step by individuals. Only a few good easy ways to tie a knot that makes a loop in the
end of a rope are currently known. Some individual might invent a new and perhaps better one.
However, we think that it is much less plausible that most complex cultural adaptations, things like kayaks and
institutions like hxaro exchange, arise in this way. Isaac Newton famously remarked, "If I have seen further it is by
standing on the shoulders of giants." For most innovators in most places at most times in human history, a different
metaphor is closer to the truth. Even the greatest human innovators are, in the great scheme of things, midgets
standing on the shoulders of a vast pyramid of other midgets. Individual minds rarely give birth to complex cultural
adaptations. The evolution of languages, artifacts, and institutions can be divided up into many small steps, and during
each step, the changes are relatively modest. No single innovator contributes more than a small portion of the total, just
as any single gene substitution contributes only marginally to a complex organic adaptation.
The history of technology shows that complex artifacts like watches are not hopeful monsters created by single
inventors (Basalla, 1988). The watchmakers' skills have been built up piecemeal by the cumulative improvement of
technologies at the hands of many innovators, each contributing a small improvement to the ultimately amazing
instrument. Many competing innovations have been tried out at each step, most now forgotten except by historians of
technology. A little too loosely, we think, historians of technology liken invention to mutation, because both create
variation, and compare the rise of a successful technology to prominence with the action of natural selection. Forget
watches for a moment. The historian of technology Henry Petroski (1992) documents how even simple modern
artifacts like forks, pins, paper clips, and zippers evolve haltingly through many trials, some
to capture the market's attention and others to fall by the wayside. No one knows how many failed designs have
languished on inventors' workbenches. Cultural evolution is more complicated than bare-bones random variation and
selective retention. To anticipate our argument, the decisions, choices, and preferences of individuals act at the
population level as forces that shape cultural evolution, along with other processes like natural selection. We urge great
care with loose analogies to mutation and selection because there are several distinct processes rooted in human
psychology that lead to the accumulation of beneficial cultural variations, each with a distinctive twist of its own and
none exactly like natural selection.
While human innovations are not like random mutations, they have been, until recently, small, incremental steps. The
design of a watch is not the work of an individual inventor but the product of a watchmaking tradition from which the
individual watchmaker derives most, but not quite all, of his designs. This is not to take anything away from the real
heroes of watchmaking innovation, like John Harrison. Harrison delivered a marine chronometer accurate enough to
calculate longitude at sea to the British Board of Longitude in 1759. He used every device of the standard clockmaker's
art and a number of clever tricks borrowed from other technologies of the time, such as using bimetallic strips (you
have seen them coiled behind the needles of oven thermometers and thermostats) for compensating the critical
temperature-sensitive timekeeping elements of his chronometers. His achievement is notable for the sheer number of
clever innovations he made: the bimetallic temperature compensators, a superb escapement, jewel bearings requiring
no lubrication, substitutes for the pendulum. Also notable is his extraordinary personal dedication to the task. By dint
of 37 years of unremitting effort and a first-rate mechanical mind, sustained by incremental payments against a British
Admiralty prize he was a good candidate to win, he made a series of ever smaller, better, more rugged seagoing
clocks. Eventually he delivered Number 4, with an accuracy of better than one-fortieth of a second per day, a
significant improvement over one minute per day for the best watches of his day (Sobel, 1995). Only the rarest of
inventors makes an individual contribution of this magnitude. Yet, like every great inventor's machine, Number 4 is a
beautiful homage to the art and craft of Harrison's predecessors and colleagues as much as to his own genius. Without
a history of hundreds or thousands of ancient and mostly anonymous inventors, Harrison would not even have
conceived the idea of building a marine chronometer, much less succeeded in doing so. William Paley's famous
argument from design would better support a polytheistic pantheon than his solitary Christian Creator; it takes many
designers to make a watch.
Consider a much simpler nautical innovation, the mariners' magnetic compass. Its nameless innovators must have been
as clever as James Watt, Thomas Edison, Nikola Tesla, and the other icons of the Industrial Revolution whose life
stories we know so much better. First, someone had to notice the tendency of small magnetite objects to orient in the
earth's weak magnetic field in nearly frictionless environments. The first known use of this effect was by Chinese
geomancers, who placed polished magnetite spoons on smooth surfaces for purposes of divination. Later, Chinese
mariners built small magnetite objects or magnetized needles that could be floated on water to indicate direction at
sea. Ultimately, Chinese seamen
developed a dry compass with the needle mounted on a vertical pin bearing, like a modern toy compass. Europeans
acquired this form of compass in the late medieval period. European seamen developed the card compass, in which a
large disk was attached to the magnets and marked with 32 points. This compass was not merely used to indicate
direction but was rigidly mounted at the helmsman's station, with the position of the bow of the ship marked on the
case. Now the helmsman could steer a course as accurate as one sixty-fourth of a circle by aligning the bow mark on
the case with the appropriate compass point. Compass makers learned to adjust iron balls near the compass to zero
out the magnetic influence from the ship, an innovation that was critical after steel hulls were introduced. The first
such step was a small one, replacing the iron nails of the compass box with brass screws. Later, the compass was
filled with a viscous liquid and gimbaled to damp the ship's motion, making the helmsman's tracking of the correct
heading still more accurate. Even such a relatively simple tool as the mariner's compass was the product of numerous
innovations accumulated over centuries and across the breadth of Eurasia (Needham, 1979).
Other aspects of culture are similar. Take churches. Modern American churches are sophisticated organizations for
supplying certain kinds of social services to their parishioners. The successful ones derive from a long tradition of
incorporating good ideas and abandoning bad ones. Surprisingly, one of the unsuccessful ideas turns out to be hiring
educated clergy. College-educated clergy are good intellectuals but too frequently deadly dull preachers, consumed with
complex doubts about the traditional verities of Christian faith. In the United States, successful religious innovation is
handsomely rewarded, due to the free-market character of Protestant religious institutions. Many ambitious religious
entrepreneurs organize small sects, mostly drawing upon a set of stock themes called "Fundamentalism." Only a tiny
fraction of sects expand beyond the original cohort recruited by the initial innovator. The famous celibate Shakers are
an example of a sect that failed to recruit followers, but there have been many others. A much smaller number are
successful and have grown to become major religious institutions, largely replacing traditional denominations. The
Methodists and the Mormons are examples of very successful sects that became major churches.
Religious innovators build in small steps. Mormon theology is very different from that of most of American
Protestantism. Nevertheless, John Brooke (1994) shows how Joseph Smith's cosmology mixes frontier Protestantism
with hermetic ideas, Masonry, divination schemes for finding treasure, and "spiritual wifery" (polygamy). He traces the
spread of these ideas from Europe to specific families in Vermont and New York where Smith and his family resided.
Smith invented little and borrowed much, although we properly credit him with being a great religious innovator. His
innovations were, like Harrison's, large compared to those introduced by most other ambitious preachers.
Individuals are smart, but most of the cultural artifacts that we use, the social institutions that shape our lives, and
the languages that we speak, are far too complex for even the most gifted innovator to create from scratch. Religious
innovations are a lot like mutations, and successful religions are adapted in sophisticated ways beyond the ken of
individual innovators. The small frequency of successful innovations
suggests that most innovations degrade the adaptation of a religious tradition and only a lucky few improve it. We
don't mean to say that complex cultural institutions can't ever be improved by the application of rational thought.
Human innovations are not completely blind, and if we understood cultural evolutionary processes better, they would
be less blind. But human cultural institutions are very complex and rarely have been improved in large steps by
individual innovators.
3 Culture Permits Adaptation to a Wide Range of Environments without Domain-Specific Modules
Cultural adaptation has played a crucial role in human evolution. Human foragers adapted to a vast range of
environments. The archeological record indicates that Pleistocene foragers occupied virtually all of Africa, Eurasia, and
Australia. The data on historically known hunter-gatherers suggests that to exploit this range of habitats, humans used
a dizzying diversity of subsistence practices and social systems. Consider just a few examples. The Copper Inuit lived in
the high Arctic, spending summers hunting near the mouth of the Mackenzie River and the long dark months of the
winter living on the sea ice and hunting seals. Groups were small and highly dependent on men's hunting. The !Xo
lived in the central Kalahari. Women's collecting of seeds, tubers, and melons accounted for most of their calories. Men
hunted impala and oryx. They survived fierce heat and lived without surface water for months at a time. Both the !Xo
and the Copper Inuit lived in small, nomadic bands linked together in large band clusters by patrilineally reckoned
kinship. The Chumash lived on the productive California coast around present-day Santa Barbara, gathering shellfish
and seeds and fishing the Pacific from great plank boats. They lived in large permanent villages with division of labor
and extensive social stratification.
This range of habitats, ecological specializations, and social systems is much greater than that of any other animal
species. Big predators like lions and wolves have very large ranges compared to other animals, but lions never
extended their range beyond Africa and the temperate regions of western Eurasia; wolves were limited to North
America and Eurasia. The diet and social systems of such large predators are similar throughout their range. They
typically capture a small range of prey species using one of two methods: they wait in ambush, or combine stealthy
approach and fast pursuit. Once the prey is captured, they process it with tooth and claw. The basic simplicity of the
lives of large carnivores is captured in a Gary Larson cartoon in which a Tyrannosaurus rex contemplates its monthly
calendar: every day has the notation "Kill something and eat it." In contrast, human hunters use a vast number of
methods to acquire and process a huge range of prey species, plant resources, and minerals. For example, Hillard
Kaplan, Kim Hill, and their coworkers at the University of New Mexico have observed the Aché, a group of foragers
who live in Paraguay, take 78 different species of mammals, 21 species of reptiles, 14 species of fish, and over 150
species of birds using an impressive variety of techniques that depend on the prey, the season, the weather, and
many other factors. Some animals are tracked, a difficult skill that requires a great deal of ecological and
environmental knowledge. Others are called by imitating the prey's mating or
distress calls. Still others are trapped with snares or traps or smoked out of burrows. Animals are captured and killed by
hand, shot with arrows, clubbed, or speared (Kaplan et al., 2000).
And this is just the Aché; if we included the full range of human hunting strategies, the list would be much longer. The
lists of plants and minerals used by human foragers are similarly long and diverse. Making a living in the Arctic
requires specialized knowledge: how to make weatherproof clothing, how to provide light and heat for cooking, how to
build kayaks and umiaks, how to hunt seals through holes in the sea ice. Life in the central Kalahari requires equally
specialized but quite different knowledge: how to find water in the dry season, which of the many kinds of plants can
be eaten, which beetles can be used to make arrow poison, and the subtle art of tracking game. Survival might have
been easier on the balmy California coast, yet specialized social knowledge was needed to succeed in hierarchical
Chumash villages, compared to the small egalitarian bands of the Copper Inuit and the !Xo.
So maybe humans are more variable than lions, but what about other primates? Don't chimpanzees have culture?
Don't different populations use different tools and foraging techniques? There is no doubt that great apes do exhibit a
wider range of foraging techniques, more complex processing of food, and more tool use than other mammals (Byrne,
1999). However, these techniques play a much smaller role in great ape economy than they do in the economies of
human foragers. Anthropologists Kaplan, Hill, and their coworkers (2000) compare the foraging economies of a number
of chimpanzee populations and human foraging groups. They categorize resources according to the
difficulty of acquisition: Collected foods like ripe fruit and leaves can be simply collected from the environment and
eaten. Extracted foods must be processed before they can be eaten. Examples include fruits in hard shells, tubers or
termites that are buried deep underground, honey hidden in hives high in trees, and plants that contain toxins that
must be extracted before they can be eaten. Hunted foods come from animals, usually vertebrates, that must be
caught or trapped. Chimpanzees are overwhelmingly dependent on collected resources, while human foragers get
almost all of their calories from extracted or hunted resources.
Humans can live in a wider range of environments than other primates because culture allows the relatively rapid
accumulation of better strategies for exploiting the local environment than genetic inheritance allows. Consider learning
in the most general sense; every adaptive system learns about its environment by one mechanism or another.
Learning involves a tradeoff between accuracy and generality. Learning mechanisms generate contingent behavior
based on observations of the environment. The machinery that maps observations onto behavior is the learning
mechanism. One learning mechanism is more accurate than another in a particular environment if it generates more
adaptive behavior in that environment. A learning mechanism is more general than another if it generates adaptive
behavior in a wider range of environments. Typically, there is a trade-off between accuracy and generality, because
every learning mechanism requires prior knowledge about which environmental cues are good predictors of the actual
state of the environment and what behaviors are best in each environment. The more detailed
and specific such knowledge is for a particular environment, the more accurate is the learning rule. Thus for a given
amount of stored knowledge, a learning mechanism can either have detailed information about a few environments, or
less detailed information about many environments.
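The accuracy/generality trade-off can be illustrated with a small numerical sketch. This is our own toy illustration, not a model from the chapter; the environment function `optimal` and the budget of four stored entries are invented. Each "learning mechanism" is a fixed-size table of cue-to-behavior entries (its prior knowledge); in any environment it behaves as it would in the nearest environment it knows about. Concentrating the budget on a narrow range gives high accuracy there at the price of poor behavior elsewhere; spreading the same budget widely gives moderate accuracy everywhere.

```python
def optimal(env):
    """The behavior that works best in environment env (a stand-in
    for facts the learner cannot know in advance)."""
    return env ** 1.5   # nonlinear, so distant guesses are inaccurate

def performance(mechanism, env):
    """A mechanism is a fixed-size list of (environment, behavior)
    entries: its stored prior knowledge. In environment env it uses
    the behavior stored for the nearest known environment. Returns 0
    for a perfect match, more negative for worse behavior."""
    cue, behavior = min(mechanism, key=lambda entry: abs(entry[0] - env))
    return -abs(behavior - optimal(env))

# Same knowledge budget (four entries), allocated differently.
specialist = [(e, optimal(e)) for e in (0.0, 0.5, 1.0, 1.5)]   # narrow, detailed
generalist = [(e, optimal(e)) for e in (0.0, 3.0, 6.0, 9.0)]   # wide, coarse

environments = [i * 0.5 for i in range(19)]         # 0.0 .. 9.0
home_range = [e for e in environments if e <= 1.5]  # the specialist's niche

def mean_perf(mechanism, envs):
    return sum(performance(mechanism, e) for e in envs) / len(envs)

print("in the narrow home range:",
      mean_perf(specialist, home_range), "vs", mean_perf(generalist, home_range))
print("across all environments: ",
      mean_perf(specialist, environments), "vs", mean_perf(generalist, environments))
```

The specialist is exactly right within its niche and badly wrong outside it; the generalist is never exactly right but never catastrophically wrong. On the authors' argument, cumulative culture escapes this dilemma by letting the detailed, environment-specific entries be supplied socially rather than stored in the genome.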
In most animals, this knowledge is stored in the genes, including, of course, the genes that control individual learning.
Consider the following thought experiment. Pick a wide-ranging primate species, let's say baboons. Then capture a
group of baboons, and move them to another part of the natural range of baboons in which the environment is as
different as possible. You might, for example, transplant a group from the lush wetlands of the Okavango Delta to the
harsh desert of western Namibia. Next, compare their behavior to the behavior of other baboons living in the same
environment. We believe that after a little while, the experimental group of baboons would be quite similar to their
neighbors. The reason that the local and transplanted baboons would be similar, we think, is the same reason that
baboons are less variable than humans: they acquire a great deal of information about how to be a baboon genetically;
it is hard-wired. To be sure, they have to learn where things are, where to sleep, which foods are desirable, and
which are not, but they can do this without contact with already knowledgeable baboons because they have the basic
knowledge built in. They can't learn to live in temperate forests or arctic tundra, because their learning systems don't
include enough innate information to cope with those environments.
Human culture allows learning mechanisms to be both more accurate and more general, because cumulative cultural
adaptation provides accurate and more detailed information about local environments. Evolutionary psychologists argue
that our psychology is built of complex, information-rich, evolved modules that are adapted for the hunting and
gathering life that almost all humans pursued up to a few thousand years ago. Fair enough, but individual humans
can't learn on their own how to live in the Arctic, the Kalahari, or anywhere else. The reason is that our information-rich, evolved
psychology doesn't contain the necessary information. Think about being plunked down on an Arctic beach with a pile
of driftwood and sealskins and trying to make a kayak. You already know a lot: what a kayak looks like, roughly how
big it is, and something about its construction. Nonetheless, you would almost certainly fail. (We're not trying to
belittle you; we've read a lot about kayak construction, and we'd make poor specimens, if we were lucky.) And,
supposing you did make a passable kayak, you'd still have a dozen or so similar tools to master before you could make
a contribution to the Inuit economy. And then there are the social mores of the Inuit to master. The Inuit could make
kayaks, and do all the other things that they needed to do to stay alive, because they could make use of a vast pool
of useful information available in the behavior and teachings of other people in their population.
The reason the information contained in this pool is adaptive is that the combination of learning and cultural transmission
leads to relatively rapid, cumulative adaptation. Populations of people connected over time by social learning can
accumulate solutions to problems that no individual could devise on his or her own. Individuals don't have to be too
smart, because simple heuristics like correlation detection and imitation of the successful can produce clever
adaptations
when averaged over a population of individuals and over generations of time. Even if most individuals imitate with only
the occasional application of some simple heuristic, many individuals will be giving traditions a nudge in an adaptive
direction, on average. Cultural transmission preserves the many small nudges, and exposes the modified traditions to
another round of nudging. Very rapidly by the standards of ordinary evolutionary time, and more rapidly than evolution
by natural selection alone, weak, general-purpose decision-making forces generate new adaptations. The complexity of
cultural traditions can explode to the limits of our capacity to imitate or be taught them, far past our ability to make
careful, detailed decisions about them. We let the population-level process of cultural evolution do the heavy lifting for
us.
4 Cumulative Cultural Adaptation Involves a Trade-Off
As far as many evolutionary social scientists are concerned, Richard Dawkins is way up in the pantheon of
contemporary evolutionary thinkers. (For sure, he makes most top five lists.) Nonetheless, most place little stock in
Dawkins's argument about rogue memes, regarding it as an imaginative device for explaining the nature of replicators
rather than a serious proposal about human cultural evolution. Instead, most evolutionary social scientists tend to think
that all forms of learning are processes whereby the organism exploits statistical regularities in the environment so as
to develop a phenotype that is well suited to the existing environment. Over time, selection shapes psychology so that
it uses predictive cues to generate adaptive behavior. Social learning is just another learning mechanism that exploits
cues available in the social environment. As a result, to oversimplify just a bit, most evolutionary social scientists
expect people to learn things that were good for them in the Pleistocene and perhaps in the smaller scale human
societies that resemble those of the Pleistocene. Adaptation arises from the information-processing capacities built into
the human brain by natural selection acting on genes. These mechanisms may give rise to maladaptive behaviors
nowadays, but it's got nothing to do with culture and everything to do with the fact that environments are far outside of
the parameters to which our innate decision-making talents are calibrated.
We think this argument neglects an important trade-off. Selection cannot create a psychology that gets you only the
adaptations and always rejects maladaptive variants. Why not? Because of the accuracy-generality trade-off. General-
purpose learning has to be inaccurate to have bearable costs. Individuals, having let the population do the thinking,
are in no position to accurately assess the results. Think of using the taste of a substance as a guide to whether it is
edible or not. Many toxic plant compounds have a bitter taste. If you are tempted to eat something, and it is bitter,
you are well advised to reject it as food. On the other hand, many toxins do not taste bitter, so bitterness is no
infallible guide to edibility. Further, many bitter plants, for example acorns, can be rendered edible by cooking or
leaching. Further still, some bitter-tasting plant compounds have medicinal value. People can actually grow fond of
some bitter-tasting food and drink. Think gin-and-tonic. A bitter taste is only a rough-and-ready guide to what is
edible and what is not. In principle, you could do much better if you had a modern food
chemist's laboratory on the tip of your tongue, one that could separately sense every possible harmful and helpful
plant compound rather than having just four very general taste senses. Some animals are much better at these things
than humans; we have a rather poor sense of smell, for example. But the number of natural organic compounds is
immense, and selection favors compromises that usually result in adaptive behavior and don't cost too much. A fancy
sense of smell requires a long muzzle to contain the sensory epithelium where all those fancy sensory neurons are
deployed, and plenty of blood flow to feed them. Bitter taste is a reasonably accurate and reasonably general sense for
screening substances for edibility, but it is far from a food chemist's laboratory or a dog's nose. To get the good, you
have to risk adopting the bad, because the evaluative machinery the brain deploys to exercise the various biases is
necessarily limited. Let's see why.
Tooby and Cosmides (1992, p. 104) define an adaptation as "a reliably developing structure in the organism, which,
because it meshes with the recurrent structure of the world, causes the solution to an adaptive problem." They give
behavioral examples like inbreeding avoidance, the avoidance of plant toxins during pregnancy, and the negotiation of
social exchange. Evolutionary psychologists are prone to wax eloquent over marvelous cognitive adaptations created by
natural selection. And they are right to marvel; everyone should. Natural selection has created brains and sensory
systems that easily solve problems that stump the finest engineers. Making robots that can do anything sensible in a
natural environment is exceedingly difficult, yet a tiny ant with a few thousand neurons can meander over rough
ground hundreds of meters from its nest, find food, and return in a beeline to feed its sisters. Humans are able to
solve many astoundingly difficult problems as they go through daily life because natural selection has created
numerous adaptive information-processing modules in the human brain. Notably, the best examples involve tasks that
have confronted every member of our lineage in every environment over tens of millions of years of evolution, things
like visual processing and making inferences about causal processes. The list of well-documented examples that apply
to humans alone is short, and once again these are psychological adaptations that provide solutions to problems that every
human, if not every advanced social vertebrate, faces: things like inbreeding avoidance, social contract reasoning,
mate choice, and language learning.
Cultural evolution also gives rise to marvelous adaptations. However, they are typically solutions to problems posed by
particular environments. Consider, once again, the kayaks built and used by the Inuit, Yupik, and Aleut foragers of the
North American Arctic. By Tooby and Cosmides's definition, kayaks are clearly adaptations. These peoples' subsistence
was based on hunting seals (and sometimes caribou) in Arctic waters. A fast boat was required to get close enough to
these large animals to reliably hit and kill them with a harpoon or spear. Kayaks are a superb solution to this adaptive
problem. Their slim, efficient hull design allows sustained paddling at up to 7 knots. They were extremely light
(sometimes less than 15 kg), yet strong and seaworthy enough to safely navigate rough, frigid northern seas. They
were also reliably developing (every successful hunter built or acquired one) until firearms allowed hunting from
slower, but more stable, and more widely
useful umiaks. For at least 80 generations, people born into these societies acquired the skills and knowledge
necessary to construct these boats from available materialsbone, driftwood, animal skin, and sinew.
Certainly, no evolved kayak module lurks in the recesses of the human brain. People have to acquire the knowledge
necessary to construct a kayak using the same evolved psychology that people use in other environments to master
other crucial technologies. No doubt this requires an evolved guidance system. People must be able to evaluate
alternatives, to know that boats that don't sink and are easy to paddle are better than leaky, awkward designs. They
have to be able to judge, to some significant degree, whose boats are best, and when and how to combine information
from different sources. The elaborate psychological machinery that allows children to bootstrap their knowledge of the
world is also clearly crucial. People can't learn to make kayaks unless they already understand something about the
properties of materials and how to categorize plants and animals, have the manual skills to make and use tools, and so
on. This guidance system is not domain general in the sense of allowing people to learn anything. It is highly
specific to life on earth, in a regime of middle-sized objects, relatively moderate temperatures, living creatures,
human-made artifacts, and small social groups. However, it is domain general in the sense that there is nothing in our
evolved psychology that contains the specific details that make a difference in the case of kayaks: knowledge of the
dimensions, materials, and construction methods that make the difference between constructing a 15-kilogram craft
that safely skims across the arctic seas and death by drowning, hypothermia, or starvation. These crucial details were
stored in the brains of each generation of Inuit, Yupik, and Aleut peoples. They were preserved and improved by the
action of a population of evolved psychologies, but using mechanisms that are equally useful for preserving a vast
array of other kinds of knowledge.
Such widely applicable learning mechanisms are necessarily more error prone than highly constrained, domain specific
ones. As Tooby and Cosmides (1992, pp. 104-8) have emphasized, broad general problems are much more difficult to
solve than simple constrained ones. A kayak is a highly complex object, with many different attributes or dimensions.
What frame geometry is best? Should there be a keel? How should the components of the frame be joined? What kind
of animal provides the best skin? Which sex? Harvested at what time of year? Designing a good kayak means finding
one of the very few combinations of attributes that successfully produce this highly specialized boat. The number of
combinations of attributes grows geometrically as the number of dimensions increases, rapidly exploding into an
immense number. The problem would be much easier if we had a kayak module that constrained the problem, so we
would have fewer choices to evaluate. However, evolution cannot adopt this solution, because environments are
changing far too quickly and are far too variable spatially for selection to shape the psychologies of arctic populations
in this way. The same learning psychology has to do for kayaks, oil lamps, waterproof clothing, snow houses, and all
the other technology necessary to survive in the Arctic. It also has to do for birch bark canoes, reed rafts, dugout
canoes, planked rowboats, rabbit drives, blowguns, hxaro exchange, and the myriad marvelous, specialized,
environment-specific technologies that human hunter-gatherers have culturally evolved.
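The combinatorial point above is easy to make concrete (illustrative numbers only, not drawn from the chapter): with d independent design attributes and k options per attribute, the design space contains k^d candidate designs.

```python
def design_space(options_per_attribute: int, n_attributes: int) -> int:
    """Number of distinct designs when attributes combine freely."""
    return options_per_attribute ** n_attributes

# Even modest choices explode: 4 options across 20 attributes
# (frame geometry, keel, joinery, skin species, sex, season, ...)
# already exceed a trillion candidate designs.
print(design_space(4, 5))    # 1024
print(design_space(4, 20))   # 1099511627776
```

An innate "kayak module" would amount to pruning most of these dimensions in advance; an unconstrained learner faces the full product.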
For the same reason that it is impossible to build a learning device that is both general purpose and powerful, selection
cannot shape social learning mechanisms so that they reliably reject maladaptive beliefs over the whole range of
human experience. A young Aleut cannot readily evaluate whether the kayaks he sees his father and cousins using are
better than alternative designs. He can try one or two modifications and see how they work, and he can compare the
performance of the different designs he sees. But small samples and noisy data will severely limit his ability to
optimize kayak design by individual effort. From the point of view of an isolated individual, such general-purpose
learning mechanisms are both costly and weak. The repeated action of weak domain-general mechanisms by a
population of individuals connected by cultural inheritance over many generations can generate complex adaptations
like kayaks, but individuals must adopt what they observe with only marginal modifications. As a result, we may often
adopt maladaptive behaviors.
When it is difficult to determine which cultural variant is best, natural selection favors heavy reliance on imitating
others and low confidence in one's own experience (Boyd & Richerson, 1985, 1988). The natural world is complex and
variable from place to place and time to time. Is witchcraft effective? What causes malaria? What are the best crops
to grow in a particular location? Are natural events affected by human pleas to their governing spirits? The relationship
between cause and effect in the social world is often equally hard to discern. What sort of person should one marry?
How many husbands are best? Tibetan women often have two or three. What mixture of devotion to work and family
will result in the most happiness or the highest fitness? Students of the diffusion of innovations note that trialability
and observability are some of the most important regulators of the spread of ideas from one culture to another
(Rogers, 1983, pp. 231-2). Many important cultural traits, including things like family organization, have low trialability
and observability and are generally rather conservative. We act as if we know that sensible choices about such
behaviors are hard to make and that we are liable to err if we try to depart far from custom.
As the effects of biases weaken, social learning becomes more and more like a system of inheritance. Much of an
individual's behavior is thus a product of beliefs, skills, ethical norms, and social attitudes that are acquired from others
with little, if any, modification. To predict how individuals will behave, one must know something about their cultural
milieu. This does not mean that the evolved predispositions that underlie individual learning become unimportant.
Without the ability to taste and dislike bitter substances, and many similar innate senses and predilections, cultural
evolution would be uncoupled from genetic evolution. It would provide none of the fitness-enhancing advantages that
normally shape cultural evolution and produce adaptations. However, once cultural variation is heritable, it can respond
to selection for behaviors that conflict with genetic fitness. Selection on genes that regulate the cultural system may
still favor the ability and inclination to rely on imitation because it is beneficial on average. Selection will balance the
advantages of imitation against the risk of catching pathological superstitions. Our propensity to adopt dangerous
beliefs may be the price we pay for the marvelous power of cumulative cultural adaptation. As the saying goes, you
get what you pay for.
We conclude by arguing that this way of thinking about cultural adaptation has implications for the topic of this book,
the notion of innateness.
One of Darwinism's central accomplishments is the explanation of design: spectacularly improbable organs of extreme
perfection like the eyes of animals are explained by the gradual accumulation of the genes that give rise to these
traits through the process of natural selection. While the development of such complex, highly functional traits always
depends on the interaction of genes and environment, the design information that causes functional eyes to develop
generation after generation comes from the genes. The eyes of a cod and an octopus are similar in design (Land &
Nilsson, 2002): both have spherical lenses that are located about 2.5 lens radii from the retina; in both, the index of
refraction of those lenses gradually increases toward their center. In both species, the eyes are oriented by six muscle
groups, one pair for each independent axis of rotation, and in both, different muscles adjust the focus by moving the
lens. These structures evolved independently, and develop quite differently. To be sure, environmental inputs will be
crucial; the development of functional eyes depends on light input, for example. But the design of these eyes can only
be explained in terms of natural selection acting on the genes that control this development. Put another way, design
doesn't come from the environment; it is innate.
The same argument applies to complex, adaptive behavior in most organisms. Like the development of eyes, behavior
arises from the interaction of the environment with innate, genetically transmitted developmental mechanisms,
especially various forms of learning. Simple, relatively domain-general mechanisms such as classical conditioning can
shape behavior in adaptive ways, but, if evolutionary psychologists are right, they are unlikely to generate the many
forms of highly complex adaptive behavior seen in nature. Behaviors like the long-distance stellar navigation of indigo
buntings or the spectacular feats of memory of acorn woodpeckers require a highly structured, information-rich
psychology. The design latent in this psychology comes from the genes, and the details of this design are explained by
the action of natural selection.
The cumulative cultural evolution of spectacular human adaptations like kayaks, bows and arrows, and the like
complicates this picture. Now there are two processes that generate design, natural selection acting on genes, and a
variety of processes acting on culturally transmitted variation. If we are right, cultural adaptation has allowed the
human species to adapt to a wide range of environments because its design information is stored in brains, not genes.
By linking the efforts of many people over many generations, relatively crude, relatively domain-general mechanisms
can generate cultural adaptations to a wide range of environments much more rapidly than natural selection can
generate genetic adaptations. True, cultural adaptation depends on the evolved psychological mechanisms that allow
social learning, and, again if the evolutionary psychologists are right, the learning mechanisms that shape cultural
adaptations over time depend on a large number of evolved psychological mechanisms. However, unlike other forms of
learning, the design information that generates the adaptations is not stored in the genes.
Thus in cultural organisms it becomes interesting to ask, in any particular case, where does the design come from,
inside from genes shaped by natural selection, or outside from adaptive, cumulatively evolved information stored in
other human brains? The right question is not "Is it nature or nurture?" but "Is it genes or culture?" The answer to this
question is interesting, because the dynamic processes that cause cultural adaptation can lead to systematically
different outcomes from those of natural selection acting on genetic variation (Richerson & Boyd, 2005). Some of these
differences are adaptive. Culture evolves faster than genes and can track more rapidly varying environments. Symbolic
marking divides human populations up into semi-isolated pseudospecies, as it were, that adapt finely to their local
environment, resisting the cultural analog of gene flow from other environments. Some of these differences are
maladaptive. The fact that much culture is transmitted nonparentally allows considerable scope for the evolution of
selfish cultural variants. A theory of how evolving genes interact with environments to determine behavior is adequate
for most organisms, but in humans, evolving culture is an essential part of the explanatory problem.
3 About 17 (± 2) Potential Principles about Links between the Innate Mind and
Culture
Preadaptation, Predispositions, Preferences, Pathways, and Domains
Paul Rozin
I put forth here a series of potential principles, based on a combination of common sense and evidential support, and
with a historical perspective, from the viewpoint of a psychologist who has researched in the area of brain and mind,
preference and beliefs, and cultural influences. I illustrate these points or principles with examples, often chosen from
the world of food, my principal area of specialization.
1 The Middle Road Is (Almost Always) Right: A Balance of Innate Predispositions and
Enculturation (and Perhaps Modularity and General-Purpose Systems)
Arguments for balance either fall on deaf ears or receive the agreement that usually comes with obvious truisms, and
then are ignored. Balance does not make news. Solomon Asch's (1952) truly balanced view of human nature,
recognizing both the importance of the human biological heritage and the power of culture, was underappreciated. It is
absurd to deny the human primate heritage, to posit that hundreds of thousands of years or more of evolution have
left no mark. On the other hand, it is hard to believe that almost all of what it means to be human can be
comprehended without taking into account the power of culture, and the enormous capacity of the human brain to
acquire things. As Asch has said: "In short, we start with the assumption that individual men possess authentic
properties distinctive of Homo sapiens and that their actions in society alter them in authentically distinctive ways" (p.
119). Comparing an unenculturated to a fully enculturated human being, Asch notes: "He would possess imagination,
but not that which produces wit, comedy or tragedy. He would have a self, but not that which can stand in judgment
upon itself" (p. 136). The matter becomes more interesting because innate factors have a substantial effect on the
evolution of culture, and, as well, culture has had some effect on human innate capacities, as I will discuss under a
later principle.
2 Finding an Adaptive Account Is Just a Beginning; Adaptive Accounts Can Be Misleading
as to Mechanism and the Process of Evolution or Adoption
It is a perfectly reasonable, often productive exercise to hypothesize an adaptive value to a particular human activity
or function. Indeed, one fundamental dimension of explanation within psychology is to assign adaptive values to
mental and behavioral features of humans and other animals. Of course, we have to remember that humans are
adapted, in most cases, to their ancestral as opposed to their current environment. In some cases, those environments
may be similar, but for the important case of food, the ancestral and current developed-world environments are
almost opposite to one another (see principle 13).
In any event, adaptive accounts are hypotheses, to be supported or rejected on the basis of evidence. And one type of
evidence is an argument for evolvability, that is, creation of a conceivable pathway through which the adaptation was
accomplished. It is often difficult to account for complex structures in terms of a set of gradual changes, each of which
is more adaptive than its predecessor, but this can be, and has been, done.
I illustrate the problem of evolvability with an example from culinary history. Solomon Katz and his colleagues (Katz et
al., 1974) have nicely demonstrated that the procedure for preparing tortillas, in the ancient Americas, has nutritional
advantages. By boiling corn in alkali, an essential part of the procedure, three nutritionally important improvements
occur. First, the alkali used (calcium hydroxide) adds calcium, a mineral in short supply in the ancient Meso-American
diet. Second, the resulting alkalinity increases the availability of an essential amino acid that is otherwise bound in
such a way as to not be utilizable. Third, an important vitamin, niacin, is also bound in corn in such a way as to make
it unavailable, except in an alkaline medium.
Katz and colleagues (1974) also show that across the Americas, the more a society relies on corn, the more likely it is
to use the tortilla technology in preparing the corn. This study is one of the best documentations of the adaptive value
of a cultural practice. But there is an evolvability problem. One is unlikely to happen on the complex tortilla production
technology by chance. And all of the demonstrated nutritional effects are subtle and cumulative; it is not clear how
these effects could be observed. It is not like the case of bitter manioc (see Rozin, 1982); the cyanide in the manioc
causes striking symptoms promptly, so the effectiveness of the leaching technique that removes it would be easily
observable.
While working in a rural village in Mexico, I asked many of the residents why they boil corn in alkali as a first step in
making tortillas (Rozin, 1982). The men had no idea and, in fact, didn't know how to make tortillas. The women had
no ready answer, but the most common response was that the alkali processing makes it easier to roll out a tortilla,
by softening the corn. I asked one of the women to try to make some tortillas for me without alkali-boiled corn. She
was amused, but agreed to do so. She was right; the tortilla she made was more difficult to roll out, and had pieces of
whole corn in it. Now one can imagine how people would appreciate a procedure that has such a palpable effect. But
the question is: in terms of the origin of tortilla
making, what is the relation between the rather subtle nutritional adaptive values and the more palpable culinary
adaptive value?
There is another interesting chapter to the tortilla story that has to do with cultural evolution. Corn was encountered
by the early European explorers, like Cortez, and was brought back to Spain and ultimately the rest of Europe. In spite
of the efficiency of corn as a crop, it was rarely adopted for human (as opposed to domesticated animal) consumption
in Europe. Why not? Because, I think (Rozin, 1982), of the simple fact that Cortez and his fellow explorers did not
have Spanish women on their early visits to the Americas. Mexican men don't know how to make tortillas, and Spanish
conquistadors didn't learn how to make them. So they brought corn back to Europe (perhaps with some very stale
tortillas) but not the technology to make them, and thus make the corn tastier and more nutritious. One woman
among the Spaniard males could have changed European and world culinary history.
3 The Innate Mind Does More Than Affect Cognitive Processing: It Affects Norms,
Cultural Institutions, Beliefs, Preferences, and Attitudes
In the last two decades, psychology has reawakened to the importance of affect, both in understanding human life in
general, and more particularly in understanding cognition (Davidson et al., 2001). As we consider the innate mind, we
should attend not only to computational mechanisms that function in the cognitive domain but also to affective
processes, including preferences and attitudes. These may facilitate or inhibit particular types of cognitive processing.
Innate preferences (as for sweet tastes) or aversions (as for bitter tastes) influence the types of interactions humans
have with their environment, hence their experience, hence their mind.
4 Innate Preferences Shape Cultural Environments and Institutions
Innate predispositions affect not only the life of the individual but the culture that is shaped by the collectivity of
individuals. In this way, culturally transmitted preferences and attitudes, as well as opportunities, institutions, and
environments, are influenced by genetic predispositions. Humans have powerful and clearly adaptive innate preferences
for sweet tastes and fatty textures (shared with rats and many other mammalian generalists; the preference for fatty
textures, while widespread, has never been properly shown to be present at birth, unlike the sweet preference;
Steiner, 1979). These preferences serve as useful guides to the recognition and consumption of sources of calories.
They motivate human efforts to discover, collect, and cultivate plants that will provide these sensory experiences. One
of the motivations for Europeans to explore the Americas was to develop environments in which to grow sugar cane.
The development of sugar extraction techniques, and ultimately, in the face of caloric oversupply, artificial sweeteners,
can be traced to the human predisposition to like sweet tastes (Mintz, 1985; Rozin, 1982). The elaborate processing of
chocolate beans to make manifest the fatty texture inherent in the fats within the bean, and the
addition of sugar in substantial amounts to almost all human-consumed chocolate, testifies to the power of both sweet
and fat predispositions. In addition, sugar is widely used (e.g., in chocolate and coffee) to mask or compensate for
innately negative bitter tastes. The widespread availability of sugar, accomplished through accumulated cultural
discoveries and institutions (including food corporations), has provided readily available new opportunities for human
predispositions to be regularly indulged and refined. Chocolate is a culturally evolved food that is more appealing to
humans than just about anything in nature, and extremely dense, calorically. Its principal natural source is a bitter
bean with unremarkable texture from Mexico, shaped to appeal exquisitely to the human palate by accumulated
discoveries, principally in western Europe (Coe & Coe, 1996).
5 Culture Is Powerful Enough to Reverse Innate Preferences
One should not underestimate the power of culture. There are abundant examples in which cultures have reversed
innate predispositions, whether in the sexual, social, or food domains. I will focus here on the food domain. Most
cultures display a strong preference for some food that is innately unpalatable. These preferences are typically based
on an acquired liking for an innately unpalatable taste. In Euro-American cultures, examples include tobacco, irritant spices like
chili pepper, coffee, bitter chocolate, a wide variety of alcoholic beverages, and some vegetables. I will
briefly discuss the case of chili pepper, perhaps the most widely used spice in the world (Rozin, 1990a).
There is little doubt that the oral burn produced by the capsaicin in chili pepper is innately aversive; indeed, it probably
evolved to deter ingestion by mammals (Rozin, 1990a). The acquired liking for chili pepper (and other innately
negative oral experiences) is probably unique or almost unique to humans (Rozin & Kennel, 1983). The acquired
preference is powerful enough to have motivated the spread of chili pepper from its tropical American origins to
becoming a major constituent of the flavorings in most tropical and semitropical Asian and African cuisines. The reversal is
accomplished in a still poorly understood process during socialization (reviewed in Rozin, 1990a). Whatever the ultimate
mechanism of acquisition of liking, culture provides coerced exposure, through the use of chili pepper in most savory
foods in many cuisines. This mere exposure, by itself, may promote the reversal of an innate aversion (Zajonc, 1968).
It is also possible that this exposure promotes opponent-process endorphin responses, which link a pleasant internal
state to the irritating sensory experience. Second, there is probably an important effect of social influence. Humans are
highly motivated to adopt the preferences of their elders; the intensely social structure of the human meal no doubt
makes this particularly likely in humans, as opposed to other animals. Third, it is possible that a peculiar, uniquely
human quality contributes to the preference reversal (and many other preference reversals). I have called this benign
masochism (Rozin, 1990a; Rozin & Schiller, 1980), but it could also be called thrill seeking. Humans seem to enjoy the
experience of negative inputs in contexts where they know these inputs are not actually threatening. Such is the case
for the irritating sensations of chili pepper, the fear induced by a roller coaster, or the sadness induced by a tragic
drama. In all
end p.42
of these cases, and many others, the negative experience occurs in a safe environment, with minimal risk of actual
harm. This seems to be an example of mind over matter, a pleasure induced by our awareness that the negative
signals we are getting are not indicative of real prospects of harm. One piece of evidence for this, for chili pepper, is
that the optimal level of hotness for most individuals who like chili pepper is just below the level that produces
strongly unpleasant experiences (Rozin, 1990a; Rozin & Schiller, 1980). All of this may fall under a general motivation
for mastery, of obvious adaptive value.
Finally, it is worth noting that one account of liking for chili, its production of elevated levels of brain endorphins, has
particularly interesting biological-adaptive implications. Endorphins seem to be involved in modulating pain experiences.
According to the opponent process model of learning (Siegel, 1977; Solomon, 1980), organisms learn to compensate
for certain types of internal disturbances by acquiring (or innately producing) opponent processes that neutralize these
disturbances. These compensations grow with exposure. One account of chili liking involves an overextension of this
adaptive system. The compensatory secretion of endorphins in response to the irritation of chili pepper, as it develops
with exposure, can come to dominate the initial negative response, and produce pleasure. In this scenario, the forced
exposure produced by cultural forces allows an adaptive opponent process to overshoot. Normally, the aversion to the
irritation would prevent repeated resamplings of a negative taste, but culture intervenes to make this happen.
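The overshoot dynamic described above can be given a minimal numerical sketch. This is only an illustration of the opponent-process logic (after Solomon, 1980); the function name and all parameter values are assumptions chosen for clarity, not estimates from the literature.

```python
# Minimal sketch of the opponent-process account of chili liking
# (after Solomon, 1980). All parameter values are illustrative assumptions.

def net_response(exposures, a=-10.0, b_gain=1.5, b_max=14.0):
    """Net hedonic response after a given number of exposures.

    a      : fixed primary (aversive) reaction to capsaicin irritation
    b_gain : growth of the compensatory (opponent) process per exposure
    b_max  : ceiling on the opponent process
    """
    b = min(b_gain * exposures, b_max)  # compensation grows with exposure
    return a + b  # negative = net aversion, positive = net pleasure

for n in (1, 3, 5, 7, 9):
    print(n, net_response(n))
# Early exposures net out negative (aversion); with enough coerced
# exposure the opponent process overshoots and the net turns positive.
```

With these toy numbers the sign of the net response flips after about seven exposures, which is the point of the cultural story: the aversion would normally terminate sampling before the compensatory process could grow that large.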
6 Predispositions Affect Culture, but Culture Also Shapes the Genome
It goes without saying that the individual human is the principal initial shaping force in the development of culture.
However, organized human societies, and what may be called cultures, have been a part of the human environment for
tens of thousands of years. Writing, agriculture, and domestication have been a part of human life for many thousands
of years. This is more than long enough to allow the human-created environment, including changes in what promotes
fitness, to affect the human genome. For example, the decline in the importance of hunting for any individual or
group, and related changes, have relaxed selection for high visual acuity, and the invention of
eyeglasses has provided a totally adequate compensation for those with defects in acuity.
I will illustrate the effect of culture on genes with one example, milk (see Rozin & Pelchat, 1988, for an extended
treatment of this issue). Milk is the first and only food for humans and other mammals for some time after birth. It is
thus a complete food. Among its other properties, the substantial carbohydrate component of milk is entirely composed
of a unique sugar, lactose. Lactose is found only in milk, and is composed of two linked simple sugars, glucose and
galactose. The enzyme lactase, present only in the gut of young mammals, breaks lactose into its two component
sugars and allows it to be digested. As milk is found only in mammal mothers, it is a food available to nonhuman
mammals only in the nursing period. Appropriately, production of this enzyme declines to very low levels in virtually all
nonhuman mammals at about the time of weaning. Adult nonhuman mammals cannot
end p.43
efficiently digest milk: the substantial (about 40 percent of solids, depending on the species) lactose component cannot
be absorbed. Furthermore, it is fermented by bacteria in the hind gut, resulting in gas, cramps, and diarrhea, and
further inefficiencies in general absorption.
With domestication, milk became available as a food to humans postweaning, for the first and only time in mammalian
history. But milk is an unsatisfactory food for adults, because of lactose intolerance. This problem has been handled in
the evolution of humans in two opposed ways. One was to modify culture, the environment, to make milk digestible.
This is accomplished through the appropriately named culturing techniques. If milk ferments outside the body, under somewhat
controlled conditions, bacteria break down the lactose to its two digestible components, glucose and galactose. Humans
have developed various ways of doing this, producing, among other things, yogurts and cheeses. These milk products
are low in lactose, can be digested, and form excellent sources of nutrients.
The second route involves changes in the human genome, such that the normal deprogramming of lactase production
at about weaning time is blocked. This is controlled at a single genetic locus; mutations at this locus that block lactase
deprogramming would be adaptive for humans living in a milk-producing culture. We cannot at this time present a
satisfactory story about how this actually happened, but we know that it did happen. People from dairying cultures,
particularly of northern European origin (and a few African pastoral groups) have a high incidence of adult lactose
tolerance (Rozin & Pelchat, 1988; Simoons, 1982). These, of course, are the major milk-drinking cultures of the world.
Thus, for the case of milk, cultural advances made a particular genetic change adaptive, and that change occurred and
predominated in certain cultures.
7 Humans, among Mammals, Have Strong Proclivities to Develop Positive Attachments to
Activities and Objects: A Possible Adaptation to the Acquisition of Culture
Many have argued that humans have special evolved learning abilities that foster the acquisition of culture. One such
ability is the capacity to develop strong likes and allegiances, which allows for the incorporation of norms. If one
likes/values a person, object, idea, or institution, allegiance to it comes naturally. Thus, a dieter who prefers cottage
cheese to ice cream is going to have an easier time of it. The distinction between intrinsic and extrinsic motivation
addresses this issue (e.g., Deci & Ryan, 1985; Lepper, 1983). Quite simply, intrinsic motivation is a more secure way
to maintain an activity, although strong social sanctions can be quite effective in some situations, if the penalties are
high and enforcement is consistent and efficient. We know relatively little about how values or likes arise, but it seems
that reinforcement and punishment are not effective in producing them (Deci & Ryan, 1985; Lepper, 1983). Social
influence and social perceptions seem, in general, to be very effective. Processes such as identification and imitation,
and desires to be adult and to be like admired figures, have been implicated.
Although nonhuman animals promptly and regularly develop strong dislikes (as for toxic foods), it is relatively difficult
to produce enduring likes (e.g., for foods)
end p.44
in nonhuman animals. Humans, on the other hand, have powerful likings for foods, and a wide variety of other objects
(e.g., stamps, sports). I have suggested (Rozin, 1982) that the flowering of positive intrinsic reactions in humans may
be a consequence of the evolution of culture. Adhering to a culture means not only avoiding proscribed things but
valuing positive, important things.
8 In General, in Animals and Humans, in Terms of Innate Biases and Perhaps Derived
Cultural Biases, Negative Events Have More Impact on Organisms Than Positive Events
The greater power of negative events has been noted a number of times, and dealt with systematically by Peeters and
his colleagues (Peeters, 1971, 1989). Recently, a wide range of evidence supporting negativity bias or negativity
dominance has been marshaled (Baumeister et al., 2001; Rozin & Royzman, 2001). Negativity bias manifests itself in
at least four ways (Rozin & Royzman, 2001): (1) Negative potency: negative events are subjectively more valenced
than objectively equivalent opposite positive events. Loss aversion is an example of this. (2) Negativity dominance:
even when one combines subjectively equated opposite valence events (a negative event whose rated negativity is the
same as the rated positivity for the corresponding positive event), the net outcome is usually negative. (3) Steeper
negative gradients: as one approaches (in time or space) a negative event, its negativity grows at a faster rate than
the growth of positivity as a positive event is approached. (4) Greater differentiation in negative events: there is a
richer vocabulary and more distinctions are made among negative than positive events. For example, in the Western
taxonomy of basic emotions, there is only one positive emotion (happiness), and there are four negative emotions
(anger, disgust, fear, and sadness).
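Negative potency (1), of which loss aversion is the named example, can be illustrated with a toy loss-averse value function in the style of prospect theory (Tversky & Kahneman). The loss-aversion coefficient of 2.25 is a commonly cited estimate, used here purely as an assumption; the function name is hypothetical.

```python
# Toy illustration of negative potency: a loss-averse subjective value
# function in the style of prospect theory. The coefficient 2.25 is a
# commonly cited estimate, assumed here only for illustration.

def subjective_value(x, loss_aversion=2.25):
    """Gains count at face value; losses loom larger by the loss-aversion factor."""
    return x if x >= 0 else loss_aversion * x

gain = subjective_value(100)    # the gain counts as +100
loss = subjective_value(-100)   # the objectively equivalent loss counts as -225
print(gain, loss, gain + loss)  # the equal-magnitude pair nets out negative
```

The point of the sketch is only that an objectively balanced gain-loss pair is subjectively unbalanced: the loss outweighs the gain, so the combination feels like a net loss.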
Negativity dominance is particularly clear in cases of contagion. While brief contact of a positive entity by a negative
one often spoils the positive entity, brief contact of a negative entity by a positive entity usually has little, if any,
effect (Rozin et al., 1989; Rozin & Nemeroff, 1990). One touch with a cockroach spoils a favorite food, while there is
nothing one can touch to a pile of cockroaches to make them acceptable as food.
There are a number of adaptive accounts of negativity bias (summarized by Rozin & Royzman, 2001). First, the
ultimate negative event, death, is more final and negative than any positive event. Second, negative events are much
less frequent than positive events; hence they have more information value. Third, while the general response to
positive events is approach, there are diverse ways of responding to negative events, including attack (anger),
withdrawal (disgust or fear), or freezing in place (fear).
9 Both Culture and the Innate Mind Often Express Themselves in Terms of
Predispositions; Culture Often Operates by Promoting Default Accounts
There is abundant evidence that humans are predisposed to learn certain types of relationships that are, not
accidentally, represented in virtually all human
end p.45
languages. Humans are, in short, predisposed to learn at least certain aspects of language. This is a predisposition, not
a fixed limitation. That is, humans can learn arbitrary linguistic relationships, but it is more difficult. Language is a good
model for understanding humans in general. In both biological and cultural evolution, predisposition is a particularly
common way of producing an outcome. Imprinting involves predispositions, and it works both because the features
the young organism is predisposed to imprint on resemble features of its own species, and because the environment
conspires to virtually guarantee the presence of just such an organism in the surroundings of the newborn or young
organism.
Within cultures, predispositions can often be described as defaults (Rozin, 2003). That is, cultures promote certain ways
of feeling, certain motivations, and certain ways of construing the world. These become default modes of operation.
Under most circumstances, the default arises and continues to occupy attention, or direct behavior. Other modes of
thought or behavior are possible, but do not usually occur because of the greater salience of the default. Under
conditions where the default is inadequate, alternatives may be tried.
Many of the cognitive differences between the left and right brain hemispheres or, respectively, the Western and
Eastern hemispheres (very, very roughly, including Asia, Africa, and Latin America as east, and North America and
Europe as west) can be described as defaults. The left hemisphere is capable of some gestalt processing, and the right
of some analytic processing, but each has its preferred mode of operation, and unless forced by circumstance, will
proceed in its analytic or holistic mode, respectively.
Similarly, the collectivist (East) cultures seem to have a more holistic approach to the world, looking more at
relationships and less at isolated individual components (Nisbett, 2003). This does not mean that Americans cannot
think holistically or Japanese, Chinese, or Indians analytically, but rather that they are predisposed otherwise.
Levy and Trevarthen's (1976) work on split brains illustrates predispositions related to hemisphericity beautifully in a
chimeric figure classification task. I have reformulated the task, to make it conceptually clearer,
into a task with normal (as opposed to chimeric) figures, using the same illustrations and logic (see fig. 3.1).
Each hemisphere of the split brain is queried as to which of the bottom row of three in figure 3.1 goes with each of
the top entities. Note that, in this clever design, there is one choice that matches on visual form and another that
matches on function. Levy and Trevarthen found that the left hemisphere reliably chose the function match and the
right hemisphere the form match, such that, for example, the right hemisphere would match the birthday cake with
the similarly shaped hat and the left hemisphere with the fork and knife. I have repeated this test on American college
students, and find that almost all reliably pick either all three formal matches or all three functional matches. (This
observation suggests a default mode of response, but it would need to be shown to be stable across time and
generalized to similar tasks in order to qualify as a general difference in functional/formal default.) That is, normal
individuals may have a default way of looking at things, which is either formal or functional. But these same individuals
are quite
end p.46
FIGURE 3.1 Figure classification task to illustrate predisposition for formal versus semantic-
functional modes of processing. Participants (split-brained individuals or normal college students)
are asked to indicate which item in the bottom row goes with each item on the
top row. Each item in the top row corresponds to one item on the bottom row on the basis of
resemblance in visual form and to another on the basis of semantic-functional criteria. The
presentations to participants were accomplished in three trials, in each of which only one of the
top items was present.
Source: Modified from Levy and Trevarthen (1976), fig. 1.
capable of understanding and applying the alternative strategy. This processing predisposition in normals no doubt
relates to some sort of hemispheric dominance.
In a similar type of design, Zaidel (1990) presented split-brained individuals with a peculiar face (fig. 3.2), in which
the features (eyes, nose, mouth) appear in switched positions.
It was found that when asked to point to the nose, the left hemisphere (right hand) pointed to the literal nose, now
in the eye position, whereas the right hemisphere (left hand) pointed to the normal position of the nose (now
represented by an eye). With American college students, I have found (unpublished data) that some spontaneously
point to the physical nose, and others to the proper location of the nose. This difference seems to illustrate a
privileging of relational/gestalt or analytic strategies, and maps nicely onto Nisbett's (2003) analysis of prominent East
Asian versus American modes of thought, although no data is currently available to compare such groups on this task.
end p.47
FIGURE 3.2 Participants are asked to point to the nose on the face. The right hemisphere (left hand) of
individuals with split brains typically points to the correct position for the nose (an ear in the
picture), while the left hemisphere (right hand) typically points to the literal nose, in the eye position.
American undergraduates are divided on which choice they make.
Source: Zaidel (1990).
Free associations can be used to look at predispositions, because they are, by definition, the first thing that comes to
mind (Rozin et al., 2002). For example, in response to the word chocolate, Americans are more likely to mention the
word fat in one form or another (fat, fatty, fattening) than are either Asian Indians or the French.
Another technique that elicits defaults is the use of triads (illustrated in one form in fig. 3.1) or alternative associations.
In the triad technique, a person is given three words and is asked which two belong together (or which doesn't
belong with the other two). In the alternative association technique, using the same three words, a person is asked
which of two words belongs with a third, preselected one. Thus, in an illustration of social/collective versus individualistic/hedonic
thinking, Menon and Shweder (1997) asked Hindu Indians and Americans which does not belong with the other two:
ANGER, SHAME, HAPPINESS. In a small sample of traditional Hindu Indians and Americans, they found that the Indians
reliably selected anger, and the Americans happiness. This finding was confirmed with a much larger sample
end p.48
of Indian and American college students (Rozin, 2003), using the alternative association method: (what goes with
SHAME: ANGER or HAPPINESS), although the differences were not as extreme as with the more traditional samples.
Americans explain their choice by saying that ANGER and SHAME are negative, and HAPPINESS is positive. Indians
explain their choice by saying that SHAME and HAPPINESS are socially constructive, while ANGER is socially
destructive. All participants interviewed acknowledge the basis for the construal opposite to their own; they just don't
usually think that way. Importantly, this bias may have powerful effects, because one usually proceeds from one's
original construal to further implications.
This general position has been called frame switching within a dynamic constructivist view of culture (Hong et al.,
2000). It has been possible to shift people, including biculturals, from one predominant frame (default) to another by
surprisingly simple priming procedures, in which the participant is exposed to things associated with one or another
culture (including flags, buildings) (Brewer & Gardner, 1996; Haberstroh et al., 2002; Hong et al., 2000). The idea that
a variety of cultural systems are differentially accessible in any individual has been put forth in a variety of forms in
the recent literature in cultural psychology (Hong et al., 2000; Oyserman et al., 2002). The idea of default, frame
switching, or differential accessibility has been around for some time (e.g., Rozin, 1976a, in relation to the evolution of
intelligence) but has come to the fore in recent work.
10 Cultural Norms Are Typically More Extreme than Cultural Behavior or the
Enculturated Mind
One function of norms is to push people away from their predispositions (Rozin, 2003). We are appropriately
impressed by cultural differences; these constitute a major motivation for world travel. Yet, when psychologists seek to
measure differences between the peoples in one culture and another, the differences almost always appear as
quantitative, and sometimes account for much less than half of the variance (the same may be said for most
behavioral sex differences). Even on items specifically selected to highlight cultural differences, the overlap between
individuals is high. Perhaps the most researched cultural difference in the literature in cultural psychology has to do
with the individualism-collectivism dimension (Markus & Kitayama, 1991; Triandis, 1995). The United States has one of
the most distinctly individualistic societies, and India one of the most collectivist. Collectivism can be instantiated by
the statement "Solidarity is more important than individuality." Yet only 62 percent of a sample of Indian college
students in a rather traditional Indian city endorse this claim, and 10 percent of American students do (Rozin, 2003).
On another item directed at the same difference, "The nail that stands up gets hammered down" (nonconformity is
discouraged), 59 percent of the Indian students agree, in contrast to 27 percent of the American students. Indian
endorsement of these statements would no doubt be higher in older Indian adults, but we also have evidence (Rozin,
2003) that American endorsement of the same statements is also higher in older American adults than American
college students. Many extensive studies of cultural differences (Hofstede, 1982; Oyserman et al.,
end p.49
2002) reveal important and significant cultural differences on important dimensions, but substantial overlap as well.
If we, reasonably, assume universal human predispositions in the social domain, we then confront conflicts between
these and important cultural values (e.g., either individualism or collectivism). So, for example, if we assume that
humans have both collectivist and individualist dispositions, depending on, among other things, domain of activity
(Fiske, 1991, 1992), the cultural values/norms often have the function of pushing humans away from their natural
balance in these predispositions. Perhaps the reason transcultural norm differences are bigger than transcultural
individual differences is that it is necessary for cultures to establish rather extreme norms, to optimally motivate
departure from natural predispositions (Rozin, 2003). To move a population x units in the direction of collectivism, it
may be optimal to set a standard at 2x units.
11 Cultural Influence May Be More Prominent in the Area of Behavior than in the Area of
Mental Events
Unlike the other principles asserted in this essay, this one is based only on common sense, with no direct empirical
evidence. It derives simply from the fact that behavior, because it is observable, is much easier to shape (to reinforce,
or to build institutions or environments that promote it) than mental events are (Rozin, 2003). Behavior must be used as the
marker for mental events, in order for a third party (individual or culture) to attempt to affect mental events. Surely
this occurs abundantly, but because the control is indirect, and verification is indirect, it seems very likely that the
thought-shaping process is less potent and successful. In some sense, the existence of intrinsic versus extrinsic
preferences illustrates change in thought and affect versus behavior.
It would be of interest to obtain actual data on this point, although this would be difficult. In addition, it is likely that
the degree of effort needed to shape mental events, as opposed to behavior, varies both by culture and domain of
activity. For example, Judaism seems more oriented to shaping behavior, and less to shaping mental events, than
Christianity (Cohen & Rozin, 2001).
12 Domains Are Critical in Understanding Links between the Innate Mind and Culture
The realization that adaptations are often domain specific (e.g., Rozin, 1976a; Rozin & Kalat, 1971) has extended in
recent decades into the domain of cognition, particularly in the form of an emphasis on modularity (Fodor, 1983).
Different activity domains (eating, sex, communication) (and different sensory systems, as well) face different
problems in representation, acquisition, and action, and brains seem usually to make appropriate, specialized
adaptations. For example, learning occurs with delays of hours between ingestion and its consequences in the food
domain (where such intervals are a necessary aspect of the digestive process), but much less so for other domains of
activity. Similarly, at least for food generalist animals like humans, distinguishing between edible and safe versus
harmful potential
end p.50
foods must be based on experience, whereas, in the sexual domain, mate recognition can often be prewired. Mayr
(1974) refers to food as an open system and mate selection as a closed system. There are also arguments for easier
evolvability of domain-localized systems. This general approach has been applied to culture (e.g., Cosmides & Tooby,
1994). It remains an open question as to when and how more general problem-solving systems arise.
The most explicit application of domain specificity in the broad realm of social behavior is Alan Fiske's (1991, 1992)
four models of social structure, which map in specific ways onto particular domains of life, such that, in most cultures,
for example, communal sharing is characteristic of family relations and meal contexts.
Domain specificity (modularity, adaptive specializations), while accepted to some degree in the study of cognition and
language, has not penetrated very deeply into psychology. For example, introductory psychology texts, or more
specialized texts in social psychology or developmental psychology, pay little, if any, attention to how humans function
in different domains of life (e.g., work, eating, leisure activities and the arts, religion; Rozin, 2005).
13 There Are Major Changes between the Ancestral and Contemporary (Culturally
Evolved) Environment, Particularly Striking in the Domain of Food
It is a truism that most (human) adaptations can be understood as promoting fitness in the ancestral environment in
which they evolved. In the case of humans, there have been massive changes in that environment in about the last
8,000 years, due to the evolution of culture, and associated technological advances. This situation allows for
mismatches in which adaptations to the ancestral environment may be maladaptive in the relatively recent
contemporary environments. This situation is particularly common in the domain of food, and in particular, energy
regulation.
Virtually all animal species studied show some ability to regulate energy intake, that is, to match energy intake with
energy expenditure in their adult phase. This, of course, serves to maintain a presumably optimum weight. In addition,
virtually all animal species studied minimize energy expenditure, such that as little energy as possible is spent in
searching for and consuming food. This extensive set of findings is summarized under the term optimal foraging.
Unnecessarily increased energy expenditure in searching for food requires more time searching, at the cost of
other activities, and yields increased exposure to predation. Both regulation of food intake and optimal foraging are
clearly adaptive in the ancestral environment.
Following upon the development of agriculture and domestication, human societies have become massively
transformed by these technologies and the technologies that they permitted or encouraged, through establishment of a
stable food supply that required much less individual effort to procure (Diamond, 1997). The changes, in the food
domain itself, escalated during the twentieth century, in the developed world, resulting in a food environment that is
almost the opposite of the ancestral environment.
end p.51
In the ancestral environment, food is relatively scarce. In the contemporary (developed-world) environment, food is
abundant. The evolved food regulation system was oriented primarily to motivate ingestion in cases of shortage.
Hunger plays this role. Mechanisms to prevent overconsumption are less powerful.
In the ancestral environment, with the exception of meat, there were very few foods that were calorically dense (e.g.,
high in fat and containing minimal noncaloric components). In the contemporary environment, technology has
produced superfoods, which have very high caloric density and combinations of desirable sensory properties not
encountered in the natural world. Chocolate is an example, incorporating the sweet and fatty properties so innately
appealing to humans. Restraint in the face of such choices was not a part of our inherited equipment.
The variety of edibles is modest in most ancestral environments, but an enormous variety of foods is available in the
contemporary environment: foods from all over the world, in any season, are now available in local supermarkets.
Variety encourages increased intake, creating another force for overconsumption in the contemporary world (Rolls et
al., 1986).
The linkage between energy expenditure and consumption of food, present in the ancestral environment, has been
broken in the contemporary world. One can now obtain a week's supply of food on one trip to the supermarket by car,
with virtually no energy expenditure.
In the ancestral environment, there were clear linkages between ingestion and its negative consequences. Toxic or
infected food would promptly produce negative symptoms, and the organism could learn to avoid such foods (e.g.,
conditioned taste aversions). As a result of the epidemiological revolution, that is, the conquest of most acute
infectious diseases, and the development of sanitation and food-borne toxin controls, the short-term risk of illness or
death from food consumption has been drastically reduced. In the contemporary developed world, the health risks and
benefits of foods, or food ingestion patterns, manifest over many decades, rather than hours. Humans are unable to
notice and act on such contingencies. The development of epidemiology has enabled humans to document the long-
term relationships between diet and health, and to communicate these widely. The risks so described are probabilistic
and small, below any level that evolved biological mechanisms were designed to detect.
The result of all these changes is that an organism adapted to ensure sufficient energy supplies, with a satiation
system that is opposed by easily available, highly palatable food, is confronted with a level of temptation and
stimulation that easily overwhelms the innately wired satiety system. And, most critically, an organism finely tuned to
spend as little energy as possible to obtain energy faces an environment in which the link between energy expenditure
and energy procurement is broken; convenience (read as energy efficiency), highly adaptive in the ancestral
environment, becomes an easy route to obesity in the early twenty-first-century developed world.
There is another general adaptation of a wide range of organisms to the ancestral environment that has been
neutralized or perhaps reversed in the modern developed world. The sympathetic magical law of similarity is a strategy
or heuristic that is widely operative in the animal kingdom. As described by Mauss (1902/1972)
and others (see Rozin & Nemeroff, 1990, for a review), in one of its forms, the law holds that appearance equals
reality. That is, things are what they appear to be. For example, if something looks like a tiger, it is a tiger (obviously an adaptive system). However, for modern humans, a good part of their visual contact with the world is through images. Images, of course, are not what they appear to be. A picture of a tiger is not a threat. But this primitive part of our cognition continues to exert its effect; that is, we tend to respond to images that we know are images as if they were actual exemplars of what they appear to be. For example, in the food domain, we have shown that individuals are reluctant to consume a piece of fudge shaped to look like dog-doo ("looks like dog-doo, is dog-doo") or to drink apple juice served in a brand-new bedpan ("looks like urine, is urine"), even though the individuals in question know, from direct observation, that both choices are edible and desirable (Rozin, Millman, & Nemeroff, 1986; Rozin, Haidt, et al., 1999).
14 The Food Domain Is Virtually Unique among Biological Domains, in That It Has Been
Elaborated So Much by Culture That Its Biological Roots Are Often Disguised
Leon Kass (1994), in his brilliant book The Hungry Soul, shows how the very biological food system has been vastly
transformed by culture. Unlike the other fundamental biological systems (e.g., sex, excretion, breathing), food has become deeply entwined in the social and moral world. It is the only major biological function (other than breathing) that is typically, and cross-culturally, performed in public, and in such a way (as a result of table manners) as to transform its appearance from its animal origins. As Kass points out: "Like the ballerina who defies gravity, so the graceful eater defies neediness and eats as if he were not compelled to do so" (p. 158). This means that in thinking about the relation between the innate mind and culture, we may come up with a more culturally freighted story for the case of food. In the service of functions other than nutrition, and under strong social stimulation, innate aversions are reversed, food assumes ritual functions that, as in the case of taboos, may interfere with optimal nutrition, and elaborate food preparation and consumption traditions develop that have no relation to the basic biological function of food.
15 The Elaboration of Food and the Cultural Evolution of Disgust Illustrate the
Fundamental Principle of Preadaptation in Cultural Evolution
Preadaptation is a major force in large- (and small-) scale evolutionary change. According to Ernst Mayr (1960; see
also Bock, 1959), perhaps the leading evolutionary biologist of the twentieth century, most major new structures and
abilities do not evolve gradually, de novo, but rather build on existing adaptations and programs. Entities evolved for one purpose (or, occasionally, neutral features; hence the word "exaptation"; Gould & Vrba, 1982) come to be used for
another. In a sense, preadaptation is comparable to genetic recombination, as opposed to mutation and the
development of new genetic material.
The human mouth is a particularly appropriate and striking example of preadaptation. The teeth and tongue evolved
for food processing, but they are later used by the language system for speech articulation. Notably, this is a
preadapted food system being used for another purpose. Given the fundamental importance of food procurement and
selection, it is not surprising that many primary adaptations would appear in this domain. It is possible that even the first forms of conditioning appeared in the food domain and only later became more generalized. There is also
evidence in children that reasoning about contamination and other food-related matters may be more advanced than
reasoning in other domains (Siegal, 1996).
Food is, biologically, about nutrition, but in humans, it becomes embedded in many other domains. Food is a major
social instrument; for example, it provides a major set of occasions for social exchange, at meals. It is used both to express intimacy (as with sharing food) and to create social distance, as in the Hindu caste system. Food also becomes
an art form in cuisine, which can hardly be justified on nutritional grounds. Food is a major source of metaphors
(Lakoff & Johnson, 1980), as when we say that someone is sweet, or that something is in bad taste, or that we cannot
stomach an argument. A metaphor is a form of preadaptation: use of a word from one domain to express something in
another domain. Finally, food becomes tied into moral systems, especially in some cultures, such as Hindu India
(Appadurai, 1981).
My colleagues and I have argued that just as food and food-adapted systems transfer to other domains, so does the
emotion of disgust, by a combination of preadaptation in biological and cultural evolution (Rozin et al., 1997, 2000).
Briefly, we argue that the disgust system is originally, in many mammals, a system for rejecting foods based on bad
taste. The facial expression and associated nausea seem oriented to rejecting food and preventing further ingestion.
This distaste system is present in rats and in human infants. However, in human cultural evolution (and development), this "get this bad taste out of my mouth" system is utilized more and more widely, as a general instrument of socialization, until it becomes something more like "get this out of my soul." First, many potential foods, especially body products and decayed matter, come to be disgusting on the basis not of their taste but of the idea of what they are. This "core disgust" then expands to disgust at a whole set of reminders of our animal nature; humans seem to want
to turn their backs on their animal nature, particularly the animal feature of mortality. Notably, the odor of decay or
death is the quintessential odor elicitor of disgust. Yet later, other people, usually strangers, or other groups, are
included in disgust, and finally, disgust becomes one of a set of moral emotions (Rozin, Lowery, et al., 1999). It
becomes the emotion of negative expression when moral violations related to purity and divinity are encountered. The
general elaboration of disgust into the social and moral domain is described in rich detail by William I. Miller (1997).
A critical feature of disgust is contamination: when something disgusting touches an otherwise edible or desirable entity, it renders that entity unacceptable. This powerful negation, originally functioning in the food domain (e.g., with contact with feces), becomes generalized, just as disgust does, to wider and wider domains, including contact with strangers or immorality. As is often the case with preadaptation, in this situation the original adaptation, the negative response to bad-tasting food, ceases to function in the new system. That is, although the bad taste and disgust systems share an expressive (e.g., facial) system, some of the general features of disgust are not shared with the distaste response. Thus, distasteful foods are not contaminating, whereas virtually everything disgusting is.
Another domain where preadaptation is a basic process is pleasure itself. Whatever the circuits that produce this
positive feeling, originally linked to biological necessities, the system is expanded in humans to include mastery and
aesthetic appreciation (Rozin, 1999).
16 Preadaptation, While Very Important in Biological Evolution, Is Even More Important
in Cultural Evolution, Because Purpose and Foresight Enhance It in Cultural Evolution
As important as preadaptation is in biological evolution, it is much more important in cultural evolution. Preadaptation
in biological evolution is limited by the fact that borrowing an adaptation requires (1) that it is borrowable, that is, that
some sort of (e.g., neural) contact can be made between the new and the old domain, and (2) that all stages of
borrowing have adaptive value (Bock, 1959). This is often problematic biologically, but is not a problem in cultural
evolution, because teleology is actually at work in cultural evolution. That is, one can imagine a new use for a system,
and make it happen. One can put up with failure (which is typically terminal in biological evolution) while one perfects
a system, with the end in mind. So, one can make trucks out of cars without a gradual set of stages; one can combine
a calculator and a typewriter, with many false starts, to make a computer; and so on. Preadaptation is rampant in
cultural evolution. The expansion of disgust illustrates this. If something is undesirable in any culture, efforts can be
made to make it disgusting, through a socialization process. Disgust is a very effective way to discourage contact or
interaction.
17 A Major Influence of the Innate Mind on Culture, and Culture on the Innate Mind, Is
through Institutions and Alterations of the Environment: The Innate Mind to Enculturated
Mind Link Is Often Mediated by Matters outside the Head
Culture exists in the environment (e.g., cities, streets, homes, places of worship, markets, conveniences) as well as
in the minds of members of a culture. Indeed, international tourism is based largely on people's interests in observing
other cultures' environments. Environments provide major constraints for behavioral options, and alter the likelihood
(environmental predispositions) for different behaviors (Rozin, 2003). By altering the perceived world, and the arena
for action,
environments influence minds and mental development. In the psychology of recent decades, there has been a strong
emphasis on mental events, consequent upon the cognitive revolution, and a reaction against behaviorism. This
emphasis carried over into the earliest forays by psychologists into the study of culture (e.g., Markus & Kitayama,
1991). However, more recently, cultural psychologists have come to appreciate the power of the cultural environment.
Kitayama (2002) has pointed to the importance of the environment created by cultures, and Kitayama and Markus
(1999) promote the term "cultural affordances" to encompass situations, structures, artifacts, and customs in which the
individual is interactively embedded.
The influence of the physical and physical-social environment, such as institutions, on behavior and mental events has
been attended to much more in sociology than in psychology. The lack of attention in psychology may be a result, in
part, of the fact that the mechanisms through which environments operate are frequently transparent. People can only
take trains, go to school, and eat pineapples if these opportunities (affordances) are part of the environment. This
truism seems trivial, and it is, if one's aim is to uncover nonobvious principles of mind and behavior rather than to understand why people think and do the things they do. The importance of writing, a deeply important feature of the
environment, as affording reading and allowing for major changes in the nature of education and communication is
obvious, but central to understanding modern human beings. And the type of writing system employed in a culture can
influence, in a major way, the ease of learning to read, the ability to electronically code writing, and the degree of
literacy. For example, modern Chinese is easier for children to conceptualize and learn at the earlier stages of reading
but harder for adults to fully master (on account of the many thousands of symbols that must be learned). We have a
tendency to take these environmental effects for granted, perhaps making something like the fundamental attribution
error in underrating the importance of the environment as a determiner of behavior and mental events.
The great bulk of research in psychology on eating has been devoted to the internal signals (blood levels of nutrients,
stomach fill, etc.) that promote or deter eating. Such influences are undoubtedly present. But I believe it is hard to
deny that for most humans, the major determinants of how much is eaten at a meal are the presence of food and its palatability, both features of the environment. We recently demonstrated that amnesic individuals, who had no
memory of having eaten a recent meal, would consume a second and even a third meal if presented with appropriate
meals in an appropriate lunch context (Rozin et al., 1998). More generally, cultural rules about amount to be eaten
and times of eating, availability, cost, and palatability seem to be the principal determinants of amount eaten in a
given meal (Pliner & Rozin, 2000).
The French eat a highly palatable diet and consume a higher percentage of their calories as fat than Americans do. Yet the French are noticeably less overweight than Americans. In our attempt to understand why this state of affairs exists, we
have concluded that a major part of the explanation has to do with differences in eating environments. French food
portions are notably smaller than American food portions, in both restaurants and supermarkets (Rozin et al., 2003).
People simply eat
less when they are served less, and the proper range of portion sizes becomes strongly ingrained in cultural practice.
The same type of analysis could be applied to energy expenditure, the other half of the obesity equation. Cultural
affordances can promote or deter energy expenditure. In France, very high gas prices, a less car-friendly environment,
and the location of basic food stores in every neighborhood promote walking over driving. In much of modern American
society, the environment, from garage to mall, has been structured so that walking is almost unnecessary. And since,
as I noted earlier, it is in our genes to spend as little energy as necessary to gain necessities, Americans reasonably
opt for the most convenient affordances plentifully provided by their culture.
The innate mind influences the development of the structure of the cultural environment, and that environment has set
up different selection pressures (e.g., for good driving as opposed to walking ability) that can and will affect the innate
mind.
18 Innate Predisposition, Socialization, and Structuring of the Environment Operate to
Constrain Departures from Certain Pathways, as in Canalization
In his classic work in developmental biology, The Strategy of the Genes, Waddington (1957) addresses the problem of
how, in the face of many predictable and unpredictable perturbations, the process of normal early development
continues on an almost unerringly adaptive path. He coins the term "canalization" to refer to the fact that certain
adaptive pathways (including choice points) are laid out, and established by multiple constraining forces, such that it is
very difficult for the developmental trajectory to depart from these pathways. He illustrates this idea with a
downwardly slanted surface that represents the range of developmental possibilities, with a ball rolling down this
surface as the actual course of development (fig. 3.3). Deep channels in this surface serve to keep the ball on certain
pathways, and choice points occur along this pathway.
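Waddington's image can be given a minimal numerical sketch (my own toy model, not Waddington's; the potential surface, noise level, and parameter values are all arbitrary assumptions chosen only to illustrate the idea): development as noisy descent on a surface whose deep channels resist perturbation.

```python
# Toy sketch (my illustration, not Waddington's model): development as a ball
# rolling down a channeled surface with random perturbations. A deep channel
# keeps the trajectory near its pathway; a shallow one does not.
import math
import random

def channel_force(x, depth):
    """Restoring force from a potential V(x) = depth * (x**2 - 1)**2,
    which has channels (minima) at x = -1 and x = +1."""
    return -depth * 4 * x * (x**2 - 1)

def develop(depth, noise, steps=2000, dt=0.01, seed=0):
    """One developmental trajectory, started in the x = +1 channel."""
    rng = random.Random(seed)
    x = 1.0
    for _ in range(steps):
        x += channel_force(x, depth) * dt + rng.gauss(0, noise) * math.sqrt(dt)
    return x

# Strong canalization: the endpoint stays near x = 1 despite perturbations.
deep_end = develop(depth=5.0, noise=0.5)
# Weak canalization: the same perturbations move the endpoint much more.
shallow_end = develop(depth=0.1, noise=0.5)
print(deep_end, shallow_end)
```

The qualitative behavior, not the particular numbers, is the point: the same perturbations that leave a deeply channeled trajectory essentially unmoved can carry a weakly channeled one far from its starting pathway.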
Canalization applies directly to ideas about cognitive and affective development in humans. Furthermore, with the
powerful importance of culturally created environments and norms, further channels of canalization are available to
steer human development. Schools are a primary example, as are traditions of child rearing, covering issues such as
use of punishment and modes of toilet training and weaning. In addition, adult activities are generally channeled by
cultural artifacts. A simple example is a path through a park or woods. The path provides an easy route, involving
minimal effort, for traversing an area, and people (and, by the way, dogs) tend to follow these paths, though
departures are easy. Cultures provide easy ways of doing things, and children and adults tend to follow these
pathways. One can wash dishes by hand, but it is easier to use the dishwasher; one can walk eight blocks to the store,
but it is easier to drive. These sometimes crude and sometimes subtle influences can have a massive effect on our
activities, and as a result on our experiences, and ultimately on our minds.
FIGURE 3.3 The developmental landscape, illustrating the principle of canalization. The rolling of the ball down the inclined plane represents the course of development, and the channels represent the canalized pathways.
Source: Waddington (1957), p. 36.
19 Accessibility Represents the Same Process as Preadaptation, but Occurring during
Development
In a 1976 essay, The Evolution of Intelligence and Access to the Cognitive Unconscious (Rozin, 1976a), I proposed
that a major feature of the evolution of intelligence was gaining access to existing adaptive specializations (modules) so
that their processing capabilities could be applied to new inputs and outputs. This, of course, is an application of the
idea of preadaptation to the process of development. Piaget's (1955) concept of vertical décalage is precisely this: a particular ability appears in one domain first and gradually, with development, expands to other domains. This holds, for example, for the idea of constancy.
The principal example I used to develop this idea was the history of the alphabetical writing system (Gleitman & Rozin,
1977; Rozin & Gleitman, 1977). So far as we know, the alphabet was invented only once, somewhere in the Middle
East. In most respects, it is the most efficient writing system, in that it encodes all of language (speech) into a set of 20 to 40 written characters. If one learns the characters and the corresponding phonemes, one can then represent and
understand any speech utterance via the medium of writing. The memory load is minimal, as are the requisite writing
skills. And, in the modern world, digitalization is particularly easy because of the small number of characters.
Given all of these advantages, it is surprising that the alphabet was only invented once (although it has certainly
spread widely). Furthermore, although the alphabet is the easiest system to master and employ for adults, it is also
the hardest for children, at the initial stages of learning. This is because the idea of the alphabet is difficult to
appreciate. The alphabet is built on the principle of phonological segmentation in the speech system. The continuous
stream of speech is segmented in the system (brain) into elementary units, called phonemes. Although phonemes have
definite reality in the system (brain), they do not have an independent physical existence in the sound stream. The
word "bag," in its motor organization and perception, has three component sounds, but these cannot be recovered from the physical representation of the sound stream, which is continuous. This is because the distinct articulatory commands for B, A, and G coming from the brain are "shingled," or overlapped, when realized in the oropharynx. So while it is true that "bag" has three sounds, this is not obvious. It was the deep conscious realization that what seems continuous is actually segmented that allowed for the development of the alphabet. This involves gaining access, at some level, to the mind's (brain's) speech segmentation module.
Our work and that of others (summarized in Gleitman & Rozin, 1977; Rozin & Gleitman, 1977) supports this
interpretation in three ways: (1) Understanding phonological segmentation is a major barrier to the acquisition of alphabetic systems. (2) Syllable-based writing systems, which were common in the history of writing, are much easier to acquire; the syllable is the smallest speech unit that can be separated out physically in the sound stream ("Baghdad" is composed of the separable sound elements BAGH and DAD). (3) Once one understands the alphabetic principle, it seems entirely intuitive. Indeed, it is hard to convince reading teachers that the fact that "bag" has three sounds has to be taught.
Preadaptation in biological and cultural evolution, and accessibility, all refer to the same process of borrowing. The
basic structure of each is laid out in table 3.1.
In all three cases, the original source may remain intact, or be replaced by the preadapted/accessed entity. My
argument is that this is a deeply fundamental family of processes for the understanding of evolution, culture, and
development.
Table 3.1 Three modes of utilizing existing programs for new purposes.
20 Conclusion
I began this essay with 17 principles in mind, but added "plus or minus 2" to provide a margin for error. In writing it, the list expanded to 19, which, fortunately, falls within the scope of the title. I hope that some of these principles
prove fertile or stimulating to some readers. In general, I see these principles as emerging from a combination of
findings from evolutionary biology, neuroscience, psychology, linguistics, and anthropology. Insofar as this list is useful,
it also argues for the importance of the food domain as a source of innovation in biological and cultural evolution, and
in development.
Table 3.1

Process         Domain of activity        Example
Preadaptation   Biological evolution      Mammalian inner ear bones; the human mouth as a vehicle for speech
Preadaptation   Cultural evolution        Applications of computers to word processing and other domains; applications of motors, wheels, and writing over wide domains
Accessibility   Individual development    Acquisition of the alphabetic principle; Piagetian vertical décalages
4 Steps toward an Evolutionary Psychology of a Culture-Dependent Species
Daniel M. T. Fessler
Humans are at once phylogenetically linked to, and yet fundamentally different from, other primates. Most profound
among these differences is the extent of our reliance on culture, by which I mean socially transmitted information
shared by at least some members of the learner's group. While recent work reveals the existence of socially
transmitted foraging techniques and social behaviors in some nonhuman primates (Fragaszy & Perry, 2003; Whiten et
al., 1999), compared to the human case, cultural information plays a minor role in these animals' efforts to negotiate
their physical and social environments. Highly altricial and relatively gracile, lacking large teeth, strong jaws, or claws, we are a rather unimposing mammal; our ability to exist, indeed to prosper, in nearly every ecosystem on the planet is primarily due to our capacity to acquire, employ, and elaborate on socially transmitted information. This chapter is based on the premise that these capacities reflect the workings of special-purpose psychological mechanisms that
evolved in order to exploit the enormous adaptive potential of socially transmitted information. After reviewing the
principal existing approaches to this question, I outline some of the major topics that I believe need to be addressed in
developing an evolutionary psychology of our uniquely culture-dependent species.
1 Principal Existing Perspectives
To date, scholars have largely adopted one of three perspectives when exploring the relationship between culture and
human evolution; I refer to these, respectively, as the punctuated change model, the psychological anthropology
model, and the orthodox evolutionary psychology model.
1.1 The Punctuated Change Model
The punctuated change model holds that the transition from a more primate-like hominid having limited use of culture
to a fully human creature deeply dependent on culture was the result of some discrete set of neurological changes
that, at least initially, occurred largely independent of the benefits of socially transmitted information. In this view, a
small number of genetic changes expanded some previously limited capacity (symbol manipulation and language use
being popular candidates) in a fashion that allowed for the rapid development of a body of socially transmitted
information; it was only after this event that culture became an important component of human behavior or, in more
modest versions of the claim, that culture became the principal means whereby hominids coped with their physical and
social environments (see, for example, Byers, 1994; Diamond, 1992; Klein, 1995; Mellars, 1989; Mithen, 1994, 1996;
White, 1959).
There are both empirical and theoretical grounds for questioning the punctuated change model. First, as the breadth
and resolution of the archeological record improves, evidence increasingly favors a portrait in which human behavior gradually increased in complexity, in fits and starts, over a period of several hundred thousand years; the so-called "human revolution," wherein cultural complexity was thought to increase dramatically in the space of a few thousand years, is an artifact of researchers having viewed only a narrow slice of the archeological record (McBrearty & Brooks, 2000). Second, the punctuated change model argues either that (1) the neurological changes that opened the door to
culture use were not favored as a result of the advantages therein, culture use being an accidental consequence of
selection for other traits, or (2) dramatic alterations in the psychological architecture necessary for the extensive
exploitation of culture took place very rapidly, as the result of a small number of changes. Although some investigators
hold that our abilities to acquire, use, modify, and transmit cultural information are the result of relatively general-
purpose cognitive attributes, such as being able to adopt another's perspective (Tomasello, 1999a) or being able to
map information across cognitive domains (Mithen, 1994), I will argue that humans' use of culture reflects the
workings of a large number of highly specialized psychological mechanisms. If I am correct in this regard, then the
evolution of a so-called capacity for culture is not parsimoniously explained as a side effect of other changes. Likewise,
with regard to the proposal that a few sudden changes opened the door to a greatly enhanced reliance on culture,
although such events are not impossible, nevertheless, in general, natural selection operates through the gradual
modification of existing designs, with each minor alteration offering a fitness advantage over the previous configuration.
It is therefore more plausible to suppose that, consistent with the archeological record, a process of incremental
feedback took place wherein small changes in specific aspects of the mind allowed for alterations in culture-relevant
behavior, opening the door to a modest expansion of the content and usefulness of socially transmitted information; in
turn, such expansion favored additional alterations in the aforementioned aspects of mind, inviting additional cultural
expansion, and so on (see chapters 6 and 7 here for discussions of possible evolutionary processes). In short, it is
unlikely that our ancestors ever suddenly "got culture."
1.2 The Psychological Anthropology Model
Students of psychological anthropology will recognize that neither the foregoing position nor the general goal of this
chapter is novel. Half a century ago, in his presidential address to the American Anthropological Association, A. Irving
Hallowell (1950) took his colleagues to task for allowing one of the most important scientific questions to fall through
the cracks in their division of labor. As Hallowell described the social structure of the discipline of anthropology, the
topic of human evolution was assigned to the physical anthropologist, who studies morphological change over time;
behavioral evolution was assigned to the archeologist, who studies changes in the material record over time; and
human nature was assigned to the cultural anthropologist, who buried it under evidence that experience and behavior
vary greatly across cultures. Nowhere in this arrangement was there room for the study of the evolution of the
psychological attributes that make humans distinctive, chief among which are those that allow us to so effectively
exploit culture (see also Hallowell, 1956, 1961).
Hallowell both contributed to and drew on a large body of contemporaneous anthropological research aimed at
exploring the evolution of the capacity for culture (see, for example, Cohen, 1968; Montagu, 1962, 1968; Spuhler,
1965). Why then did this enterprise largely collapse? With a few exceptions (e.g., D'Andrade, 2002), psychological
anthropologists, Hallowell's intellectual descendents, have abandoned the question that he viewed as central to, and
uniting of, the discipline of anthropology. I believe that two factors led to the collapse of Hallowell's agenda. First, in
the vast majority of midcentury work on this subject, investigators saw as their goal the exploration of the
phylogenetic precursors of attributes such as tool use and vocal communication, as this would allow them to address
the question demanded by Darwinian gradualism, namely, how we got here from there. In focusing on this topic,
evolutionists were in part reacting to the disciplinary prejudices of mainstream anthropology. As Hallowell phrased it:
Whereas opponents of human evolution in the nineteenth century were those who naturally stressed evidence
that implied discontinuity between man and his primate precursors, anthropologists in the twentieth century,
while giving lip service to morphological evolution, have, by the special emphasis laid upon culture as the prime
human differential, implied what is, in effect, an unbridged behavioral gap between ourselves and our closest
relatives. The possession of culture has tended to become an all-or-none proposition. (Hallowell, 1956, p. 91)
I applaud the midcentury evolutionary anthropologists' attempts to map out the hominid precursors of the
psychological attributes of interest here. However, although this emphasis usefully presaged investigations of primate
behavior and psychology (see Fragaszy & Perry, 2003; Whiten et al., 1999), it also had detrimental effects. First,
particularly in the case of tool use, it tended to focus evolutionists' attention on behavior rather than on the
psychological attributes underlying that behavior. Second, it diverted attention away from the broader question of what the capacity for culture actually consists of; though they were obvious targets, the emphasis on reconstructing the phylogenies of tool use and language reduced attention paid to other fundamentally important aspects of the evolution of the human mind.
Largely alone among his peers, Hallowell attempted to grapple with the question of the nature of the psychological
architecture that makes life as a cultural organism possible. Consistent with his goal of placing the study of human
nature at the forefront of the anthropological agenda, Hallowell directed anthropologists' attention to the importance of
symbolic representation in human thought, including both (1) the manner in which symbolic representation facilitates
discerning or learning norms for behavior, and (2) the manner in which symbolic representation affords self-
objectification, that is, the ability to view oneself and one's actions from an observer's perspective. These two features,
Hallowell argued, are central to human behavior, for it is only through perceiving norms and comparing one's own
behavior to them that cultural adherence or conformity, with all of the benefits thereof, can be achieved. Moreover,
Hallowell went on to argue, while symbolic representation enhances the recognition of norms, and while self-
objectification allows for awareness of the extent to which one lives up to such norms, the key component completing
this triad is the motivational structure linking hedonic state to norm adherence (Hallowell, 1960). Hallowell thus
correctly identified topics central to an understanding of the psychological architecture underlying the human reliance
on culture (see chapter 17 here). I believe that the reason that Hallowell's efforts nevertheless failed to inspire an
extensive corpus of empirical research (and, perhaps relatedly, failed to change the structural divisions within the
discipline of anthropology) is that (1) he often eschewed analysis of postulated selection pressures, thereby eliminating
one of the most useful of the heuristics employed by evolutionists, and (2) his ideas were framed in terms of the
psychological constructs prevalent at the time, constructs that approach the phenomena at issue from the wrong
direction.
While Hallowell and other evolutionarily minded psychological anthropologists did identify some topics, such as self-
objectification, that are useful in the present context, this was generally not true of the enterprise as a whole, a failing
stemming from the fact that much of their theorizing was premised either on neo-Freudian psychodynamic models
(e.g. Hallowell, 1950) or on a general humanistic psychology such as that of Maslow (e.g. Hallowell, 1960). Holding
aside the (nontrivial) question of the empirical validity of either of these theories, it is important to recognize that such
perspectives do not lend themselves to the adaptationist tactic productively employed in modern evolutionary analyses
of behavior. It is often said that contemporary evolutionists "carve nature at its joints" by identifying the logically
distinct adaptive challenges that selected for specialized psychological, physiological, or anatomical features. In
contrast, concepts such as on the one hand id, ego, and superego (see Hallowell, 1950) or on the other hand needs,
desires, goals, and purposiveness (see Hallowell, 1960) concern general postulates that do not directly address specific
evolutionarily relevant features of an organism's interaction with its environment. To stretch the metaphor, rather than
carving the turkey of human nature at its joints, these constructs address distinctions, like the one between white
meat and dark meat, that, while they capture the observer's attention, nevertheless shed little light on the question of
why the turkey possesses the
structure that it does; just as the notion of giblets is useful for the chef but useless for the functional anatomist, so,
too, can the notion of superego be productive for the psychotherapist but counterproductive for the evolutionist.
Handicapped by their choice of theoretical tools, psychological anthropologists have, to the extent that they have attended to the issue at all, made little progress in the more than 50 years since Hallowell declared that an understanding of mind is the key to exploring the evolution of the human capacity for culture.
1.3 The Orthodox Evolutionary Psychology Model
Largely independent of the efforts of the midcentury evolutionary anthropologists, over the last 20 years, enormous
advances have taken place in the application of evolutionary theory to the study of human nature. While there is
considerable variety in this enterprise, the past decade has seen substantial consolidation, with what I term orthodox
evolutionary psychology becoming the dominant perspective. In their seminal essay outlining an evolutionary
psychological approach to human behavior, John Tooby and Leda Cosmides (1992), the principal proponents of this
view, assert that traditional observations regarding culture mask three different sources of behavioral regularity and
ideational similarity across individuals.
Tooby and Cosmides argue that panhuman circumstances and experiences, reliably present across generations, favored
the evolution of psychological mechanisms attuned to, and able to exploit, those regularities. In turn, these
mechanisms produce mental contents (beliefs, reactions, etc.) that, at a general level of abstraction, are similar across
individuals and across social groups, leading to overarching similarities among all or most humans; Tooby and
Cosmides term these regularities and similarities "metaculture." For example, the altricial nature of human infants is
such that all humans have the experience of growing up under the care and supervision of caretakers; biological kin
selection is such that, in most circumstances, this care will be provided by close relatives. In turn, this reliable
regularity in social circumstances allowed for the evolution of psychological mechanisms that, in the service of
inbreeding avoidance, rely on propinquity during childhood as an index of relatedness; the combination of the
commonality of experience across individuals and populations and the universal possession of these mechanisms leads
to a nearly universal negative emotional reaction to the idea of sex between close kin (Westermarck, 1891; Wolf,
1993).
Metaculture refers to features shared across radically disparate societies and groups. Cognizant of the intergroup
variation that has for so long impressed (and perhaps obsessed) anthropologists, Tooby and Cosmides identify two
sources of the similarity within, and difference between, groups. First, arguing that some behavioral and ideational
similarities within groups do not stem from the social transmission of information, the authors propose that such
patterns result from the uniformity of responses of panhuman psychological mechanisms when presented with a
common local environment. For example, Cosmides and Tooby (1992) note that sharing, a method of managing
production risk, is common in hunter-gatherer groups that face high variance in food production as a result of
stochastic factors. This pattern, the authors argue, reflects the output of evolved psychological
mechanisms that gauge resource availability: when luck matters, mechanisms present in each hunter's head increase
the attractiveness of sharing, resulting in locally patterned behavior. The authors coin the term "evoked culture" to refer
to similarities within groups that result exclusively from the responses of evolved mechanisms to the local social and
physical environment.
Evoked culture is contrasted with epidemiological culture. In the latter, similarities within groups result from the
transfer of information from one individual to another. The concept of epidemiological culture thus refers to the central
phenomenon of interest in this chapter, namely socially transmitted information. Indeed, congruent with the goals of
this chapter, Tooby and Cosmides (1992, p. 119) argue that (1) a rich body of locally useful knowledge acquired by
one's predecessors constitutes a potentially valuable resource, (2) the existence of such bodies of knowledge was a
recurrent feature of ancestral social environments, and hence (3) selection can be expected to have favored the
evolution of specialized psychological mechanisms dedicated to the acquisition and use of such information. However,
despite their recognition of the utility of cultural information, the choice of the term "epidemiological" suggests that
Tooby and Cosmides' emphasis is on the question of the relative ease of transmission of various ideas (or, in their
phrasing, the relative ease with which ideas can be reconstructed in the minds of naive actors). Moreover, the term
"epidemiological" connotes the exploitation of the host organism: diseases spread as pathogens take advantage of
features of their hosts, propagating at the host's expense. There is considerable utility in the notion that so-called
"selfish memes" (Dawkins, 1976) spread as a function of the extent to which they resonate with the outputs of evolved
psychological mechanisms possessed by their hosts. For example, this approach illuminates regularities in beliefs about
the supernatural (Boyer, 2001); sheds light on the relationship between disgust and the popularity of urban legends
(Heath et al., 2001); and explains the relationship between evolved inbreeding avoidance mechanisms and the ubiquity
of incest taboos (Fessler & Navarrete, 2004; Lieberman et al., 2003), as well as the connection between the salience
of proteinaceous foods in aversions produced by evolved toxin avoidance mechanisms and the centrality of meat in
food taboos (Fessler & Navarrete, 2003). Importantly, however, while cases such as these involve phenomena long of
interest to anthropologists, they do not address that aspect of culture with which we are here concerned, namely the
body of advantageous information the existence of which favored the evolution of mechanisms aimed at its acquisition
and exploitation. Examining the popularity of ghost beliefs or meat taboos does not shed light on how or why humans
are able to employ socially transmitted information to an unprecedented degree. In short, while Tooby and Cosmides
were on the right track, in combination with their emphasis on evoked culture (a notion that does not address
accumulated cultural knowledge), their focus on the epidemiological aspect of information transfer deflected attention
away from core questions; these issues have subsequently not been addressed by the majority of evolutionary
psychologists.
The remainder of this chapter is devoted to sketching out some of the tasks involved in the acquisition and use of
valuable socially transmitted information, and the evolved mechanisms that may address these tasks. I make no claims
of either
originality or completeness; rather, my goal is to further the agenda laid out, but not fulfilled, by scholars as diverse
as Hallowell and Tooby and Cosmides.
2 Structure-Rich Information Acquisition Mechanisms
Tooby and Cosmides (1992) draw attention to the role of innate psychological structure in the process of social
information transfer. In this regard, it is useful to define a spectrum of information transfer. At one end of the
spectrum lies social information transfer that involves orienting or calibrating an elaborate preexisting set of schemas
and behavioral responses to local circumstances; I term this "structure-rich information acquisition." At the opposite end
of the spectrum, the body of knowledge acquired from others is both sufficiently baroque and sufficiently parochial as
to make it unlikely that this material maps in any tight fashion onto innate informational structures; I term this "structure-poor information acquisition."
To illustrate the spectrum from structure-rich to structure-poor information acquisition, consider the difference between
learning to identify locally prevalent dangerous animals and learning how to make clay pots. Barrett (2005a) has
demonstrated that young children exhibit remarkable competence at identifying predators, given their generally limited
knowledge about the world, and seem quite attuned to information concerning the extent to which various animals
pose a threat to humans. Barrett argues that (1) humans inhabit a wide range of ecosystems; (2) until recently,
dangerous animals were prevalent in the vast majority of these; and (3) young children are particularly vulnerable to
predation; but (4) the identity of dangerous animals varies across ecosystems (boars do not resemble bears). Barrett
suggests that children possess an innate "dangerous animal" category that, while it may be linked with morphological
cues (large sharp teeth, for example), is nevertheless dependent on socially transmitted information for content.
Consistent with Hamburg's (1963) speculations concerning the existence of predispositions to acquire evolutionarily
relevant information, children avidly and preferentially pursue and retain socially transmitted information about
predators (Barrett, 2005a). The cognitive domain of predatory animals thus seems to be one in which social
information acquisition and use occurs against a backdrop of fairly rich innate structure: children rapidly grasp distinctions in this domain, and are able to act on the acquired information (fearing predators, fleeing from them, etc.)
without extensive background learning. This contrasts with learning a complex technology-related skill such as making
clay pots. Pot making does not have the same universality as the problem of dangerous animals, as it (1) is contingent
on the presence of appropriate materials, and (2) produces a tool that serves a function that can be performed by
other tools (bladders, baskets, nets, etc.). Accordingly, while we might expect children to innately possess or easily
acquire the concept of a container, we should not expect them to have richly elaborated structures dedicated to the
task of acquiring and employing socially transmitted information about pot manufacture. As a consequence, children
should be less attracted to information about pots than about dangerous animals, they should find it more difficult to
learn about the former than the latter, and their command of the relevant information should occur later in
development.
One facet of constructing an evolutionary psychology of humans as a culture-dependent species consists of identifying
and exploring structure-rich domains. For example, food selection is another area in which natural selection likely
created a strong predisposition to acquire relevant information from others. Humans, being dietary generalists, are able
to subsist in a variety of ecosystems. However, this flexibility brings with it the dilemma that selection cannot specify
templates for what to eat and what to avoid (Rozin, 1976b). Given the costs of individual learning through
experimentation (Boyd & Richerson, 1985), it is understandable that social factors play an important role in the
development of dietary behavior. Social facilitation of the acquisition of food preferences and avoidances occurs in
many animals (Galef & Giraldeau, 2001; Snowdon & Boe, 2003), hence there are phylogenetic precursors to our
propensity to acquire dietary behavior from conspecifics. However, in addition, the social shaping of dietary behavior is
symbolically mediated in humans (see Fallon et al., 1984; Rozin, 1990b; chapter 3 here), opening the door to the use
of dietary behavior for such nondietary purposes as marking ethnic boundaries.
While mapping out structure-rich mechanisms can illuminate many aspects of mind, much of socially transmitted
information, including information vital to survival, is more akin to pot making than to predator identification or food
selection (see chapter 2 here). I turn, therefore, to some of the factors that may pertain to structure-poor information
acquisition.
3 Structure-Poor Information Acquisition Mechanisms
3.1 Selecting a Model for Vertical Information Acquisition
In industrial and postindustrial nation-states, adult-initiated pedagogy plays an important role in social information
transfer. However, such teaching is far less significant in the learning processes that occur in many small-scale,
traditional societies (Fiske, in prep.; Mead, 1943). Instead, learners spend much of their time either on the sidelines,
watching the skilled performance of locally adaptive behaviors, or attempting to engage in play-like learning behaviors
that are often structured by older children (Maynard, 2002; Rogoff et al., 1993). In both contexts, the ability to imitate
is often vital to the acquisition of new behaviors. Recognizing others' intentions and goals seems to play a vital role in
the process of imitation (Bjorklund & Bering, 2002; Tomasello & Call, 1997), suggesting that the capacity to
manipulate a theory of mind is a critical element in humans' reliance on culture. Conventional accounts (see Byrne &
Whiten, 1988) hold that selection favored the ability to infer others' intentions because this enhances the capacity to
both exploit conspecifics and counter such exploitation. However, this capacity may also have been favored due to the
manner in which it enhances acquisition of information from fellow group members (see also Tomasello, 1999a).
From an early age, humans excel at imitation. However, while this is a necessary condition for much exploitation of the
knowledge possessed by conspecifics, it alone is not sufficient. A problem facing the social learner is the selection of an
appropriate target for imitation (Boyd & Richerson, 1985). Even forager societies
exhibit divisions of labor by sex and age (see Kelly, 1995), hence only a subset of all individuals present will have
routinely engaged in actions constituting appropriate foci for imitation by a given learner. Moreover, imitators face the
difficulty that complex skills are built atop, or subsume, simpler skills and knowledge, creating a necessary chronology
in the acquisition of the relevant information: because simple skills and simple knowledge may be difficult to discern
when embedded in complex behavior, the most effective acquisition strategy is that which begins by focusing on
models who engage in behavior that is not vastly more complex than that of which one is currently capable (Rogoff et
al., 1993; Wertsch, 1991). A single heuristic addresses both questions of appropriateness and questions of accessibility,
namely imitate those who resemble oneself (Boyd & Richerson, 1985) but are somewhat more advanced in terms of
skill, knowledge, social standing, and so on. Consider, for example, children's play: while themes in such play often
concern adult economic and social activities, much of the actual content of play is acquired not from adults but rather
from older children (see Goodman, 1970). Overall, the presence of a slightly more advanced peer influences learning
from an early age (Barr & Hayne, 2003; see also Zukow-Goldring, 2002).
The foregoing suggests that we should expect humans to possess mental mechanisms that identify suitable targets for
imitation as a function of some combination of the target's similarity to the learner and the target's superiority to the
learner. People should be sensitive to, and able to accurately gauge, the degree of similarity between themselves and
others, and should find interesting and attractive those who are similar to, yet somewhat more advanced than,
themselves. Children as young as two show a behavioral preference for same-sex individuals (Fagot, 1985; Maccoby &
Jacklin, 1987), and, as both advertising agencies and parents are acutely aware, children long to be like their older
peers. While children's play often mimics adult occupations, children pattern much of their daily lives, including many
important everyday behaviors and skills, after the models provided by older children and adolescents.
There are at least two categories of cues that may elicit copying, namely superior performance and superior status. On
the one hand, individuals may be selected as models for imitation because they evince superior abilities in a domain
that is socially valued, self-evidently useful, or both. On the other hand, models may be selected because they occupy
a higher position in the social order than the learner, notably when that position is the result of prestige (social
advantage freely conferred by others) rather than dominance (social advantage achieved through force or the threat
thereof ) (Henrich & Gil -White, 2001). In practice, abilities and social position are often linkedsports stars, for
example, achieve prestige through their athletic prowess, whereafter both their skills and their prominence serve to
further focus public attention upon them. Nevertheless, because the tasks of evaluating relative skill and evaluating
relative prestige differ in important ways, a system that efficiently selects models for imitation can be expected to
employ input from separate mechanisms dedicated to each task. These mechanisms, in turn, ought to exhibit
selectivity in the type of information to which they are sensitive. A mechanism that evaluates the skill levels of
prospective models for imitation can be expected to attend to the outcome of others' behavior, the ease with which
others
accomplish a task, and the relative rapidity with which others accomplish a task. A mechanism that evaluates prestige
can be expected to attend to others' orientations toward prospective models (Henrich & Gil-White, 2001) and the
extent to which prospective models command markers of prestige.
3.2 Motivations for Vertical Information Acquisition
To fully articulate the psychological architecture underlying the acquisition and use of culture, we must differentiate
between information acquisition strategies and motivational systems. It is not enough to be able to learn through
observation and imitation, nor does it suffice to be capable of identifying suitable targets for imitation, as neither
capacity will be effectively utilized without a corresponding set of emotions that make such activities attractive.
Foremost among these emotions is probably admiration, which appears to motivate individuals to study the details of a
target individual's behavior, to model their own actions after the target, and to be willing to incur costs so as to gain
access to the target.
Henrich and Gil-White (2001) argue that prestige-based social interactions are explicable in light of the dynamics of a
market for information transfer. Because learners stand to benefit from the opportunity to interact with and observe
successful models, learners are willing to pay costs, in the form of deference and service, to successful models. The
ability to learn from a model is in part a function of the degree of access, which in turn is a function of both the
model's behavior and the presence of rival learners. Learners must therefore weigh the skill and prestige of a
prospective model against his or her accessibility. If, at the proximate level, admiration is the factor motivating desire
for proximity with, and willingness to pay costs for access to, the model, then the intensity of admiration felt toward
an individual should not only reflect that individual's skill or prestige but should also weigh these factors against indices
of accessibility. Arrogant or domineering behavior is unattractive in a prospective model, while a self-deprecating, "regular guy" persona is attractive, because these patterns reveal the model's willingness to provide access at a reasonable cost to learners (Henrich & Gil-White, 2001). Moreover, we can expect that admiration will rise as a
function of having had the experience of successfully gaining proximity to, and interacting with, a model.
3.3 Conformist Information Acquisition
Thus far, I have emphasized the acquisition of cultural knowledge from individuals who are superior in skill or social
standing. However, much information transmission involves not the pursuit of an advanced target but rather conformity
to a pattern prevailing among one's peers. As mathematical models demonstrate (Boyd & Richerson, 1985; Henrich &
Boyd, 1998), conformist transmission, the "When in Rome, do as the Romans do" strategy, is often an effective
alternative to patterning one's behavior after some outstanding individual. Importantly, the demands of the two
strategies differ in notable ways. Whereas a principal task in vertical transmission consists of identifying a model who is
both superior on relevant grounds and accessible, conformists do not want for models, nor is accessibility as much of a
concern,
since, if the behavior is sufficiently uniform across actors, the learner can compile individually incomplete observations
by watching multiple models. Similarly, conformist strategies do not involve a willingness to pay costs in order to
procure access to models, as access is not a limited resource. At the motivational level, different emotional systems
should underlie conformist and prestige-biased transmission.
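As a sketch of the mathematical models cited above (the following is Boyd and Richerson's 1985 formulation, summarized here rather than reproduced from this chapter), consider a dichotomous trait present at frequency p among an individual's models; conformist transmission changes the expected frequency after transmission to

p' = p + D p(1 - p)(2p - 1)

where D (0 < D <= 1) measures the strength of the conformist bias. Because the correction term is positive whenever p > 1/2 and negative whenever p < 1/2, any trait held by a majority of models is amplified toward fixation, which is what makes conformist transmission a homogenizing force within groups even in the absence of any superior individual to copy.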
Learning from successful individuals can provide two types of knowledge, namely (1) information that is useful because
it addresses a fitness-relevant task (e.g. knowing how to catch fish), and (2) information that is valuable because it
addresses culturally constituted prestige competitions (e.g. knowing how to sing). Similarly, conforming to prevailing
patterns of behavior can lead to the acquisition of both utilitarian practices and practices that are valuable primarily
because of their social consequences. While scholars agree that humans are remarkably conformist, debate continues
as to the evolutionary factors responsible for conformism.
It appears that much conformist behavior is not explicable in utilitarian terms. First, widely shared behaviors often
constitute but one of many possible solutions to a practical problem, with alternatives being relatively easy to learn or
discover. Second, many widely shared behaviors are stylistic in nature, without apparent utility (e.g. walking speed
appears to be similar within, and differ across, cultures). Some of these behaviors are explicable in terms of the
advantages of coordination: it does not matter which side of the road one drives on, so long as everyone drives on the
same side. However, behaviors such as walking speed do not overtly concern coordination. One clue as to the
significance of such behaviors, hence the ultimate functions of the mechanisms underlying their acquisition and
practice, lies in the observation that most, perhaps all, cultural information is "morally forceful" (Swartz & Jordan, 1980); that is, there is a "right" way to think, speak, or act, and people have a higher opinion of those who conform to
such standards, and a lower opinion of those who do not (see also chapter 17 here). Congruent with this observation,
violations of many cultural practices are met with punishment.
The models of Richerson, Boyd, and collaborators suggest that conformism has its roots in the stability of social
systems in which (1) norm violators are punished, and (2) those who fail to punish norm violators are also punished.
While this configuration can facilitate the cultural evolution of group-functional practices, it also constitutes a source of
selection pressure favoring psychological mechanisms that enhance conformism, since conformists escape the costs of
both punishment and higher order punishment (punishment meted out to individuals who fail to conform to the norm
of punishing norm violators). With both stable systems of punishment (Boyd & Richerson, 1992; Henrich & Boyd,
2001) and conformism-promoting psychological mechanisms in place, an efflorescence of norms will occur, leading to
considerable within-group homogeneity with regard to nonutilitarian behaviors such as walking speed. In this view, for
many nonutilitarian practices, conformism is an accidental consequence of other social and psychological systems, with
avoidance of punishment being the only benefit to be gained.
While Richerson and Boyd's position has much to recommend it, the phenomenology of reactions to norm violations
suggests to me that the psychology of conformism has been shaped by more than simply the recurrent presence of
higher order punishment. A central pillar in Richerson and Boyd's argument is the
indisputable observation that punishing others entails costs. Doing something often involves more costs than doing
nothing, particularly when the action at issue concerns inflicting costs on another, behavior that will frequently elicit
resistance or retribution. Even the seemingly low-cost tactic of punishing through ostracism generates costs: others
are often potentially valuable to the actor as cooperation partners, sources of information, and so on, hence engaging
in ostracism entails forgoing a social resource. Given the costly nature of punishing others, Richerson and Boyd's
perspective leads one to expect that actors should be conservative in this regard; whether through conscious
calculation or unconscious evaluation by mental mechanisms dedicated to this task, punishers should be motivated to
punish only to the extent necessary to avoid suffering higher order punishment themselves. By extension, the
punisher's attention should focus primarily on the norm violation at issue, as this allows for a calibration of punishment
such that it is commensurate with the action being punished.
Congruent with the foregoing reasoning, evangelical Christian preachers often exhort their congregations to "hate the sin, not the sinner," that is, to direct punitive sentiment toward discrete norm violations that are punctuated in time,
rather than toward the norm violator, a social actor who may maintain a presence in the community long after the
violation has been committed. The fact that such exhortations are necessary at all (and the observation that they are
frequently unsuccessful) calls into question the conclusion that punishers are motivated to punish only to the extent
necessary to avoid higher order punishment.
Reactions to norm violations, including trivial norm violations, frequently seem to involve not merely disapproval of the
violation (as might be expected in a punishment-driven system) but, moreover, condemnation of the norm violator as
a person. Often, people who walk too slowly or too quickly are not merely bad walkers, they are suspected of being bad people ("he walks too slowly because he is lazy"; "she walks too fast because she is arrogant"; etc.). Such
inferences are exercised not only in the domain of manners (into which walking speed falls) but also in regard to
mundane, even practical practices. Try writing a check at the bank while holding a pen in your fist instead of using a
pinch grip, punching the elevator button with your elbow instead of your finger when you are not carrying anything, or
walking about the supermarket pushing your shopping cart from the wrong end, and you will soon discover that you
attract not merely attention, but disdain. Moreover, I suggest that this disapproval is not purely corrective in nature, as
it is not simply your actions that are frowned upon, it is you, the whole actor, who are disliked.
One possible explanation for the tendency to hate the sinner rather than just the sin is that conformity to norms holds
communicative value, indicating to the observer that the actor (1) is familiar with local practices, the intricacy of which
is often so great that only extensive exposure will lead to mastery across domains; (2) values the local set of
practices, viewing them as superior to possible alternatives, including those common in other groups; and (3) values
the opinions of members of the local group. Conformity to diverse norms thus signals that the actor is a competent
and dedicated member of the cultural group, attributes that make the actor attractive as a potential member in
coalitions and cooperative ventures in which predictability is an important attribute (see also chapter 18 here).
Conversely, norm
violations signal to observers that the actor either (1) is not a member of the local group, (2) does not value local
practices, or (3) does not value the opinions of members of the group, three attributes that make the actor
unattractive as a cooperative partner: the sinner is indeed truly of little worth. I therefore suggest that natural
selection has favored the evolution of psychological mechanisms promoting conformism not simply because of the
ever-present threat of higher order punishment, but also because of the benefits to be gained by signaling that one is
the sort of person others ought to value (Fessler, 2004; Fessler & Haley, 2003).
3.4 Motivations Underlying Conformism and Punishment
The foregoing position is congruent with the observation that conformism is largely motivated by a desire to avoid
shame and embarrassment, the aversive emotions attending negative social appraisal (see Fessler, 2004, for review).
Although Westerners often equate shame with guilt (Fessler, 2004), the two emotions are profoundly different: whereas guilt focuses on the wrongness of an action and the need to repair the damage it inflicts on other parties, shame, consistent with the signaling argument developed earlier, focuses on the inadequacy of the person as a whole and the corresponding need to escape additional costly social scrutiny (see Gilbert et al., 1994; Tangney, 1998). Similarly, embarrassment, the emotion often elicited by violations of norms governing comportment and presentation of self, is
accompanied by display behaviors that inform onlookers that the violation was unintentional; by communicating that
the actor knows and values the local standards, this mitigates the damage the violation causes to the actor's social
position (Keltner & Buswell, 1997; Keltner et al., 1997). While shame and embarrassment have their origins in
homologous primate affects, they nevertheless exhibit a number of novel design features, including (1) a focus on
culturally constructed standards for behavior, and (2) a reliance on a theory of mind (Fessler, 2004). Together with
the whole-self attribute of shame, the latter feature suggests that humans possess evolved motivational mechanisms
geared toward the avoidance of negative social appraisals stemming from nonconformity, where those appraisals
concern not simply the action but the individual as an actor in sustained, iterated social interactions: we do not simply
care how others act in response to our failure to conform to social standards, we care what others think of us as a
result of our nonconformity.
The threat of punishment by group members plays a central role in both Boyd and Richerson's account of the evolution
of conformism and mine. They note (1992) that punishment, itself a costly action, is explicable in terms of the
presence of higher order punishment: once both punishment and higher order punishment are in place in any given
social system, the practice of punishment, and the attendant conformism, will be maintained. While cogent, this
perspective does not explain the origins of punishment, since a critical mass of punishers is necessary before the
system stabilizes. Kevin Haley and I (Fessler & Haley, 2003) have sketched a signaling account of punishment that, by
addressing the factors motivating individual punishers, can explain the origins of punishment. We argue that costly
punishing of norm violators offers an avenue for demonstrating to observers that the
punishers know and support local standards for behavior, hence are themselves reliable and predictable individuals
who can be counted on as partners in coalitions and cooperative ventures (compare with chapter 16 here). Natural
selection has crafted emotions, importantly including moral outrage (anger at norm violations that do not affect the observer), that motivate punishment by assimilating norm violations to the category of transgressions against the self
(see also discussion in chapter 17 here). Recently, Haley (in prep.) demonstrated that third parties express more moral
outrage at norm violations when in public settings than in private, and that those who resemble the norm violator
along dimensions relevant to the violation express more moral outrage than those who do not. We interpret these
findings as indicating that the mechanism generating the sentiments underlying punishment of norm violations (1) is
sensitive to opportunities for reputation formation, and (2) operates to differentiate punishers from norm violators with
whom they might otherwise be equated.
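The maintenance logic sketched above, in which punishment and higher-order punishment jointly stabilize conformity only once a critical mass of punishers exists, can be made concrete with a stylized payoff sketch. This is my own toy illustration, not Boyd and Richerson's model or Fessler and Haley's account; the parameter values and functional forms are invented purely for exposition.

```python
def payoffs(q, g=1.0, p=3.0, k=0.5, m=2.0):
    """Stylized one-shot payoffs when a fraction q of the group punishes.

    g: gain from violating the norm
    p: expected sanction on violators, scaled by the fraction of punishers
    k: cost of punishing (punishment is itself a costly action)
    m: higher-order sanction on conformists who fail to punish
    Returns payoffs for (conform and punish, conform only, violate).
    """
    conform_punish = -k        # pays the cost of sanctioning others
    conform_only = -m * q      # sanctioned by higher-order punishers
    violate = g - p * q        # gains g but is sanctioned by punishers
    return conform_punish, conform_only, violate

# Above a critical mass of punishers, punishing is the best response, so
# punishment (and the attendant conformism) maintains itself:
cp, co, v = payoffs(0.8)
assert cp > co and cp > v
# Below the critical mass, violation pays and the system unravels:
cp, co, v = payoffs(0.1)
assert v > cp and v > co
```

The second case is precisely the origins problem noted above: until enough punishers exist, punishing is not individually favored, which is the gap the signaling account of punishment is meant to fill.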
The social utility of signaling to others that one understands and values local standards in part explains why humans
dedicate so much effort to acquiring and policing practices and beliefs that have neither intrinsic utility nor direct
application to problems of social coordination. In turn, the existence of these mechanisms generates a proliferation of
often-arbitrary cultural standards. As a consequence, humans are born into a complex ecology composed of both a
dynamic social world of potential allies, rivals, and punishers and a baroque informational world of intricate and
situation-specific norms. I suggest that the social benefits of both conforming to and enforcing cultural standards for
behavior have constituted powerful selective pressures, crafting mental mechanisms dedicated to both the acquisition
of information regarding prevailing local norms and the assignation of moral force to those norms. These mechanisms
operate so pervasively that observers quickly moralize any prevailing pattern of behavior (a process I term normative
moralization; Fessler & Navarrete, 2003) even if the behavior's frequency does not derive from cultural sharing across
actors. For example, the right hand is often associated with rectitude, purity, and so on, while the left (source of the
word sinister) is often associated with evil and pollution. Presumably, these associations derive from the fact that, in
all populations, most people are right-handed.
3.5 Internalization
The moral force associated with many standards for behavior appears to derive in part from foundational beliefs and
values, by which I mean both propositional information (moral precepts, ethnopsychological schemas, etc.) and more
inchoate intuitions driven by emotional reactions to events (see Haidt, 2001). A topic of longstanding interest in
psychological anthropology is internalization, the process whereby cultural information comes to be integrated into the
informational framework with which, and through which, the individual perceives and experiences reality. Spiro (1997),
one of the leading theorists on the subject, draws a distinction between the internalization of cultural information and
its mere acquisition, arguing that there is a fundamental difference between simply being familiar with a cultural
concept and believing it to be self-evidently true (often to the point that
the belief itself is transparent to the believer) and intrinsically motivationally salient. Information integrated into the
fabric of one's perception of reality necessarily exercises a greater influence over one's actions than does information
that is not so integrated. For example, during fieldwork in Indonesia, I observed that, among Bengkulu Malays working
for a Western oil company, those individuals who appeared to have internalized the key tenets of the Islamic faith
became violently ill upon learning that they had accidentally violated the taboo against eating pork; in contrast,
individuals who appeared to only pay lip service to religious ideas displayed little revulsion when informed that they
had consumed pork. Given its impact on behavior, for the purposes of this chapter, there are at least two important
questions regarding internalization, namely (1) What are the proximate mechanisms responsible for this process? and
(2) What are its ultimate functions?
Although numerous scholars have postulated mechanisms responsible for internalization, consistent with the legacy of
midcentury psychological anthropology discussed earlier, the vast majority of these rely on one version or another of
psychoanalytic theory (see Throop, 2003, for review). Explanatory frameworks in this tradition employ concepts such
as transference, countertransference, and projection that today are rejected by many experimentalists as either false
or simply untestable. One claim present in many accounts that is not intrinsically tied to such questionable constructs
is the assertion that the degree to which cultural information is internalized is in part a function of the extent to which
it resonates with redundant life experiences, particularly experiences that occur during maturation (see Throop, 2003).
This idea is appealing for several reasons. First, it is congruent with the notion that conformist transmission of cultural
information involves inferences made on the basis of observations of the behaviors and statements of numerous
individuals. The degree to which an individual internalizes a given piece of cultural information can thus be viewed as a
reflection of the ubiquity with which that idea is shared and the frequency with which it shapes the actions and
utterances of models in the individual's environment. Second, of relevance for both conformist transmission and
prestige-biased transmission, because the depth of internalization in part determines the extent to which a given idea
influences behavior, by assessing or registering the frequency with which the actions of a given model appear to be
congruent with or reflect a given idea, the learner can assess the importance of that idea for the given model; in other words, the process of compiling observations over a prolonged period of time allows the learner to acquire both the idea and its appropriate level of internalization (where "appropriate" reflects either the level of internalization characteristic of one model, in the case of prestige-biased transmission, or the level of internalization prevailing across many models, in the case of conformist transmission).
Given the many connections to more rigorous models of cultural transmission, psychological anthropologists' claim that
internalization is a function of redundancy in experience is deserving of study. However, caution may be in order with
regard to the primacy assigned to early experience by psychological anthropologists. Granted, it is sensible to presume
that the more foundational or elementary a given cultural construct, the less likely it is that later experience will
decrease its level of internalization. Nevertheless, I am impressed by the rapidity with which students
appear to adopt new beliefs, at least some of which are passionately held, during the first year or two of college. In
ancestral human populations, natural disasters, warfare, alliances, and exogamy would all have contributed to a pattern
in which a significant number of individuals were, at one point or another during their adult lives, suddenly immersed in
a new cultural environment, one in which there were enormous potential costs to nonconformity and hence, given the
benefits of internalization (see later), enormous potential advantages to being able to deeply internalize new cultural
beliefs. One question in need of study, therefore, is the extent to which age or maturational stage does or does not
influence the impact of redundant experiences on internalization.
In contrast to the attention dedicated by psychological anthropologists to the process of internalization, to date,
theorizing regarding the ultimate functions of internalization has been more circumscribed. Hallowell (1955) influentially
argued that, because humans live in a culturally constructed reality, deeply internalizing the cultural worldview of one's
group is essential if one is to function effectively in society. I suggest that Hallowell was on the right track, but failed
to carry this reasoning through. Specifically, I propose that internalization is often (1) an efficient means of generating
correct behavior in diverse circumstances, and (2) a means of guarding against potentially costly temptations to violate
norms (compare with chapter 17 here). First, some of the cultural standards that have the greatest impact on an
individual's potential inclusion in cooperative ventures and alliances both spring from more elementary principles and
emotional orientations and may rarely be violated or otherwise addressed directly, leading to few learning
opportunities. Individuals who are able to distill such principles and orientations from diverse experiences can act in a
manner that will likely be acceptable to many members of the local group even when the specific task at hand is novel
or rare. Second, whether due to discounting of the future or simply underestimating the probability of getting caught,
individuals are often tempted to violate important cultural standards in order to obtain short-term gains. Because the
long-term costs of such violations can be substantial (including ostracism or collective execution in many small-scale
societies), it may be advantageous if important cultural principles can become self-evidently true, as this may reduce
the likelihood that they will be violated.
In arguing that internalization is an effective means of generating socially approved behavior, I do not mean to imply
that there is a constant relationship between the degree to which a cultural principle is internalized and the extent to
which it shapes behavior (the same caveat also applies to the observation that individuals moralize prevailing patterns
of behavior). C. David Navarrete and I and our colleagues (Navarrete et al., 2004) have demonstrated that, consistent
with the notion that actors can enhance their inclusion in cooperative ventures by signaling their familiarity with local
cultural understandings and their affiliation with the group that holds them, the prospect of circumstances in which aid
would be advantageous leads people to enhance their endorsement of the views of their cultural in-group, a
phenomenon dramatically illustrated in the United States following the terrorist attacks of September 11, 2001.
Together with Haley's findings regarding moral outrage, these results indicate that much work remains to be done to
uncover the mechanisms responsible for the selective deployment of culturally shared information.
4 Conclusion
As this volume illustrates, the time is right for a systematic investigation of the evolved psychological mechanisms that
underlie humans' remarkable ability and propensity to acquire and use cultural information. While I have attempted to
sketch out some of the joints at which, perhaps, this defining aspect of human nature can be carved, I suspect that
the topics outlined here constitute but a small fraction of the beast. Sharpen your knives.
5 Human Groups as Adaptive Units
Toward a Permanent Consensus
David Sloan Wilson
Foundational changes are taking place in our understanding of human groups. For decades, the biological and social
sciences have been dominated by a form of individualism that treats groups as nothing more than collections of self-interested individuals. Now groups themselves are being interpreted as adaptive units, organisms in their own right, in
which individuals play supportive roles.
Let me be the first to acknowledge that this new conception of groups is not really new. A long view of scientific and
intellectual history reveals that the last few decades have been an exception to the rule. The founding fathers of the
human social sciences spoke about groups as organisms as if it were common sense (Wegner, 1986). Before them,
philosophers and religious believers had employed the metaphor of society as an organism since the dawn of recorded history.
Far from robbing recent developments of their novelty, this pedigree only deepens the mystery. How is it possible for
one conception of groups to be common sense for so long, for a radically different conception to become common
sense, and then for the earlier version to experience a revival? A superficial answer is that ideas are like pendulums
that swing back and forth. On the contrary, I believe that the organismic concept of groups will become permanently
established, in the same sense that the theory of evolution has become permanently established, even if there will
always be a frontier of controversy. In this essay I will attempt to show how the ingredients for a permanent
consensus are already at hand.
1 A Theoretical Zone of Agreement
Despite the radically different conceptions of groups, there are some substantial zones of agreement that provide the
basis for a future permanent consensus. The first concerns the theoretical conditions for a group to become an
adaptive unit similar to a single organism. Prior to the middle of the twentieth century, adaptations were often thought
to evolve for the good of the individual, group, species, or ecosystem, as if there were no need to distinguish among
these units. This position, which now is
termed "naive group selectionism," became the target for criticism by a number of authors, notably G. C. Williams in his book Adaptation and Natural Selection (1966). A consensus emerged that adaptation at any given level of the biological hierarchy requires a corresponding process of natural selection at that level.
As an example, consider a single group consisting of two types of individual, A and B. Type A individuals behave in a
way that increases the fitness of everyone in their group (including themselves) at no cost to themselves. The idea of
providing a public good at no private cost might seem unrealistic but is useful for illustrative purposes. Type B
individuals are free-riders who enjoy the benefits provided by A-types without providing any benefits of their own. Because A-types increase the fitness of everyone equally, their frequency does not change within the group (except by drift). After
all, natural selection is based on differences in fitness, which are not present in this example. If providing the public
good requires a private cost, then A-types will be less fit than B-types, and their frequency within the group will
decrease until they ultimately go extinct. More generally, natural selection within a single group is insensitive to the
welfare of the group. This is one of the fundamental principles that emerged in the middle of the twentieth century; it
enjoys, and deserves, widespread agreement.
Continuing this example, suppose that there are many groups, not just one, that vary in their frequency of A and B
types. Even though the frequency of A does not change within any group (except by drift), groups with a higher
frequency of A will contribute more to the total gene pool than groups with a lower frequency of A. In effect, we have
added a process of natural selection at the group level: a population of groups that vary in their genetic composition,
with corresponding variation in their contribution to the gene pool (fitness). Group selection provides the fitness
differences that were lacking within groups. In the case of a no-cost public good, any variation among groups is
sufficient for the A-type to evolve to fixation in the total population, because positive among-group selection is
unopposed by within-group selection. If providing a public good requires a private cost, then positive selection at the
group level is opposed by negative selection at the individual level, and the outcome depends on the relative strength
of the two processes. More generally, groups can evolve into adaptive units that are designed to maximize their
contribution to the total gene pool to the extent that selection among groups prevails against selection within groups.
This is also part of the consensus that emerged in the middle of the twentieth century; it remains theoretically valid
today (see Sober & Wilson, 1998, for a fuller discussion).
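The two cases just described can be checked numerically. The sketch below is my own illustration of the verbal argument, not a model from the text: A-types confer benefit b on every member of their group, pay private cost c, and groups reproduce in proportion to their members' fitness.

```python
def next_generation(groups, b=0.5, c=0.0, base=1.0):
    """Advance a population of groups by one generation of selection.

    groups: list of (nA, nB) pairs. Each A-type adds benefit b to the
    fitness of every member of its group and pays private cost c; B-types
    free-ride. Offspring numbers are proportional to fitness.
    """
    out = []
    for nA, nB in groups:
        wA = base + b * nA - c      # A's fitness within its group
        wB = base + b * nA          # B enjoys the same public good for free
        out.append((nA * wA, nB * wB))
    return out

def freq_A(groups):
    tA = sum(nA for nA, _ in groups)
    tB = sum(nB for _, nB in groups)
    return tA / (tA + tB)

# Two groups that vary in their frequency of A (the raw material of group
# selection), with a no-cost public good (c = 0):
pop = [(8, 2), (2, 8)]
new = next_generation(pop)

# Within each group the frequency of A is unchanged (no fitness differences)...
for (nA, nB), (mA, mB) in zip(pop, new):
    assert abs(nA / (nA + nB) - mA / (mA + mB)) < 1e-12
# ...yet globally A increases, because A-rich groups out-produce A-poor ones.
assert freq_A(pop) == 0.5 and freq_A(new) > 0.5   # 0.5 rises to 44/70, ~0.63
```

With a private cost (c > 0), wA falls below wB and A declines within every group; whether A evolves overall then depends on the relative strength of the two opposing processes, exactly as the text states.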
A third part of the consensus was that among-group selection is almost invariably weak compared to within-group
selection, so that in the vast majority of cases, groups cannot be considered adaptive units. Notice that this is an
empirical claim, in contrast to the previous two theoretical claims. The first two claims establish the conditions under
which group-level adaptations can evolve in principle. The third claim asserts that these conditions seldom exist in the
real world.
Everything I have said so far is part of the received wisdom during the age of individualism that can be found in just
about any evolution textbook published during the last 40 years. For the purpose of this essay, the important point is
that a new consensus can be reached by challenging the empirical claim while retaining the theoretical consensus. The
fact that a permanent theoretical consensus has already been established makes the task of establishing a new overall
consensus easier.
2 An Empirical Zone of Agreement
In the previous section, I argued that the individualistic conception of human groups can be rejected and the
organismic conception accepted on the basis of a theoretical framework that everyone accepts. If an adaptation
evolves by group selection, then it is for the good of the group. If I am correct, then the existing disagreement must
be empirical in nature. Nevertheless, at a pretheoretical level there is also widespread empirical agreement about the
pervasive cooperative character of human society. Reviews of my recent book Darwin's Cathedral: Evolution, Religion
and the Nature of Society (Wilson, 2002) vividly illustrate this point. The thesis of this book is that religious groups
and other human social organizations are highly cooperative and evolved by genetic and cultural group selection. In
one set of commentaries, whose authors come from a variety of backgrounds, not everyone agreed about group
selection, but everyone did agree with the empirical evidence for religious groups as highly cooperative units. Alvis (2003) said: "I do not doubt his thesis that religious communities can function as adaptive units." Hinde (2003) regarded the empirical claim as "superbly demonstrated." Lease (2003) regarded it as unsurprising and already appreciated within the humanities. Paden (2003) called it "obvious," at least at the level of historical observation. In
another book on religion from an evolutionary perspective, Atran (2002) rejects adaptationist hypotheses at both the
group and individual level in favor of a by-product explanation. My hypothesis based on group selection is criticized at
length, but when the theoretical dust settles (at least according to Atran) he still acknowledges that it is "embarrassingly obvious that ... religious groups cooperate among themselves to better compete against other groups" (p. 233). This quotation could easily have come from Alexander (1987), Ridley (1997), or Wright (2000), including
the emphasis on between-group competition, but these authors base their views on individual- or gene-level selection
rather than group selection or nonadaptive by-product accounts.
In short, there appears to be nearly universal agreement about the empirical fact of human cooperation within groups
and even on the importance of between-group competition as a causative factor. The controversy is about how to
explain the accepted empirical fact theoretically. How odd! What I have said in this section seems to conflict flagrantly
with what I said in the previous section. How is it possible for everyone to agree theoretically on what counts as a
group-level adaptation, for everyone to agree empirically on the fact of human groups as (largely) cooperative units,
and for so much controversy to remain about how to theoretically interpret human cooperation as a group-level
adaptation, an individual-level adaptation, a gene-level adaptation, or a nonadaptive by-product of evolution?
3 Part of the Problem: Logical Inconsistency
It might sound suspect and self-serving to say that much of the controversy is based on logically inconsistent
arguments that can be dispelled with a little bit of clear thinking. In a vigorous debate among smart people, such problems are quickly resolved, leaving more interesting and substantial differences of opinion. However,
the controversy over the nature of groups is not restricted to a debate among smart, informed members of a single
group dedicated to the task. It takes place at a much larger spatial, temporal, and disciplinary scale that leaves plenty
of room for logical inconsistency. For example, the average college biology student learns little more about group
selection than what I provided at the beginning of this essay. Mostly such students learn that it is wrong and different
from accepted theories such as kin selection and reciprocal altruism. Even their knowledge of the accepted theories is
rudimentary. Theoretical literacy is low even among graduate students and faculty in ecology, evolution, and behavior.
To reevaluate group selection, such people would first need to overcome the aura of foolishness and taboo that
surrounds the subject. Then they would need to increase their theoretical literacy to the point where they could follow
a simple mathematical argument. All of this would take time and effort that they might be unwilling to invest unless
they became centrally interested. It would result in endless conversations with peers who have not made the same
commitment and the substantial likelihood that manuscripts and grants would be rejected because they invoke group
selection. The situation for students and faculty from other disciplines trying to learn about evolution is even worse.
These sociological factors tend to be regarded as boring by those who want to examine the issues on purely scientific
grounds. Nevertheless, they are interesting in their own right, especially for philosophers and sociologists of science
who wish to achieve a realistic understanding of science as it is actually practiced. I will therefore elaborate on how
there can be a zone of theoretical agreement that nevertheless results in controversy that persists for decades.
The theoretical consensus, as I said earlier, is that group-level selection is required for groups to evolve into adaptive units. To determine if any particular trait evolves by group selection, the following information is required.

1. The groups must be defined.
2. The relative fitness of individuals bearing alternative traits within single groups must be examined to evaluate within-group selection.
3. The relative fitness of groups in the total population must be examined to evaluate among-group selection.
4. The relative strength of within- and among-group selection must be evaluated to determine the role of among-group selection in total evolutionary change.

This follows directly from the theoretical consensus. Anyone who has accepted even the abbreviated account of group selection provided in textbooks should feel compelled to accept these conditions for evaluating group selection. Thus we are still in the zone of theoretical agreement.

Now comes the problem. Many discussions of evolution include the information just listed but do not present it in a way that allows the role of group selection to be evaluated. Instead, group selection is rejected verbally or not mentioned at all, and total evolutionary change is attributed to individual-level selection. When the same information is presented as outlined here, group selection proves to be a significant component of total evolutionary change. The rejection of group selection is therefore logically inconsistent. As long as the commonly accepted theoretical framework remains valid, the role of group selection must be acknowledged, based on the empirical information provided.

Elliott Sober and I have extensively discussed this problem (Sober & Wilson, 1998), including detailed case studies (e.g., Wilson, 1998, 1999, 2000). As a quick way to illustrate the magnitude of the problem, I encourage the reader to listen closely to the next conversation that he or she has about the evolution of any given trait. Very often the discussion is framed not in terms of evolution per se but in terms of an individual making a decision. Will the individual receive a higher fitness by adopting trait A or alternative trait B? Whichever trait bestows the highest fitness is assumed to evolve by individual selection. This heuristic assumes that individual selection will maximize the absolute fitness of the individual, even though everyone knows that natural selection is based on relative fitness and that the evaluation of group selection requires the comparison of relative fitnesses within and among groups.
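The four conditions listed earlier can be operationalized compactly. As an illustration of my own (Wilson presents no formalism here), the standard Price-equation partition splits the total change in a trait's frequency into among-group and within-group components, here for equal-sized groups:

```python
from statistics import mean

def pcov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))

def price_partition(groups):
    """Partition the change in the mean of a 0/1 trait into among-group and
    within-group selection components (Price equation, equal-sized groups,
    no transmission bias). groups: lists of (z, w) pairs, where z = 1 marks
    the trait and w is the individual's fitness."""
    everyone = [ind for g in groups for ind in g]
    wbar = mean(w for _, w in everyone)
    group_z = [mean(z for z, _ in g) for g in groups]   # step 1: groups defined
    group_w = [mean(w for _, w in g) for g in groups]
    among = pcov(group_w, group_z) / wbar               # step 3: among-group selection
    within = mean(pcov([w for _, w in g], [z for z, _ in g])
                  for g in groups) / wbar               # step 2: within-group selection
    return among, within                                # step 4: compare the two

# No-cost public good, two groups of ten (8 A-types vs. 2 A-types): everyone
# in a group shares the same fitness, so within-group selection is zero and
# the entire change in A's frequency is due to among-group selection.
group1 = [(1, 5.0)] * 8 + [(0, 5.0)] * 2
group2 = [(1, 2.0)] * 2 + [(0, 2.0)] * 8
among, within = price_partition([group1, group2])
assert within == 0.0
assert abs(among - 0.45 / 3.5) < 1e-9   # ~0.129, the total change in freq(A)
```

The point of the exercise is the one made in the text: the same fitness data that license a verbal "individual selection" gloss, when arranged this way, expose a nonzero among-group component.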
The assumptions that are required for the absolute fitness criterion (AFC) to correctly predict the outcome of natural
selection or to correspond to within-group selection are usually unstated and unquestioned. Returning to our example
of the no-cost public good, an individual would increase its absolute fitness by adopting trait A compared to trait B, but
not its relative fitness within its own group. Multiple groups and variation among groups are required for A to evolve.
Given these conditions, the AFC does correctly predict the outcome of natural selection (the A-trait does evolve) but
mistakenly attributes the outcome to within-group selection. In other cases, the AFC simply comes to the wrong
conclusion about what evolves (Wilson, 2004).
This problem exists not only at the level of casual conversation but at the highest levels of scientific discourse. A
recent model of sentinel behavior provides a sterling example (Bednekoff, 1997). In numerous species of birds and mammals, a single individual scans for predators, often from an exposed location, while other members of its group forage for food. Along with alarm calls, sentinel behavior is a classic example of altruism that seems to require group selection, with a shared benefit (enhanced protection from predators) and two potential private costs: exposure to predators and inability to feed. Bednekoff's model attracted attention because it interpreted sentinel behavior as "safe and selfish" for the sentinel rather than altruistic.
The core of Bednekoff's model is shown in figure 5.1, which portrays the fitness of sentinels and foragers (y-axis) in a
single group of five individuals in which some number between 0 and 5 act as sentinels (x-axis).
FIGURE 5.1 Bednekoff's model: the fitness of sentinels and foragers.

Each forager fails to detect a predator attack with probability V, and each sentinel fails to detect a predator attack with the smaller probability V/a (a > 1). The term a therefore represents the enhanced protection afforded by the sentinel. Detection by either foragers or sentinels is assumed to be noticed by the whole group, so the collective probability of failing to detect an attack is V^F (V/a)^S, where F is the number of foragers and S is the number of sentinels in the group. The predator is assumed to be successful if it remains undetected, and the individual that is actually killed is determined by a lottery in which each forager holds 1 ticket and each sentinel holds b tickets (b > 1). The term b therefore measures the relative risk of a sentinel if the predator remains
undetected. Because a appears in both fitness equations and b appears only in the numerator of the sentinel's fitness,
sentinels provide a public good at their own private cost. This is shown graphically by the two curves in which the
fitness of both foragers and sentinels increases with the number of sentinels in the group (positive slopes) but the
fitness of sentinels is always less than the fitness of foragers (one curve entirely below the other). Equations and
graphs similar to these are typically used to study altruism. The graph charts values for F + S = 5, V = .9, a = 4, and
b = 3.
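Given these parameter values, the curves in figure 5.1 can be reproduced directly. The following sketch is my reconstruction from the verbal description; the function and variable names are mine, not Bednekoff's.

```python
def survival(F, S, V=0.9, a=4.0, b=3.0):
    """Survival probabilities (forager, sentinel) in a group of F foragers
    and S sentinels, per the model described in the text: the group fails to
    detect an attack with probability V**F * (V/a)**S, and the victim is
    then chosen by lottery (1 ticket per forager, b tickets per sentinel)."""
    p_undetected = (V ** F) * ((V / a) ** S)
    tickets = F + b * S
    w_forager = 1 - p_undetected * (1 / tickets) if F else None
    w_sentinel = 1 - p_undetected * (b / tickets) if S else None
    return w_forager, w_sentinel

# A forager in a group of five with no sentinels survives with prob ~0.88;
# by becoming the lone sentinel it raises its own absolute fitness to ~0.94:
assert round(survival(5, 0)[0], 2) == 0.88
assert round(survival(4, 1)[1], 2) == 0.94
# Yet within any mixed group, sentinels are always less fit than foragers,
# so the trait is selectively disadvantageous within groups:
for S in range(1, 5):
    w_f, w_s = survival(5 - S, S)
    assert w_s < w_f
```

This makes the absolute/relative distinction concrete: becoming a sentinel raises an individual's absolute fitness (0.88 to 0.94) while lowering its relative fitness within the group, which is exactly the definitional shift on which the "safe and selfish" interpretation rests.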
I have presented this model in detail to show that it includes all of the information required to identify sentinel
behavior as a group-level adaptation, at least within the context of the model. First, the groups are clearly defined as
the set of individuals who influence each other's fitness with respect to the evolving trait. Second, it is clear that
sentinels are less fit than foragers within any single group. Third, it is clear that groups with more sentinels contribute
more to the total gene pool than groups with fewer sentinels. Fourth, the relative strength of within- and among-group
selection will depend on the amount of variation among groups and on other details of population structure, but it is
clear that whenever the sentinel behavior does evolve, it will be on the strength of among-group selection, since it is
selectively disadvantageous within groups. Given all of this, how can Bednekoff (1997) interpret the sentinel behavior as "safe and selfish"?
The answer is that an individual can increase its absolute fitness by becoming a sentinel, even as it decreases its
relative fitness within its group. For example, imagine a forager in a group without sentinels deciding how to behave.
As a forager, its probability of survival is approximately 0.88 (see fig. 5.1). If it becomes a sentinel in a group with one
sentinel (itself), its probability of survival will be approximately 0.94. It was on this basis that Bednekoff called
sentinel behavior safe and selfish. Subsequent empirical studies claiming to support the model were widely reported in
prestigious journals such as Science (Clutton-Brock et al., 1999) and the popular press. The newsworthiness of this
research is that something previously regarded as altruistic turns out to be selfish after all, yet the entire result is
based on a definitional shift from relative to absolute fitness. The information that establishes the trait as a group-level
adaptation is there for everyone to see, but few people bother to look, because they unthinkingly equate individual selection
with the maximization of absolute fitness.
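The logic of this shift can be checked with a few lines of code. The functional form below (a per-sentinel miss probability m feeding the ticket lottery described above) and its parameter values are illustrative stand-ins of mine, not Bednekoff's actual equations (whose parameters a and V enter differently):

```python
# Toy sentinel/forager model in the spirit of the one described above.
# Assumptions (mine, for illustration): each sentinel independently fails
# to spot the predator with probability m; if the predator goes
# undetected, it kills one group member by lottery, in which a forager
# holds 1 ticket and a sentinel holds b tickets (b > 1).
m, b = 0.2, 3.0

def survival(F, S, tickets):
    """Survival probability of an individual holding `tickets` tickets
    in a group of F foragers and S sentinels."""
    p_undetected = m ** S
    total = F + b * S            # total tickets in the lottery
    return 1 - p_undetected * tickets / total

# Within any single group, sentinels fare worse than foragers
# (the relative fitness comparison):
assert survival(4, 1, b) < survival(4, 1, 1)

# Yet a lone volunteer raises its own ABSOLUTE survival by standing guard:
as_forager  = survival(5, 0, 1)   # 1 - 1/5       = 0.80
as_sentinel = survival(4, 1, b)   # 1 - 0.2 * 3/7 ≈ 0.914
print(as_forager, as_sentinel)
```

The volunteer's absolute survival rises (0.80 to about 0.91 here) even though it remains the least fit member of its own group, which is exactly the sense in which calling the behavior selfish depends on substituting absolute for relative fitness.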
This is the tip of an iceberg of evolutionary thinking, both formal and informal, that invokes group selection without
knowing it. There is no logical justification for this kind of controversy. Sociologically, evolutionary biologists can
presently be divided into roughly three categories. The first has become fully comfortable with multilevel selection and
wonders what all the fuss is about. After all, don't the models and the empirical data clearly indicate the importance of
among-group selection? The second category acts as if nothing has changed since the 1960s, which is evidenced by
the formulaic statements about the rejection of group selection and failure to discuss the recent literature. The third
category has lapsed into silence about group selection, as if it never existed in the history of evolutionary thought. The
authors are presumably aware that something has happened that challenges the earlier rejection of group selection,
but evidently they don't want to get involved. Who cares, as long as their models correctly predict the outcome of
evolution? Thus, Bednekoff (1997) does not mention the term "group selection," even though the ghost of group
selection is present in his rejection of the old interpretation of sentinel behavior as altruistic. As another example,
Giraldeau and Caraco (2000) wrote an entire book on group foraging without mentioning group selection, much less
evaluating their models or empirical data in a way that would enable the role of group selection to be identified.
A fourth category does not exist. No person today, to my knowledge, has accepted the theoretical framework that I
have described as the zone of agreement, presented the information in a way that allows the role of group selection to
be evaluated, and concluded that virtually all traits evolve by within-group selection. That was the claim that became
the foundation for the theory of individual selection in the middle of the twentieth century, and it currently stands
empty for the best of reasons: it cannot be sustained either by plausible theoretical models or by the empirical evidence. That
is why I am confident that an overall permanent consensus can be established on scientific grounds, whatever the
sociological challenges.
I will end this section with a brief discussion of the recent philosophical literature on multilevel selection, represented
by forums published in the Journal of Consciousness Studies (volume 7(1), 2000) and the journal Biology and
Philosophy (volume 17(4), 2002). These authors come close to being a single group of smart,
informed people dedicated to the task. If they can't reach a consensus, what hope is there for the wider community?
My reading of the literature indicates that a consensus has been reached on the most fundamental issues and that the
frontier of controversy has moved on to issues that participants of the debate in the 1960s would have difficulty
recognizing. For example, there seems to be complete agreement that multilevel selection theory is logically consistent
and indicates an important role for group selection in the evolution of many traits. The question now centers on
whether alternative ways of working with the same information (for example, by averaging the fitness of individuals across groups or the fitness of genes across individuals and groups) can be considered a form of individualism
alongside multilevel selection. Whatever the answer to this question, it is not the same question that was being asked
earlier. There will always be a frontier of controversy, but it is important to acknowledge the kind of progress that
renders past frontiers uncontroversial. That kind of progress has occurred among philosophers of biology and
theoretical biologists who remain interested in the subject of group selection, as opposed to ignoring its existence as a
fundamental issue in evolutionary biology.
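The nested comparisons at issue, and the way averaging across groups conceals them, can be shown with a few lines of arithmetic; the group compositions and payoff scheme are illustrative numbers of my own:

```python
# Two groups; within each, altruists pay a private cost c while every
# member shares the public benefit b * (local altruist fraction).
# Group sizes and payoffs are illustrative, not from the text.
b, c = 5.0, 1.0
groups = [(2, 8), (8, 2)]        # (altruists, non-altruists) per group

tot_a = tot_n = 0.0
for a, n in groups:
    share = b * a / (a + n)      # public benefit to every group member
    w_alt = 1 + share - c        # altruist fitness
    w_non = 1 + share            # non-altruist fitness
    assert w_alt < w_non         # altruists less fit WITHIN each group
    tot_a += a * w_alt
    tot_n += n * w_non

before = sum(a for a, _ in groups) / sum(a + n for a, n in groups)
after = tot_a / (tot_a + tot_n)  # next generation's altruist frequency
print(before, after)             # rises globally despite losing locally
```

Within each group the altruists' frequency falls (0.20 to 0.11, 0.80 to 0.76), yet the global frequency rises from 0.50 to about 0.57 because the altruist-rich group is more productive. Averaging fitness across groups (3.4 per altruist versus 2.6 per non-altruist here) reports only this net outcome and erases the within-group/among-group partition.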
4 Psychological Mechanisms and Indirect Products of Natural Selection
After the logical inconsistencies discussed in the previous section are acknowledged and avoided, another set of issues
remains. Suppose we observe a behavior that clearly provides a public good at private cost. Our nested series of
relative fitness comparisons reveals that the public good providers are less fit than nonproviders within their own
groups but that groups with providers are more fit than groups without providers. The behavior counts as a group-level
adaptation in terms of present-day fitness effects, but we still need to know how the behavior arose historically. After
all, most evolutionary definitions of adaptation require not only an appropriate fit to the environment but also the
historical process of natural selection that brings the trait about. Consider the following possibilities (which are not
intended to be mutually exclusive).
P1. The behavior that provides the public good evolved as the direct product of genetic group selection.
Behavioral differences reflect genetic differences, the behavior is selectively disadvantageous within groups, and
it evolves by virtue of the differential fitness of groups.
P2. The behavior evolves as a direct product of cultural group selection. Behavioral differences do not reflect
genetic differences, but the behavior is still transmitted from one individual to another by cultural inheritance
mechanisms. As in the previous case, the behavior is selectively disadvantageous within groups and evolves
only by virtue of the differential fitness of groups.
P3. The behavior originates and spreads by psychological processes rather than an evolutionary process. For
example, suppose that people in one group got together to decide and implement the best policy, which was
then quickly imitated by the members of other groups. The behavior is adaptive at the group level in terms of
fitness effects but did not evolve by a group selection process, genetic or cultural.
P4. The psychological process could be conscious or unconscious. Conscious decision-making processes are the
tip of an iceberg of unconscious processes that are often very sophisticated, despite our unawareness of them
(Barkow et al., 1992; T. D. Wilson, 2002).
P5. The psychological process could count as altruistic or selfish, as psychologists and philosophers have
traditionally used these terms (Sober & Wilson, 1998). For example, according to the standard portrayal,
psychological egoists strive to maximize their welfare without regard to others, not in comparison to members
of their own group, and would decide to provide a public good if their share exceeds their private cost. This
behavior would be selectively disadvantageous within groups, requiring a process of group selection to evolve,
but it would be motivated by a psychological process that counts as egoistic.
The behavior is straightforwardly a group-level adaptation according to P1 and P2 because it arises directly from a
historical process of group selection. The distinction between genetic and cultural inheritance has no bearing on the
status of the behavior as a group-level adaptation. A direct process of natural selection is lacking in P3–P5. To proceed
further, we need to think about the evolution of the psychological processes that directly account for the behavior.
Consider the following analogy. Many organisms have evolved to be cryptic to avoid detection by their predators and
prey. In some cases, the individuals have no control over their appearance; they are simply born a certain way. In
other cases, the individuals do have control, for example, a chameleon's wonderful ability to match its background
within minutes. Is the green color of a chameleon against a green background an adaptation? No, in the sense that it
arose from a flexible physiological process rather than a process of natural selection. Yes, in the sense that the flexible
physiological process evolved by natural selection.
Returning to human psychology, it is easy to imagine a mind designed entirely by within-group selection. Such a mind
would routinely produce behaviors that maximize relative fitness within groups and would seldom produce behaviors
that benefit the group at private expense, except as an occasional mistake. If people routinely produce behaviors that
qualify as group-level adaptations in terms of fitness effects, then the underlying psychological mechanisms are likely
to be a (partial) product of group-level selection. To summarize, if a behavior that qualifies as a group-level adaptation
on the basis of fitness effects exists by virtue of psychological processes such as decision-making and imitation, rather
than by an evolutionary process, that does not by itself constitute an argument against group-level adaptation.
Instead, it shifts attention to the evolution of the psychological processes. If the behavior is a typical product of the
psychological processes, then the processes are likely a product of group-level selection.
These comments apply to all of the psychological possibilities (P3–P5) outlined above. What about the distinction
between conscious versus unconscious (P4) and altruistic versus selfish (P5) psychological processes? Neither of these
distinctions influences the status of the behavior or the psychological processes as a group-level adaptation. It is
commonplace in evolutionary biology to expect a given phenotypic trait to potentially evolve by more than one
proximate mechanism. If the trait
evolves by among-group selection, it is a group-level adaptation regardless of the particular proximate mechanism that
evolves. Possibilities P3–P5 represent alternative proximate mechanisms.
5 Advancing the Frontier of Controversy
I began this essay by saying that the ingredients for a permanent consensus on human groups as adaptive units are
already at hand. There is a theoretical zone of agreement on what constitutes a group-level adaptation and an
empirical zone of agreement about widespread cooperation in human groups. Despite decades of controversy, it is
possible to connect these two zones, concluding that group-level selection has been a very important evolutionary force
in human evolution, accounting for our groupish nature. Of course, within-group selection has also been an important
force, accounting for our tendency to subvert groups and our difficulty functioning cooperatively in groups beyond a
certain scale. Multilevel selection theory is ideally suited to explain human nature in all its prosocial and antisocial
complexity.
The importance of reaching a consensus on something so basic is that it allows us to advance the frontier of
controversy to more refined issues. In this spirit, I conclude by describing a number of issues that I regard as far more
interesting and worthy of attention than the raw fact of group-level selection.
1. The importance of ongoing cultural group selection. Part of the functionality of modern human groups has arisen by
cultural group selection and was never planned by anybody. We behave in ways that are smart, but we don't have a
glimmer of awareness about what, how, or why, prior to scientific investigation. Furthermore, well-documented
examples of cultural group selection exist in the social science literature, although they have seldom been associated
with the evolutionary issues discussed in this essay (see Wilson, 2002, for examples).
2. The importance of genetic group selection. Robert Boyd, who has championed cultural group selection for as long as
I have championed group selection in general, has said that the received wisdom about genetic group selection is
correct and that culture is required to make group selection a significant evolutionary force (most recently summarized
in Richerson & Boyd, 2005). One basis for Boyd's claim is his model of cultural group selection with Peter Richerson
(Boyd & Richerson, 1985), in which a conformity trait evolves by within-group selection as an adaptation to varying
environments, with consequences for cultural group selection that are initially a by-product. Another basis for Boyd's
claim is semantic. He acknowledges that so-called alternatives to group selection, such as kin selection theory and
evolutionary game theory, are themselves multilevel selection models that include within- and among-group selection.
However, he thinks cultural group selection is the only important new context in which group selection occurs, other
than the contexts that are already familiar in terms of kin selection and reciprocity. For example, he would claim that
group selection is never important in large groups of unrelated individuals in the absence of culture. I disagree, for
reasons that are presented in detail elsewhere (Sober & Wilson, 1998; Wilson, 2004). Culture is one kind of complex
process that can radically alter the partitioning of phenotypic
variation within and among groups. Other noncultural processes exist that have the same effect, for example in
microbial ecosystems (Swenson, Arendt, & Wilson, 2000; Swenson, Wilson, & Elias, 2000) or in interactions among
insects (Rissing & Pollock, 1989). With respect to human evolution, the traits that make cultural group selection
possible have a genetic basis that almost certainly evolved at least in part by genetic group selection, Boyd and
Richerson's (1985) particular model notwithstanding (e.g., Wilson & Kniffin, 1999). Gene-culture coevolution is the
hallmark of the theory developed by Boyd and Richerson, but a consensus has yet to form on whether one or both are
multilevel processes. This current frontier of controversy is more interesting and substantive than arguing over the
raw fact of group selection.
3. A new look at conscious psychological processes. Conscious intentional thought is undeniably important in the social
organization of human groups. To give an example that I discuss in detail elsewhere (Wilson, 2002, ch. 3), John Calvin
and his contemporaries during the Protestant Reformation were in part savvy social planners thinking in pragmatic
terms. As another example, Boehm (1996) searched the anthropological literature for examples in which indigenous
people were forced to make an emergency decision and an anthropologist was there to observe what happened. In
each case, there was a pragmatic discussion of costs and benefits that we would recognize as rational, with minimal
reference to superstition, supernatural agents, and so on. However, conscious intentional thought is not the same as
self-interested thought. In all of the cases just cited, the question was what the group should do as a collective unit.
Exploitation and conflicts of interest within the group were often part of the discussion and were resolved in a way that
minimized the potential for within-group selection. For example, either the group would reach a consensus about what
to do that eliminated behavioral variation within groups (e.g., everyone goes to war), or it would regulate the fitness
consequences, for example, by agreeing to punish freeloaders or to restrict the benefits of public goods to those who
generated them (e.g., only those who go to war can enjoy the spoils of war). In short, when the elements of
intentional thought are examined in terms of the nested series of fitness comparisons described by multilevel selection
theory, they emerge largely as group-level adaptations that succeed by increasing the fitness of groups as corporate
units, not the fitness of some individuals compared to others within the same group.
4. The importance of unconscious psychological processes. The importance of intentional thought notwithstanding, there
are almost certainly many psychological processes relevant to group-level adaptation that are beneath conscious
awareness, simply because this is true for human mentality in general (Barkow et al., 1992; T. D. Wilson, 2002).
Unlike adaptations that evolved directly by cultural group selection, somebody thought them up. Nevertheless, we don't
have a glimmer of awareness about them prior to scientific investigation.
5. The possibility of group-level cognition. Human cognition is usually assumed to be an individual-level process, even
if the outcome of the cognition is adaptive at the group level (e.g., an individual deciding to provide a public good at
private expense). Another possibility is that the group becomes the cognitive unit, with social interactions comparable
to neuronal interactions. The concept of a group mind
might sound like science fiction, but its likelihood follows directly from multilevel selection theory, has been well
documented in social insects, and is fully plausible for human groups (Wilson, 1997; Wilson et al., 2004).
6. The importance and plurality of nonegoistic psychological motives. The idea that all human motives are
psychologically egoistic is almost certainly false. Motives are proximate mechanisms that evolve by virtue of the
behaviors they cause. It is possible for egoistic motives to produce group- and other-oriented behaviors in principle,
but not as efficiently as nonegoistic motives (Sober & Wilson, 1998). The very idea of a single overarching motive is
outdated. If the kind of modularity emphasized by evolutionary psychologists (Barkow et al., 1992) is even partially
correct, there can be different evolved motives for different situations, and there is no theoretical reason why all of
them should be egoistic. In addition to a plurality of motives within individuals, evolutionary models predict that human
populations should consist of a mix of behavioral strategies, such as cheaters, cooperators who punish, and cooperators
who don't punish (Fehr & Fischbacher, 2003). If motives are strategies that compete against each other in game-
theoretic fashion, then the outcome will almost certainly be a community of coexisting motives that succeed in different
ways, not a single overarching motive.
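A stable community of coexisting strategies falls out of standard replicator dynamics. The sketch below uses a textbook Hawk-Dove game with payoffs of my own choosing (the essay does not specify a game); whenever the cost of fighting C exceeds the resource value V, neither strategy can exclude the other:

```python
# Discrete replicator dynamics for Hawk-Dove with resource value V = 2
# and fighting cost C = 4 (illustrative payoffs; the background term
# keeps fitnesses positive).  Predicted stable mix: V/C = 0.5 hawks.
V, C, background = 2.0, 4.0, 3.0

x = 0.1                          # initial hawk frequency
for _ in range(500):
    f_hawk = background + x * (V - C) / 2 + (1 - x) * V
    f_dove = background + (1 - x) * V / 2
    mean_f = x * f_hawk + (1 - x) * f_dove
    x = x * f_hawk / mean_f      # replicator update
print(round(x, 3))               # converges to the mixed equilibrium
```

From any interior starting frequency the population converges to the mixed equilibrium at V/C hawks: a stable community of strategies that succeed in different ways, not a single overarching one.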
7. A subordinate role for proximate mechanisms in moral philosophy. Multilevel selection theory is ideal for studying
morality because it does not insist that morality (along with everything else) is a variety of self-interest. Philosophical
discussions of morality often concentrate on how people think about their conduct rather than on how they behave.
This emphasis ignores the relationship between proximate and ultimate causation in evolutionary biology. As mentioned
previously, it is common for a single phenotypic trait to evolve via more than one proximate mechanism. To pick an
imaginary human example, an other-oriented behavior could evolve via a psychologically egoistic mechanism, a
psychologically altruistic mechanism, a mechanism that is internalized so that it is voluntarily performed, a mechanism
that is externalized as a form of social control, and so on. Every evolved behavior requires a proximate mechanism,
but the particular proximate mechanism that evolves is usually considered a minor issue and often a matter of chance.
Why, then, would morality be defined in terms of a particular proximate mechanism or set of mechanisms, when others
can potentially motivate the same human conduct?
6 How to Begin
These and other issues are already being discussed by those comfortable with multilevel selection theory and can
occupy center stage for everyone after the tedious debate over the raw fact of group selection is over. There is nothing
intrinsically difficult about multilevel selection theory. Dennett (2002) called it "mind-bogglingly complex," but that is only against the background of what preceded it. The history of science is full of ideas that initially appeared mind-boggling, only to become the new common sense. In my experience, the average college student who approaches the
subject with a fresh mind can develop a workable set of intuitions in
a single semester. These include thinking about groups as potentially adaptive units, in the same way that everyone is
currently accustomed to thinking about individuals, making the appropriate relative fitness comparisons, and thinking
clearly about proximate and ultimate causation. Advanced competence in multilevel selection theory is no more difficult
than for other theoretical frameworks, such as population genetics, inclusive fitness theory, and evolutionary game
theory. I look forward to the day when the basic intuitions of everyone who thinks about evolution and the advanced
competence of practicing scientists become the new and permanent consensus.
6 The Baldwin Effect and Genetic Assimilation
Contrasting Explanatory Foci and Gene Concepts in Two Approaches to an Evolutionary Process
Paul E. Griffiths
1 The Papineau Effect
David Papineau (2003, 2005) has discussed the relationship between social learning and the family of postulated
evolutionary processes that includes "organic selection," "coincident selection," "autonomization," the "Baldwin effect," and "genetic assimilation." In all these processes, a trait that initially develops in the members of a population as a
result of some interaction with the environment comes to develop without that interaction in their descendants. It is
uncontroversial that the development of an identical phenotypic trait might depend on an interaction with the
environment in one population and not in another. For example, some species of passerine songbirds require exposure
to species-typical songs in order to reproduce those songs, while others do not. Hence we can envisage a species
beginning with one type of developmental pathway and evolving the other type. If, however, the successive evolution
of these two developmental pathways were a mere coincidence, selection first favoring the ability to acquire the trait
and later, quite independently, favoring the ability to develop it autonomously, then this would not be a distinctive kind
of evolutionary process, but merely two standard instances of natural selection. George Gaylord Simpson pointed this
out in the article that gave us the term "Baldwin effect" (Simpson, 1953). The real interest of the Baldwin effect and
its relatives lies in the mechanisms that might link the evolution of the two developmental pathways, so that acquiring
the trait through interaction with the environment makes it more likely that later generations will evolve the ability to
acquire the same trait without that interaction.
Papineau focuses on the way social learning can facilitate such Baldwin-like links. His basic idea is that the genes that
accelerate the social learning of some complex behavior might become advantageous only if that behavior is already
being passed on by learning in an animal culture. In this scenario, the relevant genes would be selected once the
population is socially transmitting the behavior, but not otherwise, thus yielding a scenario that satisfies the
specifications of the Baldwin effect. Papineau subjects this sort of process to closer analysis, showing that it simultaneously exemplifies two different
kinds of mechanism that the literature recognizes as possible sources of Baldwin effects. First, there is the process that
Papineau calls "genetic assimilation." Here the focus is on some complex adaptive behavior, potentially under the control of a suite of genes at different loci. The challenge is to explain how this suite can get selected in virtue of the genes' collectively producing the complex adaptive behavior. Prima facie, it seems that the whole suite of genetic changes
would need to occur simultaneously. An answer becomes available if the complex behavior is also learnable, for then
each gene can be advantageous on its own, in virtue of making the rest of the behavior more quickly or more reliably
learnable. The cumulative selection of the whole suite of genes thus qualifies as a Baldwin effect because it depends
essentially on intermediate stages in which (most of) the behavior is learned.
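The head-start dynamic can be sketched as a toy simulation in the spirit of Hinton and Nowlan (1987). The locus count, the 0.5 per-locus guessing model, and the payoff scheme are illustrative choices of mine, not Papineau's:

```python
import random

N_LOCI, POP, TRIALS, GENS = 5, 200, 20, 60
MUT = 0.01
rng = random.Random(1)

def fitness(genome):
    """An individual expresses P only when all N_LOCI sub-traits are in
    place.  'G' loci are genetically fixed; each 'L' locus must be hit
    in a learning episode (success probability 0.5 per locus per
    episode, an illustrative guessing model).  Earlier acquisition
    yields a higher payoff, as in Hinton & Nowlan (1987)."""
    n_learn = genome.count('L')
    for t in range(TRIALS):
        if all(rng.random() < 0.5 for _ in range(n_learn)):
            return 1 + (TRIALS - t)   # the head start pays off
    return 1                          # P never acquired this lifetime

def generation(pop):
    fits = [fitness(g) for g in pop]
    parents = rng.choices(pop, weights=fits, k=POP)
    # copy each parent with a small per-locus mutation rate
    return [''.join(a if rng.random() > MUT else rng.choice('GL')
                    for a in g) for g in parents]

pop = [''.join(rng.choice('GL') for _ in range(N_LOCI))
       for _ in range(POP)]
mean_g_before = sum(g.count('G') for g in pop) / POP
for _ in range(GENS):
    pop = generation(pop)
mean_g_after = sum(g.count('G') for g in pop) / POP
print(mean_g_before, mean_g_after)    # I_G-style alleles spread
```

Because genetically fixed loci shorten the learning search, selection accumulates G alleles one locus at a time; the intermediate generations, in which most of the behavior is still learned, are what make the cumulative selection possible.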
This is one part of what Papineau thinks occurs in social learning cases. But he observes that there is a yet further
sense in which such cases fit the Baldwin requirements. The process he calls "genetic assimilation" takes it as given that the complex behavior at issue is indeed learnable. But in many cases, it will be puzzling in itself that some complex behavior can be learned, at least insofar as instrumental learning is supposed to do the work and reward only
accrues once the whole behavior is in place. This is where social learning plays its role: if the behavior is present in the
animal culture, then this in itself can render it learnable and so genetically assimilable. This now gives us a second
sense in which Papineau's social learning cases are Baldwin effects: the behavior is only individually-learnable-and-so-
genetically-assimilable because it is already present as a learned behavior in the animal culture.
Papineau suggests that this sort of double-strength Baldwin effect will exert powerful selection pressures in species that
exhibit a high degree of social learning. This is an interesting empirical conjecture that may or may not prove correct.
For my part, I am happy to agree that social learning can play a role in a distinctive form of niche-construction
(Odling-Smee et al., 2003) that can alter selective pressures in the way Papineau suggests.
I shall say nothing more here about social learning. Rather I want to focus on Papineau's discussion of "genetic assimilation." This term was introduced by Conrad H. Waddington (1942, 1953b) to refer to a specific process. Waddington's process stands out among the other ideas listed earlier ("organic selection," "coincident selection," "autonomization," the "Baldwin effect") both because Waddington was able to demonstrate it in laboratory selection
experiments and because it was part of his larger vision of the relationship between development and evolution, a
vision that has influenced contemporary work in evolutionary developmental biology, or evo-devo.
Let us look more closely at the way Papineau defines "Waddington's genetic assimilation." He says:
Suppose n sub-traits, P_i, i = 1, ... , n, are individually necessary and jointly sufficient for some adaptive behavioural phenotype P. (... [E]ach individual sub-trait is no good without the others.) Each sub-trait can either be genetically fixed or acquired through learning. ... Suppose further that each sub-trait is under the control of a particular genetic locus. ... So for each sub-trait P_i, we have allele I_G which genetically fixes P_i, and allele I_L which allows it to be learned. ... Organisms who already have some I_Gs will have a head start in the learning race, so to speak, and so will be more likely to acquire the overall phenotype. ... So the I_Gs that give them the head start will have a selective advantage over the I_Ls. ... The population will thus move through a stage where P is acquired by learning (Stage 1) to a stage where it is genetically fixed (Stage 2), thus yielding a prima facie Baldwin Effect. (Papineau, 2005, p. 48)
This process has little connection with the one described by Waddington himself.[1]

1. It does resemble one version of "organic selection." Patrick Bateson has argued that many learning processes have components that might separately become independent of the environmental conditions originally required for their development and that the efficiency or reliability of the learning process might thereby be improved. Like Papineau, he points out that these variations would only be selected if organisms regularly undergo the complete learning process (Bateson, 2004, p. 289).
In itself, this is neither particularly important nor particularly surprising. Many different processes have been proposed
that might free traits from their developmental dependence on some aspect of the environment, and terms like
Baldwin effect and genetic assimilation have been used in numerous senses in this extensive literature (see, e.g.,
Belew & Mitchell, 1996; Weber & Depew, 2003). In fact, despite calling the process Waddington's genetic
assimilation, Papineau does not cite Waddington's work as a source, but instead cites a well-known computer
simulation of the interaction between learning and inheritance (Hinton & Nowlan, 1987). The interesting point is that
Waddington's actual model of genetic assimilation is simply not accessible to anyone who conceptualizes genes in the
way Papineau does in the passage quoted earlier. Several recent authors have stressed the need for biologists and
philosophers of biology to become more self-conscious about the existence of multiple gene concepts and of the
appropriate range of theoretical and experimental contexts in which those concepts should be deployed (Moss, 2002;
Falk, 2000; Stotz et al., 2004; Griffiths & Neumann-Held, 1999). I will argue here that paying attention to gene
concepts helps one to distinguish two radically different approaches to explaining how the development of a phenotypic
trait can become independent of certain aspects of the developmental environment. One approach looks to selection to
forge a link between the successive evolution of two developmental pathways to the same trait. The other approach,
represented by Waddington's genetic assimilation, looks to developmental biology. The latter approach seeks to explain
how the development of a phenotypic trait can become independent of an environmental stimulus (or become
dependent on that stimulus) by showing that in certain kinds of developmental systems such transitions can be
produced by small genetic changes: changes that are likely to occur spontaneously in a relatively short time. In the
first approach, the explanatory focus is on the relative selective advantage of the two developmental pathways. In the
second approach, the explanatory focus is on the developmental mechanisms that make suitable variants available for
selection.
2 Genetic Assimilation and Gene-P
In the passage quoted earlier, Papineau employs a concept of the gene that Lenny Moss has labeled Gene-P:
Gene-P is defined by its relationship to a phenotype. ... Gene-P is the expression of a kind of instrumental
preformationism (thus the P). When one speaks of a gene in the sense of Gene-P one simply speaks as if it
causes the phenotype. A gene for blue eyes is a Gene-P. What makes it count as a gene for blue eyes is not
any definite molecular sequence (after all, it is the absence of a sequence based resource that matters here)
nor any knowledge of the developmental pathway that leads to blue eyes (to which the gene for blue eyes
makes a negligible contribution at most), but only the ability to track the transmission of this gene as a
predictor of blue eyes. Thus far Gene-P sounds purely classical, that is, Mendelian as opposed to molecular. But
a molecular entity can be treated as a Gene-P as well. BRCA1, the gene for breast cancer, is a Gene-P, as is the
gene for cystic fibrosis, even though in both cases phenotypic probabilities based upon pedigrees have become
supplanted by probabilities based upon molecular probes. (Moss, 2001, pp. 87–88)
Papineau's five genes are Gene-Ps, each defined by a specific part (sub-trait) of the phenotypic trait P. I take it that
these parts are dispositions to acquire behavioral modifications that together amount to a disposition to acquire the
new behavior P. The process he labels genetic assimilation is therefore simply the spread of certain of these
phenotypic traits as a result of selection. His trait I_G is selectively superior to I_L because I_G individuals acquire P
more reliably than I_L individuals. The sought-for link between individuals initially learning the subtrait P_i and later
individuals possessing P_i without learning is mediated by a process of niche construction, a change in the selective regime as a
result of behavior. In contrast, Waddington thought that the link between the ability to reliably acquire an adaptive
trait and the appearance of individuals with an intrinsic tendency to exhibit that trait was forged by the typical nature
of the developmental pathways underlying adaptively valuable traits. It was for this reason that he objected to Simpson's
term "Baldwin effect," with its implication that this evolutionary process is a special case. Waddington intended genetic
assimilation to be a ubiquitous feature of phenotypic evolution:
Simpson comes to the conclusion that the Baldwin effect, in the sense he describes it, has probably played a
rather small role in evolution. The genetic assimilation mechanism, however, must be a factor in all natural
selection, since the properties with which that process is concerned are always phenotypic; properties, that is,
which are the products of genotypes interacting with environments. (Waddington, 1953, p. 386)
According to Waddington, the tendency of phenotypes to become genetically assimilated reflects the fact that there is
little difference between the actual developmental processes that underlie a highly canalized phenotype that depends
on an environmental stimulus and one that has been rendered independent of that stimulus, as I will now try to
explain.
end p.94
3 Genetic Assimilation and Gene-D
Waddington was aware that his vision of development required a conception of the gene that does not intrinsically link
genes and specific phenotypic outcome. He made this point in The Evolution of Developmental Systems, an address
delivered in Brisbane in 1951:
Some centuries ago, biologists held what are called preformationist theories of development. They believed
that all the characters of the adult were present in the newly fertilized egg, but packed into such a small space
that they could not be distinguished with the instruments then available. If we merely consider each gene as a
determinant for some definite character in the adult (as when we speak loosely of the gene for blue eyes, or
for fair hair), then the modern theory may appear to be merely a new-fangled version of the old idea. But in
the meantime, the embryologists, who are concerned with the direct study of development, have reached a
quite different picture of it. ... This is the theory known as epigenesis, which claims that the characters of the
adult do not exist already in the newly fertilized germ, but on the contrary arise gradually through a series of
causal interactions between the comparatively simple elements of which the egg is initially composed. There can
be no doubt nowadays that this epigenetic point of view is correct. (Waddington, 1952, p. 155)
In Waddington's vision of development, the entire collection of genes makes up a developmental system that produces
a phenotype. Many features of the phenotype are explained by the dynamical properties of that developmental system
as a whole, rather than by the influence of one or a few specific alleles. Thus, for example, Waddington sought to
explain one of the major biological discoveries of his day (the fact that extreme phenotypic uniformity can be observed
in many wild populations despite extensive genetic variation in the same populations) by appealing to the global
dynamics of developmental systems. A canalized developmental system takes development to the same end-point
from many different genetic starting points. The development of wild-type phenotypes can thus be buffered against
genetic variation. Waddington represented this idea with his famous developmental landscape (fig. 6.1).
In modern terms, Waddington's developmental landscape is a representation of development as a complex system
whose parameters are genetic loci and whose state space is a set of phenotypic states. The state space is depicted as
a surface, each point of which represents a phenotype. The genetic parameters are depicted as pegs that pull on the
surface and thus determine its contours. Epistatic interactions between loci are represented by links between the cords
by which those loci pull on the surface. The development of an organism over time is represented by the movement of
a ball over the surface, which is dictated by gravity, so that the ball rolls downhill on a path dictated by the contours
of the surface. The development of the organism is thus represented by its trajectory over the surface, through
successive phenotypic states. The basic point Waddington uses this representation to make is that if the surface has
any significant contours, then the effect of a change at one genetic locus will be dictated by the overall shape of the
landscape, which is a global consequence of the states of all the other genetic loci. Some genetic changes, such
end p.95
Figure 6.1 Waddington's developmental landscape. (a) The developmental trajectory of the
organism, represented by the rolling ball, is determined by a landscape representing the
developmental dynamics of the organism. (b) The shape of this landscape is determined by genes,
here represented by pegs pulling the landscape into shape via strings, and by epistatic interactions
between genes, here represented by connections between strings.
Source: Waddington (1957), p. 36.
as those that affect the tops of inaccessible hills, will have no effect on development. Other changes of the same
intrinsic genomic magnitude that affect the entrance of a valley, or canal, will have a massive effect on development.
The phenotypic impact of a genetic change is not proportional to the magnitude of the genomic change but depends on
the overall dynamics of development. Furthermore, the phenotypic difference produced by a genetic difference is not
explained by that genetic difference in itself but by how that change interacts with the rest of the developmental
system. This picture retains considerable validity in the light of contemporary developmental genetics (Gilbert et al.,
1996; Wilkin, 2003).
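The claim that the phenotypic impact of a genetic change depends on the global dynamics of development, rather than on the intrinsic magnitude of the change, can be given a minimal computational gloss. The sketch below is my own toy construction, not Waddington's formalism; the surface, the loci, and all numerical values are illustrative assumptions. A one-dimensional "surface" is shaped jointly by three genetic parameters, development is gradient descent on that surface, and equal-sized changes at different loci shift the developmental end-point by very different amounts.

```python
# Toy sketch (my construction, not Waddington's formalism): development as
# gradient descent on a surface whose contours are set jointly by "genetic"
# parameters. All functions and numbers below are illustrative assumptions.

CENTERS = [-1.0, 0.0, 1.0]  # where each locus "pulls" the surface

def height(x, genes):
    # 1-D surface: each genetic parameter g contributes a basin around a center.
    return sum(g * (x - c) ** 2 for g, c in zip(genes, CENTERS))

def develop(x0, genes, steps=200, lr=0.01):
    # The "ball" rolls downhill (descends the gradient of height) from the
    # starting phenotype x0 to an end-point.
    x = x0
    for _ in range(steps):
        grad = sum(2 * g * (x - c) for g, c in zip(genes, CENTERS))
        x -= lr * grad
    return x

baseline = develop(0.5, [1.0, 0.2, 0.1])

# Equal-sized changes (+0.1) at different loci shift the end-point unequally:
shift_locus_0 = abs(develop(0.5, [1.1, 0.2, 0.1]) - baseline)
shift_locus_2 = abs(develop(0.5, [1.0, 0.2, 0.2]) - baseline)
print(shift_locus_0, shift_locus_2)  # the two shifts differ markedly
```

The same genomic-scale perturbation lands in different regions of the landscape, so its phenotypic effect is dictated by the overall shape of the surface, which is the point the landscape picture is meant to make.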
end p.96
Thus, in Waddington's vision, phenotypes are global expressions of genomes, but it does not follow that particular
parts of the phenotype express particular parts of that genome. The gene concept that fits this thoroughly epigenetic
view of development is the one Moss has labeled Gene-D:
Quite unlike Gene-P, Gene-D is defined by its molecular sequence. A Gene-D is a developmental resource
(hence the D) which in itself is indeterminate with respect to phenotype... . To be a gene for N-CAM, the so-
called neural cell adhesion molecule, for example, is to contain the specific nucleic acid sequences from which
any of a hundred potentially different isoforms of the N-CAM protein may potentially be derived. ... N-CAM
molecules are (despite the name) expressed in many tissues, at different developmental stages, and in many
different forms. The phenotypes of which N-CAM molecules are co-constitutive are thus highly variable,
contingent upon the larger context, and not germane to the status of N-CAM as a Gene-D. (Moss, 2001, p. 88, his emphases)

2. Philosophers will note that Gene-P and Gene-D correspond, respectively, to descriptive and rigid readings of the phrase
"the gene for P" when this phrase is used in the usual way to report the fact that some DNA sequence accounts for a large
portion of the variance in trait P in some study population (see Sterelny & Griffiths, 1999, pp. 90-2).
To understand Waddington's vision of development, it is essential not to think of genes as genes for particular
phenotypes or phenotypic differences (Gene-P) but instead to think of them as parameters of a developmental system
(Gene-D). It is necessary to think in terms of what in Waddington's day was known as "physiological genetics."
In a series of widely read articles, the philosopher André Ariew (Ariew, 1996, 1999) has used Waddington's concept of
canalization to explicate the concept of innateness. Innate traits, Ariew has argued, are those traits insensitive to
environmental variation, or, equivalently, those traits that are canalized with respect to changes in the environmental
parameters of a developmental system. Unfortunately, Ariew's work has led philosophers who know of Waddington only
through these articles to use the term "canalization" and even "genetic canalization" to mean insensitivity to
environmental variation. In fact, the idea of insensitivity to environmental factors, properly known as "environmental
canalization" (Wagner et al., 1997), cannot even be represented in Waddington's classic picture of the developmental
landscape (fig. 6.1). Environmental parameters are not included in this model, and whether a phenotype is canalized in
Waddington's original sense is a question of the dynamical structure of the developmental system, not the relative role
of genes and environment.

But the model can easily be extended to include environmental parameters, and Waddington himself does so when
discussing genetic assimilation, as will be shown later. If these additional parameters are added, then we can define
both environmental canalization and genetic canalization.

3. The evolutionary developmental biologist Brian Hall has written extensively on Waddington and has stressed that his thought was
profoundly "gene centered," in the sense that he saw the developmental system as primarily and predominantly the expression of a
potential present in the genome (Hall, 1992, 1999, 2003).
end p.97
A phenotypic outcome is environmentally canalized if those features of the surface that direct development to that
end-point are relatively insensitive to the manipulation of environmental parameters. A phenotypic outcome is
genetically canalized if those features of the surface that direct development to that end-point are relatively insensitive
to the manipulation of genetic parameters. It should be noted, however, that we are not forced to draw this
distinction. The idea of canalization with respect to all the parameters that are included in a model of the
developmental system is equally legitimate. It is, after all, far from clear whether to classify many critical parameters,
such as the presence of DNA methylation or of maternal gene products in the cytoplasm, as genetic or
environmental. The issue of genes versus environment is peripheral to Waddington's central concern, which is how
developmental outcomes can be robust and reliable in the face of variations in developmental parameters.
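These notions can be given a small computational illustration. The sketch below is my own invention under stated assumptions (the saturating mapping, thresholds, and parameter values are illustrative, not Waddington's): a developmental mapping takes both genetic and environmental parameters, and canalization with respect to either class of parameter is measured as insensitivity of the developmental end-point to perturbing that class.

```python
import math

def endpoint(genes, env):
    # A saturating input-output mapping: once the combined "drive" from
    # genetic and environmental parameters is past the threshold region,
    # the outcome sits on a flat shoulder and is buffered against variation.
    return math.tanh(3.0 * (sum(genes) + env))

def sensitivity(genes, env, d_gene, d_env):
    # End-point shift produced by perturbing one class of parameter.
    perturbed = [genes[0] + d_gene] + genes[1:]
    return abs(endpoint(perturbed, env + d_env) - endpoint(genes, env))

# Deep in a "valley": robust to perturbations of either class of parameter.
env_canalized = sensitivity([0.6, 0.5], 0.4, 0.0, 0.2) < 0.05
gen_canalized = sensitivity([0.6, 0.5], 0.4, 0.2, 0.0) < 0.05

# Near the threshold: the same perturbations have large effects (not canalized).
uncanalized = sensitivity([0.1, 0.0], 0.0, 0.0, 0.2) > 0.05

print(env_canalized, gen_canalized, uncanalized)  # all True here
```

Deep in a valley both classes of perturbation are buffered, while near the threshold neither is, which echoes the point that canalization is a property of the dynamical structure of the developmental system rather than of a genes-versus-environment contrast.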
Like some modern authors, Waddington believed that natural selection would favor the canalization of important
adaptive phenotypes. Developmental systems that produce important adaptive outcomes robustly will be selected over
those that are easily perturbed. Although I do not have time to explore this theme fully here, it is important to recall
that, like his contemporaries I. I. Schmalhausen and Theodosius Dobzhansky, Waddington saw natural selection as
optimizing not the phenotypic character itself but rather a norm of reaction that specifies a range of phenotypes as a
function of genetic backgrounds and environmental conditions: "An animal is, in fact, a developmental system, and it is
these systems, not the mere static adult forms which we conventionally take as typical of the species, which become
modified in the course of evolution" (Waddington, 1952, p. 155). When there is a single, optimal phenotype,
stabilising selection will operate to select a narrow reaction norm or, in other words, to canalize the phenotype. In
other circumstances, however, selection may favor a broader reaction norm, producing what we describe today as
adaptive phenotypic plasticity. The shape of the norm of reaction is itself a character produced by natural selection.
We are now in a position to see why Waddington thought there would be little difference between the actual
developmental processes that underlie a highly canalized phenotype that depends on an environmental stimulus and
those that underlie one that has been rendered independent of that stimulus. Waddington writes:
If natural selection was in this way acting in favour of the ability to respond in a useful way to some
environmental stimulus, it would also in time build up a canalised response, so that the most valuable degree of
expression was regularly achieved. Once that had been done, the genotype would have been modified so that it
determined a new valley on the developmental surface; but it would still require the push of an environmental
stimulus to cause one of the balls in our model to run into it. However, once the valley was formed and
canalised, the exact strength of the push, and the exact time at which it was applied, would be of lesser
importance. In fact, we might expect that, by this stage in the evolution [sic], there would be a number of
mutant genes available in the species which could divert development into the prepared channel; and thus, once
the ground had been prepared, as it were, an internal genetic mechanism could take over from the
end p.98
original environmental stimulus. We can thus envisage a mechanism by which a valuable response to the
environment could become gradually incorporated into the hereditary endowment of the species. (Waddington,
1952, p. 159)
I have discussed elsewhere how some of Waddington's contemporaries, particularly J. B. S. Haldane and his wife and
collaborator Helen Spurway, saw his work on genetic assimilation as demonstrating that there need be little difference
as regards developmental genetics between innate and acquired traits (Griffiths, 2004). Haldane and Spurway drew
on Waddington to argue that transitions back and forth between instinct and learning were to be expected in response
to the
end p.99
adaptive advantages of these two forms of development in specific environments. A couple of brief quotations will give
the flavor of this work:
[Discussing passerine song learning:] Some of these species must have passed through a stage where the song
was learnt by some individuals and was instinctive in others. As a geneticist I think that it is quite impossible to
make a sharp distinction between learnt and unlearnt behavior. (Haldane, 1955/1992, p. 605)
The number of generations during which a learned ethogenesis evolves into an instinctive ethogenesis, if it does
so at all, depends on the relative strength of the selection pressures favouring uniformity and variability in
development. (Haldane & Spurway, 1954, p. 275)
One of the most exciting features of this Waddingtonian vision of transitions between instinct and learning is its
symmetry. Most accounts of the Baldwin effect and its relatives focus exclusively on the elimination of dependence on
an environmental factor, but the mechanisms underlying Waddington's genetic assimilation can equally lead to
environmental assimilation when the selection pressures are reversed. Baldwinian phenomena are thus subsumed
under the more general topic of the selective advantages of different patterns of interaction between gene and
environment, the field of research known today as adaptive phenotypic plasticity (Brakefield & Wijngaarden, 2003;
Schlichting, 2003). The traditional emphasis on the Baldwin effect and its relatives to the exclusion of other
evolutionary patterns reflects a misguided desire to get the effects of the environment on development written into
the germline, which in turn reflects the mistaken conviction that only in this way can the effects of the environment
on development be of evolutionary significance (Griffiths, 2003).
4 Gene Concepts and Explanatory Foci
In the evolutionary scenario described by Papineau, the genes for trait P (I_Gs) spread through the population in
response to a selection pressure caused by the spread of a learnt trait P whose acquisition requires five separate
dispositions, each of which corresponds to a gene (allele I_L and four companions). As I remarked earlier, these genes
are Gene-Ps: they are DNA elements individuated by the criterion that their presence is a reliable statistical predictor of
a phenotypic difference. This, I suggest, is typical of one way of thinking about how the development of a phenotypic
trait can become independent of certain aspects of the developmental environment. The evolutionary problem is framed
as follows. (1) What are the adaptive advantages of having P conditional on an environmental factor? (2) What are the
adaptive advantages of having P independent of that factor? (3) Does the evolution of the first trait produce new
selection pressures that favor the evolution of the second? The genes that feature in typical scenarios designed to
address these questions are Gene-Ps corresponding to the difference between the first trait and the second.
In contrast, most of the genes that figure in Waddington's genetic assimilation scenario (the pegs in figure 6.1) are
genes that are present both when the trait is dependent on the environment and when it is independent of the
environment. They are the genes (Gene-Ds) that play a causal role in building the P phenotype, not the genes that
differ between cases where that particular cascade of gene expression is switched on endogenously and cases where it
is switched on exogenously or even the genes that differ between individuals that have P and those that lack P. The
evolutionary problem is framed as follows. How does evolution produce traits that can be readily switched between
different triggers? This second way of thinking about how the development of a phenotypic trait can become
independent of certain aspects of the developmental environment corresponds to some of the major themes in recent
evolutionary developmental biology, namely the evolution of developmental modularity (Gass & Bolker, 2003; Wagner,
2001) and the evolution of phenotypic plasticity (Preston & Pigliucci, 2004; Brakefield & Wijngaarden, 2003; Gilbert,
2001). These evolutionary problems simply cannot be posed if evolution is represented as change over time in the
frequency of genes for specific phenotypes (Gene-Ps).
We can compare these two ways of thinking about how the development of a phenotypic trait can become independent
of certain aspects of the environment with two ways to approach the evolution of sexual differentiation. The primate
SRY gene on the Y chromosome is a Gene-P with respect to the masculinization of the fetus: individuals who have this
gene are very likely to have a male phenotype. But it does not follow that the evolution of sexual differentiation should
be studied by asking how the SRY gene evolved. The masculinization of the mammalian fetus is the result of a complex
cascade of gene expression. Both male and female fetuses have almost all the genes involved in this cascade. In
Waddington's terms, the developmental landscape has a deep branching valley running across it, and the SRY gene, or
its equivalent in other mammals, simply nudges the fetus into one branch rather than the other. With this picture in
mind, it is easy to understand how two rodents of the genus Ellobius have managed to lose the whole Y chromosome
while retaining the gene expression cascade of mammalian sexual development. Some other gene expression event
early in development acts to trigger the same cascade. Furthermore, sexual differentiation is an ancient characteristic
of vertebrates, and in this larger group, cascades of gene expression that are to some degree homologous with those
in mammals are triggered in still more diverse ways. Crocodiles, among others, have environmentally triggered sexual
differentiation. Some fish retain the capacity for females to be masculinized by an environmental trigger throughout
their life cycle. From the viewpoint of developmental genetics, understanding the evolution of the specific triggering
cause in one group or another is not the way to understand the evolution of the cascade of gene expression that
end p.100
constitutes becoming male. Nevertheless, there is nothing wrong with Gene-P thinking in the right context: if we want
to ask about the evolutionary pressures leading to genetic versus environmental sex determination, it is appropriate to
pose the question in terms of the selection pressures on the specific loci involved in these two modes of triggering. In
the same way, we can examine the selection pressures on the specific loci involved in making the switch between the
dependence of a behavior on learning and its independence of learning, as Papineau does, but this should not be
confused with the quite different project in which Waddington was engaged, namely asking how developmental systems
make such options readily available to selection.
5 Conclusion
Many evolutionary processes have been described in which a trait that initially develops in the members of a population
as a result of some interaction with the environment comes to develop without that interaction in their descendants.
Waddington's genetic assimilation is importantly different from the rest of this Baldwiniana because his explanatory
focus was not on the selection pressures at the point of transition but on how developmental systems come to be
structured in such a way that these evolutionary transitions are readily accessible to evolving lineages. Waddington's
approach also replaces the simple contrast between acquired and innate with a nondichotomous model of
developmental canalization and phenotypic plasticity that is in line with recent work on the evolution of development.
From a Waddingtonian perspective, evolutionary transitions between innate and acquired are only to be expected
because those categories have little meaning in terms of developmental genetics, and in some cases the difference
between the innate and acquired may require only a minimal change in developmental mechanisms. But to see this
it is necessary to use a gene concept suitable for thinking about development, and not a gene concept designed for
theoretical population genetics or for the prediction of phenotypic differences within populations.
end p.101
7 The Baldwin Effect and Genetic Assimilation
Reply to Griffiths
David Papineau
1 Canalization and Assimilation
Paul Griffiths argues that the process I called genetic assimilation in Papineau (2005) has little connection with the
issue C. H. Waddington had in mind when he coined the term. In order to establish this, he argues that Waddington's
understanding of genetic assimilation is simply not accessible to someone who conceptualizes genes in the way I did
in my article. According to Griffiths, where my article used a notion of gene-P (a gene as a difference-maker for a
specific phenotype), Waddington's thinking requires a notion of gene-D (a gene as a developmental resource that is
indeterminate with respect to phenotype).
My response is that Griffiths is running two things together, genetic canalization and genetic assimilation. What he
says would make sense if the focus of my concern with Waddington were genetic canalization. But I was not discussing
genetic canalization, but genetic assimilation, and that is a different matter. (Once these issues are clarified, we shall
see that Griffiths's points about different gene concepts, interesting as they are, constitute something of a red herring.)
Canalization is the phenomenon illustrated by Waddington's famous developmental landscapes. Certain phenotypic
outcomes are so important that natural selection has buffered them against environmental (and genetic) disruption. As
Griffiths explains, once some part of the developing organism finds itself in one of Waddington's valleys, then it will not
easily be deflected from its adaptive destination by unusual environments (or unusual genes).
Perhaps the most familiar kind of canalization occurs when developmental sequences that previously depended on
specific environmental interactions are
end p.102
brought under genetic control. This kind of genetic canalization will be involved in any process that deserves the
name Baldwin effect, since a Baldwin effect by definition requires that some item that was previously acquired from
the environment later comes to depend on genes. However, natural selection for such canalization is only part of what
defines the Baldwin effect, and not the most interesting part at that. As Griffiths himself makes very clear at the
beginning of his note, the interesting part of the Baldwin effect is the idea that natural selection is sometimes able to
bring development under genetic control specifically as a result of its previously being under environmental control. It's
not just that genetic control is selected over environmental control; it's more specifically that this selection occurs
because of the prior environmental control (chapter 6 here, p. 101).
However, having made this point clear at the beginning of his comments, Griffiths seems to lose sight of it. Even
though he presents himself as discussing Waddington's contribution to the Baldwin debate,

he does nothing to show how Waddington's thinking bears on the specific issue just emphasized (that is, the possible
existence of evolutionary processes where some aspect of development is brought under genetic control because it was
previously under environmental control). Rather, Griffiths simply focuses throughout on genetic canalization, which is a
far more general phenomenon, as I have just explained. Griffiths makes many fascinating observations about
Waddington's thinking on canalization. But by Griffiths's own account, canalization per se does not count as a Baldwin
effect.

1. Thus Griffiths: "Waddington's process stands out among the other ideas listed earlier (organic selection, coincident selection,
autonomization, the Baldwin effect)" (chapter 6 here, p. 92). Again: "Waddington's genetic assimilation is importantly different from the
rest of this Baldwiniana because ..." (chapter 6 here, p. 101).
Griffiths would at least score a terminological point if the more general phenomenon of canalization were all that
Waddington was interested in. Of course, this wouldn't show that I mischaracterize Waddington's thinking about the
Baldwin effect, since it wouldn't show that Waddington was thinking about the Baldwin effect at all, given the
difference between canalization as such and the more specific Baldwin effect. Still, it would at least argue that I was
misguided to use the term genetic assimilation to refer to a species of Baldwin effect, as I did in the article Griffiths
is commenting on. After all, Waddington coined the term, and, if he wasn't interested in Baldwin effects, then that
couldn't have been what he was talking about. However, contrary to this suggestion, there is good reason to suppose
that Waddington was interested in genuine Baldwin-like processes and, moreover, that he used the term genetic
assimilation specifically in this connection.
2 Waddington's Understanding of Genetic Assimilation
Waddington introduced his conception of genetic assimilation by reference to his series of laboratory experiments on
fruit flies (Waddington, 1953b, 1957, 1961). These experiments did not just show how genetic control can be selected
over environmental control but, more specifically, how such a selective process can
end p.103
depend essentially on passing through a stage of environmental control. Moreover, Waddington was often quite specific
in emphasizing this point.
Consider the best known of Waddington's experiments, which induced environment-independent veinlessness in fruit
flies. Waddington subjected a population of fruit fly pupae to heat shocks (40 degrees Celsius for two to four hours).
As a result, some failed to grow crossveins on their wings (he called this trait "veinless"). Waddington then bred
selectively from these individuals, and again subjected the pupae to heat shocks. After repeating this process for 12
generations, he was able to isolate a strain of flies that displayed the veinless trait even in the absence of early heat
shocks and that subsequently bred true for this trait.
It should be clear that this experiment does not simply show that there can be selective regimes that will favor
spontaneous veinlessness over environmentally acquired veinlessness. Indeed, Waddington's eventual artificial
selection of the spontaneously veinless strain, as such, is a trivial matter. Rather, the interesting phenomenon is that
repeated selection of individuals who acquire the trait environmentally somehow increases the representation of
individuals who display it spontaneously and thereby makes them available for selection. The final artificial selection of
the spontaneously veinless strain depends essentially on the earlier selection of those who acquire veinlessness
environmentally.
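This dependence can be made vivid with a toy simulation of the selection protocol under one hypothetical genetic architecture. This is my own sketch, not Waddington's data; the haploid genetics, initial allele frequencies, and heat-shock probabilities are all illustrative assumptions. Veinlessness here requires two factors, each genetically rare, and each of which heat shock can also supply nongenetically.

```python
import random
random.seed(1)  # deterministic toy run

def make_fly(p_a, p_b):
    # Haploid caricature: allele A = one protein deficiency, allele B =
    # a second; each present independently with the given frequency.
    return (random.random() < p_a, random.random() < p_b)

def veinless(fly, shocked):
    a, b = fly
    if shocked:
        # Heat shock phenocopies each missing-gene deficiency (prob. 0.5).
        a = a or random.random() < 0.5
        b = b or random.random() < 0.5
    return a and b  # veinless only when both deficiencies are present

pop = [make_fly(0.05, 0.05) for _ in range(2000)]
for generation in range(12):  # as in Waddington's 12 generations
    selected = [f for f in pop if veinless(f, shocked=True)]
    p_a = sum(f[0] for f in selected) / len(selected)
    p_b = sum(f[1] for f in selected) / len(selected)
    pop = [make_fly(p_a, p_b) for _ in range(2000)]

# Fraction now veinless with NO heat shock: selection on the environmentally
# acquired trait has enriched both alleles, so spontaneous veinlessness rises
# far above its initial expectation of 0.05 * 0.05 = 0.0025.
spontaneous = sum(veinless(f, shocked=False) for f in pop) / len(pop)
print(round(spontaneous, 2))
```

In this caricature, selecting on the shocked phenotype raises the frequency of each contributing allele (carriers are more often veinless under shock), and once both alleles are common, flies carrying both, and hence spontaneously veinless, become available for further selection: the earlier selection of environmental acquirers is what makes the later selection of spontaneous displayers possible.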
Here is how Waddington himself described the significance of these laboratory experiments:
All these experiments demonstrate that if selection takes place for the occurrence of a character acquired in a
particular abnormal environment, the resulting strains are liable to exhibit that character even when transferred
back into the normal environment. That is to say, the process which has been defined as genetic assimilation
really occurs. Insofar as this is true, the appearance of acquired characters which are of value to an organism in
terms of natural selection will have evolutionary consequences. Natural selection for such characters will lead to
the appearance of populations in which the character is an inherited one and will be developed even in
environments other than that which originally provoked it and in which it is of adaptive value. We have,
therefore, experimental justification for using the notion of genetic assimilation to explain all those evolutionary
phenomena which people in the past have been tempted to attribute to the inheritance of acquired characters in
the Lamarckian sense. (1961, p. 263, my emphases)
I take it that this passage puts it beyond dispute that Waddington understood genetic assimilation to refer to
something specifically Baldwin-like, rather than simply to the more general idea of the evolution of genetic
canalization. Having said that, it must be said that Waddington was far less clear about the mechanism that might be
responsible for genetic assimilation. He tends to shy away from this topic, and often suggests that no further
explanation is needed beyond the general observation that evolution frequently favors genetic canalization (see
Waddington, 1957, 1961). Still, this doesn't alter my immediate point, which is that Waddington clearly uses genetic
assimilation to refer to the more specific selective processes by which some aspect of development is brought under
genetic control as a result of previously being under environmental control, even if he doesn't have any good
explanation of how this happens.
end p.104
3 An Explanation of Waddington's Experimental Results
Waddington is not alone in supposing that genetic assimilation is somehow self-explanatory. As Patrick Bateson has
observed, "frequent references are made to genetic assimilation ... without thought being given to how a usually
implicit reference to Waddington might explain what was being proposed" (2004, p. 290). Sometimes commentators
will refer to the role of the new environmental factor (for example, the heat shocks in Waddington's experiment) in
revealing hitherto unexpressed genetic variability (the presence or absence of the genetic factors that yield
veinlessness after heat shocks) and thus exposing these factors to selective pressure. But this by itself does not serve
to explain Waddington's results, for there is no intrinsic reason why selecting flies that are veinless-if-heat-shocked
should yield a population with an increased likelihood of innately veinless flies.
To see this more clearly, it will be helpful to think of the various flies in Waddington's veinlessness experiment as
having three sorts of genomes: those that ensure they have crossveins even if heat shocked; those that make them
veinless if heat shocked; and those that render them spontaneously veinless. Most of the flies in the original population
had the first sort of genome. By subjecting them to heat shocks and selecting for veinlessness, we get a population
with the second sort of genome. Now, why should the third sort of genome be more probable in the second population
than in the first? Why, so to speak, should the second and third genomes' similarity in phenotypic space (they are
both capable of displaying veinlessness) mean that they are similar in genomic space, that is, that a population with
the second genome makes the appearance of the third more likely? (See Mayley, 1996.) More generally, why should the
selection of genes that facilitate the environmental acquisition of the trait be a crucial step along the way to the
selection of its spontaneous appearance?
In my earlier article I offered the following baby model of what might be going on. Suppose veinlessness depends on
two factors: (1) some developmentally important protein loses its required conformation, and (2) the heat shock
protein needed to correct this is absent. Both of these factors can be genetically determined, but both genes are
originally rare, and so a spontaneously veinless fly is highly improbable. Now think of the extreme heat shocks
imposed by Waddington as an alternative nongenetic way of causing these two protein deficiencies. Not all flies subject
to the heat shocks will develop these deficiencies, but an appreciable proportion will. Now it is much easier for the two
rare genes for protein deficiencies to be selected for producing veinlessness: no longer do they have to find
themselves together with the other gene in order to produce the advantageous phenotype; either gene on its own
will now have a selective advantage, since it will mean the phenotype will appear in any case where the other protein
deficiency has been environmentally caused.
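The logic of this baby model can be made vivid in a toy simulation. Every detail below is invented for illustration (haploid genetics, arbitrary starting frequencies, shock probabilities, and fitness values); the only point it sketches is the one in the text, namely that each deficiency gene gains a selective advantage on its own when, and only when, heat shocks are supplying the other deficiency.

```python
import random

POP = 5000       # population size (illustrative)
GENS = 40        # generations of selection
SHOCK_P = 0.5    # chance a heat shock produces a given protein deficiency
ADVANTAGE = 2.0  # relative fitness of veinless flies (the selected phenotype)

def veinless(a, b, shocked):
    # Each protein deficiency arises from its gene or, if the fly is
    # heat shocked, by environmental chance; veinlessness needs both.
    d1 = a or (shocked and random.random() < SHOCK_P)
    d2 = b or (shocked and random.random() < SHOCK_P)
    return d1 and d2

def run(shocked):
    # Both genes start rare, so spontaneously veinless flies are very unlikely.
    pop = [(random.random() < 0.01, random.random() < 0.01) for _ in range(POP)]
    for _ in range(GENS):
        weights = [ADVANTAGE if veinless(a, b, shocked) else 1.0 for a, b in pop]
        parents = random.choices(pop, weights=weights, k=2 * POP)
        # Sexual reproduction: each offspring draws its two alleles from
        # different parents, so co-occurring genes are readily split up.
        pop = [(parents[2 * i][0], parents[2 * i + 1][1]) for i in range(POP)]
    freq_a = sum(a for a, _ in pop) / POP
    freq_b = sum(b for _, b in pop) / POP
    return freq_a, freq_b

random.seed(0)
print("gene frequencies after selection with heat shocks:   ", run(True))
print("gene frequencies after selection without heat shocks:", run(False))
```

With shocks, each deficiency gene spreads toward fixation, because a fly carrying either gene alone is often veinless (the shock supplies the other deficiency); without shocks, neither gene alone ever produces the phenotype, and both stay rare.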
Obviously, the specifics of this explanation are speculative. But it seems plausible that something of this general kind
must lie behind Waddington's experimental results. We need only suppose that his advantageous phenotype results
from a number of factors, each of which gets produced in some individuals by his experimental manipulation of the
pupae, but each of which can also be fixed by some gene. Given these conditions, then the genes in question will
individually be selectively favored, because each on its own reduces the environmental
contribution needed to produce the phenotype. A quantitative illustration of this process is given in the second chapter
of Jablonka and Lamb (1995, pp. 326), and they, too, argue that this is the natural explanation for Waddington's
results.
2. Bateson (1982) offers an alternative suggestion. Suppose veinlessness normally depends on the very rare homozygote of a rare
recessive gene, and suppose further that the heat shock reverses dominance so that even heterozygotes with one allele will display
veinlessness; this reversal will then create a significant selective pressure for the veinlessness allele, where none existed before, and
thus increase the allele's frequency to the point where homozygotes (who will display veinlessness even if not heat shocked) will
become common. However, this explanation, unlike the one offered in the text, is in tension with some further data reported by
Waddington: namely that in most of his experiments the final strain of flies that spontaneously displayed the advantageous phenotype
differed from the original normal flies at loci on a number of different chromosomes (Waddington, 1961, sec. 4B). Given this, Bateson's
explanation requires that the heat shock reverses dominance at all these different loci, not just one.
It is perhaps worth making it explicit why this model makes Waddington's experiments come out as instances of the
Baldwin effect (where Baldwin effect is understood, as before, as meaning that some trait comes under genetic
control as a result of its previously being environmentally acquired). In the foregoing model, an essential precursor to
the eventual spontaneous appearance of the advantageous phenotype is the stage where each of the genes is being
individually selected because it will cause the phenotype when the environment is producing the other determinants. In
this sense, the overall phenotype eventually comes entirely under genetic control only in virtue of the fact that
previously the environment was producing the various determinants of that phenotype.
4 Genetic Assimilation Generalized
I take this suggested explanation of Waddington's results to instantiate an important general structure. Take any case
where it would be biologically advantageous to have some phenotype genetically fixed, rather than dependent on
specific environmental stimuli. But suppose also that this requires a complex suite of genes, and that the initial rarity
of these genes makes their cooccurrence unlikely (and in any case liable to be undone by sexual reproduction).
However, if the various determinants of the phenotype can also be environmentally produced, then this selective
obstacle can be surmounted. As soon as the various determinants of the phenotype are environmentally produced in a
significant number of individuals, then each gene becomes advantageous on its own, even in the absence of the genes
at other loci, since it reduces the chancy dependence on the environment by ensuring that the phenotype will
appear as soon as all the other determinants are environmentally produced. The selective process that ensues will
constitute a Baldwin effect in the sense defined earlier, since the eventual accumulation of genes for the different
determinants of the phenotype will hinge essentially on a prior stage where those determinants are also being
environmentally produced.
In my earlier article I used the term "genetic assimilation" to refer to this general structure. That is why I was happy
to include under this heading the kind of case where selection brings some complex behavior increasingly under
genetic
control by cumulatively favoring genes that accelerate its learning. As Griffiths explains in his note, I modeled this
phenomenon by supposing that any such behavior has a number of subparts, each of which can either be learned or
genetically fixed. Even if it is selectively superior to have the whole behavior genetically fixed, initial rarity of the
relevant genes will present a prima facie evolutionary obstacle. Still, we can see how the genetic fixity of the behavior
could evolve if we suppose that the animals involved are also able to learn the various parts of the behavior. For then
each gene will indeed have an advantage on its own, even in the absence of the others, since it will increase the
speed and reliability with which the whole behavior is learned.
I am not alone in commandeering Waddington's terminology of genetic assimilation to cover a far wider range of
phenomena than he demonstrated in his original fruit fly experiments, including cases where behavior is brought under
genetic control via the cumulative selection of genes that lighten the amount of learning required to acquire the
behavior. I learned this usage from Peter Godfrey-Smith (2003), who uses "genetic assimilation" in this broad sense,
and he in turn takes it from a flourishing tradition of computer modeling of selective processes (see especially Hinton &
Nowlan, 1987; Turney et al., 1996). The rationale for this broad understanding of genetic assimilation, as I hope I
have made clear, is that the same general structure of interacting genetic and environmental processes is plausibly
present in both Waddington's experiments and in cases where learned behavior is brought under genetic control.
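The Hinton and Nowlan (1987) simulation cited above can be sketched compactly. In their model each locus of a genotype is correct ('1'), incorrect ('0'), or plastic ('?', settable by trial-and-error learning), and individuals who learn the full behavior sooner are fitter. The parameters below are scaled down from their originals (they used 20 loci and 1,000 individuals), so the numbers are illustrative only.

```python
import random

LOCI, POP, TRIALS, GENS = 10, 400, 200, 40

def fitness(genome):
    # '1' = correct allele fixed, '0' = wrong allele fixed,
    # '?' = plastic, i.e. settable by trial-and-error learning.
    if '0' in genome:
        return 1.0  # a wrongly fixed locus: the behavior can never be acquired
    plastic = genome.count('?')
    for trial in range(TRIALS):
        # One chance per trial of guessing every plastic locus correctly;
        # earlier success earns higher fitness.
        if random.random() < 0.5 ** plastic:
            return 1.0 + (LOCI - 1) * (TRIALS - trial) / TRIALS
    return 1.0

def crossover(a, b):
    cut = random.randrange(LOCI)
    return a[:cut] + b[cut:]

random.seed(1)
# Initial alleles: 25% '1', 25% '0', 50% '?', as in Hinton and Nowlan.
pop = [''.join(random.choice('01??') for _ in range(LOCI)) for _ in range(POP)]
for _ in range(GENS):
    w = [fitness(g) for g in pop]
    pop = [crossover(*random.choices(pop, weights=w, k=2)) for _ in range(POP)]

fixed_correct = sum(g.count('1') for g in pop) / (POP * LOCI)
print(f"fraction of loci genetically fixed correct: {fixed_correct:.2f} (started at 0.25)")
```

Selection first spreads genomes with no incorrect alleles (learning makes the behavior reachable at all), and then cumulatively replaces plastic loci with genetically fixed correct ones, since each such gene speeds the learning of the remainder; this is exactly the assimilation of learned behavior described in the text.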
5 Arguments for Narrowing Genetic Assimilation
Not everybody is comfortable with this broad understanding of genetic assimilation. As we have seen, Paul Griffiths,
for one, queries my usage, on the grounds that my account of how learned behavior is brought under genetic control
assumes a conception of gene that is quite different from Waddington's. I shall discuss Griffiths's concerns in the next
section. But first it will be helpful to consider some rather different worries about applying genetic assimilation to
cases involving learned behavior. A number of other commentators, while not disagreeing with any of my substantial
analysis so far, feel that other differences between the learning cases and Waddington's experimental paradigm are so
significant that it is misleading to lump both under the same name. For example, Patrick Bateson lists a number of
ways the two kinds of case differ, and urges on this basis that we should restrict the term "genetic assimilation" to
Waddington's examples, and instead use the term "organic selection" for the behavioral cases (Bateson, 2004).
("Organic selection" was the term originally used in the 1890s by Henry Fairfield Osborn, Conwy Lloyd Morgan, and James
Mark Baldwin himself to describe the way that learning can lead to the genetic selection of behavior.)
3. Waddington also sought to distinguish genetic assimilation from organic selection, but on the rather different grounds that earlier
theorists failed sufficiently to appreciate the significance of canalization.
Of course, there is no substantial issue here. All can agree that there are some similarities between Waddington's
examples and the learning cases, along with some differences, and moreover that we can adopt whatever terminology
we like, as long as we make it clear what we mean. Still, many of the points raised by Bateson are of interest in their
own right. I shall focus on two of them: the adaptiveness of learning and the role of mutation.
For the first point, note that the initial environmental cause of the novel phenotype works rather differently in the
Waddington and behavioral cases. In the Waddington cases, it is due to some novel environmental influence on early
development; in the behavioral cases, it is a result of a mature learning process operating in new environmental
conditions. In itself, this contrast might not seem to matter to the logic of selection for increased genetic control, but it
carries with it a further difference that does so matter: in the behavioral cases, the novel phenotype will
characteristically be adapted to the novel environmental conditions, whereas in realistic Waddington cases, as opposed
to those cooked up in his laboratory, such adaptedness will be a freak.
After all, there is no intrinsic reason why a real-life developmental novelty prompted by a natural analogue of
Waddington's heat shocks should be advantageous rather than deleterious. Of course, in his experiments Waddington
chose to select for the novelties his developmental shocks provoked. But there is no reason why natural environments
should be so cooperative, and in reality, genes that facilitate any given Waddington-style developmental novelty are far
more likely to be selected against than for. By contrast, behavioral novelties produced by learning mechanisms will
naturally tend to be advantageous, since learning mechanisms are themselves adaptations designed to produce
behaviors that are suited to current environments, and so there is a built-in reason why genes that facilitate these
behaviors will be selected for.
This is certainly a noteworthy difference. But it does not undermine the point that, whenever a Waddington-style
developmental novelty is advantageous, then just the same complex structure of selection as operates in the
behavioral cases can bring it under genetic control. Perhaps the fact that advantageous Waddington-style novelties will
be the exception rather than the rule argues that Waddington-style cases are a less powerful evolutionary force than
learning-based organic selection. Still, it is not as if it is alien to natural selection to work with sources of variation
that are more likely to produce deleterious variants than advantageous ones (see Jablonka & Lamb, 1995, p. 36).
I turn now to the suggestion that Waddington's cases are different from learning-based organic selection because
they work with preexisting genetic variability, whereas organic selection relies on new genetic mutations. I must say
that this does not strike me as a real difference. True, there is direct evidence that mutations played no role in
Waddington's experiments: when the experiments were tried on inbred strains of flies with no appreciable genetic
variability, selection for the novel phenotypes produced no genetically new strains (Waddington, 1961, sec. 4A).
Conversely, it is also true that most of the literature on learning-based organic selection assumes that the eventual
genetic changes derive from the selection of new
genetic mutations. However, it does not take much analysis to show that there is no real contrast here.
For a start, it just isn't true that learning-based organic selection needs new genetic mutations. Mutations may be
required for some other learning-involving processes that qualify as the Baldwin effect,
4. This is true, for instance, of the kind of process that Godfrey-Smith (2003) calls "breathing spaces."
but if we focus on the kind of case at issue in this essay, where genes are favored because they reduce the amount of
learning needed to acquire some complex behavior, then it is clear that they can work with existing genetic variability,
and indeed for just the same structural reasons as apply in Waddington's cases. To illustrate, suppose that some
animal population begins to display some complex learned behavior (maybe the natural environment changes so as to
make the behavior useful, or maybe a culture newly arises by happenstance). This will then create selective pressure,
when there was no such pressure before, for any gene that fixes some element of the behavior and thus reduces the
learning load. However, such genes could well have been present in the population all along, prior to their acquiring a
selective advantage. In that case, the relevant genetic variability would have been dormant, waiting for the
emergence of the learned behavior to allow the relevant genes to make a selective difference, just as the genes in the
Waddington experiments had to wait for his developmental shocks before they had any real chance of producing his
advantageous phenotypes.
Conversely, there seems no principled reason why, given enough time, Waddington-style experiments shouldn't depend
on mutations rather than preexisting genetic variability. Suppose the experiments on the inbred fruit flies had lasted
long enough for the genes that fix the required protein deficiencies to emerge occasionally by chance mutation. Then,
as long as the environmental shocks were still producing the protein deficiencies too, these mutant genes would have
been selected for, leading eventually to a strain that had both protein deficiencies genetically fixed, just as in the
original experiment. So, to sum up this point, both learning-based organic selection and Waddington-style cases seem
equally capable of working both with preexisting genetic variability and with novel mutations.
6 Gene-Ps and Gene-Ds
Let me now finally deal with Paul Griffiths's reason for doubting that the phenomenon I call genetic assimilation can
possibly cover what Waddington had in mind. In Griffiths's view, Waddington's thinking requires the notion of gene to
be conceptualized in a way that is quite different from the notion of gene that I assume when discussing genetic
assimilation.
Griffiths argues that Waddington's interest in developmental canalization meant that he thought of genes as
multipurpose developmental resources, rather than as difference-makers for specific phenotypes. Following Lenny Moss
(2001),
Griffiths distinguishes between gene-Ds and gene-Ps. Gene-D is the notion Griffiths ascribes to Waddington. A
gene-D is a molecular sequence of DNA, but has no connection with any specific phenotype. The protein it determines
(or even the proteins it determines, when it has alternative regulators) can play different roles at different stages in
development, perhaps folding into different isoforms in different instances, and possibly contributing to the formation
of many different kinds of tissue. By contrast, gene-P is the perhaps more familiar notion of a genomic entity whose
presence or absence is a reliable sign of some specific phenotype, like blue eyes or cystic fibrosis. Gene-D is
Waddington's notion, says Griffiths, whereas he takes it that I need gene-Ps to analyze my kind of genetic
assimilation.
Well, I am more than happy to agree that Waddington's thinking about genetic canalization requires gene-Ds rather
than gene-Ps. In his note, Griffiths makes a compelling case that Waddington's ideas about the evolution of
canalization require us to think of genes as alterable constraints on shifting developmental landscapes, rather than as
determinants of specific phenotypes. However, as I have stressed throughout this essay, genetic canalization is not the
same thing as genetic assimilation. So the fact that gene-Ds are required for understanding canalization does not imply
that they are required when we are thinking about genetic assimilation.
Thus I see no problem in the fact that my initial models of genetic assimilation, involving complex behaviors
originally acquired by learning, decompose these behaviors into various subtraits, and assume that there are
genes for each of these subtraits, that is, gene-Ps for these subtraits (see the passages quoted by Griffiths, chapter 6
here, pp. 92–94). Since my topic is genetic assimilation, understood as the process whereby some complex trait is
brought under genetic control because it was previously under environmental control, rather than genetic canalization,
which is the topic Griffiths focuses on, I see no objection to my modeling it using a concept of gene different from the
one needed in order to think about canalization.
Still, perhaps Griffiths will want to press the issue further. Maybe genetic assimilation is not the same as genetic
canalization. But the point remains that Waddington was interested in genetic assimilation only because it is one source
of genetic canalization, and this in itself argues that the genetic assimilation processes he was interested in will need
to be analyzed in terms of gene-Ds. So, if my notion of genetic assimilation demands gene-Ps, then it seems
unlikely, once more, that I can mean the same thing as Waddington by "genetic assimilation."
This would be a good argument if the only way of thinking about genetic assimilation in my sense were in terms of
gene-Ps. But I do not accept this. While I think that there are cases of genetic assimilation that can happily be dealt
with in terms of gene-Ps, like my learned-behavior example that Griffiths focuses on, I am also ready to agree that,
when we come to the cases that Waddington was interested in, we need to switch to gene-Ds. And I take it that this
is exactly what I do when discussing Waddington's fruit fly experiments. There I don't break the complex phenotype
(veinlessness) down into subtraits, each with its own gene-P. Rather I speak of the various protein-level
determinants of veinlessness, and explain how selection can favor bringing these determinants under genetic control,
thereby
rendering the development of veinlessness independent of environmental factors. This strikes me as more like gene-D
talk than gene-P talk, quite in line with the thought that Waddington's cases of genetic assimilation will resist analysis
in terms of gene-Ps.
So, once more, I see no reason to erect a principled distinction between Waddington's concept of genetic assimilation
and the way I understand this notion.
8 Mental Number Lines
Marcus Giaquinto
The mental association of numbers with space is familiar: at school, children are taught to associate numbers with
positions on a visually presented line. Yet the nature of mental number lines, their role in our thinking, and even their
origin are not obvious and hold some surprises. Moreover, when cognitive scientists talk of a mental number line, they
are often not talking about any representation we acquire at school.
1. For example, the subject of a recent article entitled "The Neural Basis of the Weber-Fechner Law: A Logarithmic Mental Number Line"
(Dehaene, 2003) is the innate system of cardinal size representation found in humans and many other animals, not a taught
representation.
My aim is to get clearer about mental number lines, given the evidence to date; in particular, I shall try to show how
innate and cultural factors interact to determine the nature and role of mental number lines in very basic numerical
thinking. For this, we first need to take a look at our basic number representations.
1 Innate Sense of Cardinal Size
For representing positive whole numbers, as you know, we have our
(1) natural language number expressions (spoken and written), and
(2) numeral systems, such as the decimal place system.
And there is now considerable evidence that we also have
(3) an innate sense of cardinal size.
This number sense is a capacity for detecting the (approximate) cardinal number of a set of perceptually given items
such as a pack of predators, a sequence of howls, or a bunch of bananas. The capacity is exact for very small
numbers, which means that
it enables us to discriminate reliably a small number from its neighbors. But for a larger number of things, one can
sense not its exact number but only an interval into which it falls: the larger the number, the wider the interval. It is
possible that there are two innate systems in operation here, one for exact representation of very small numbers and
one for approximate number representation, which does not work for very small numbers, as proposed by Xu (2003).
In that case, the number sense should be regarded as comprising both systems. The limit of the capacity for exact
number discrimination is not fixed, but may be pushed up. Experience with finger arithmetic appears to have
considerable cognitive importance, as explained in Butterworth (1999, ch. 5). This, along with verbal counting, abacus
practice, and the like may sharpen the rough number sense so as to provide reliable discrimination into double-digit
numbers, thus extending the range of exact cardinal number representations. But the number sense itself is an innate
quantity spectrum, on a par with our sense of duration and our sense of spatial distance. Why do we take the number
sense to be innate? The short answer is that there is plenty of evidence that animals and human infants have a
number sense. For a more extensive and detailed answer, see Butterworth (1999, ch. 6).
1.1 Adult Sense of Cardinal Size
A normal child with decent education will learn to count, understand the decimal place system, acquire a store of
single-digit arithmetical facts, pick up some general equational rules, and master some calculation algorithms. So,
while a sense of number size is useful in the wild, for example, for rapidly gauging the number of nearby predators, in
numerate civilizations it will be an unused vestige of primitive cognition. Or so one might think. In fact, that is very far
from true.
For a hint of the continuing importance of number sense for numerate adults, consider the following. You ask some
people if they can work out the value of seven to the power of six; one of them quickly writes down 1000000, saying
that this is the answer in base 7 notation. Understanding the place system of numerals, you will see that this
smart-alec answer is correct. But it will still leave you feeling somewhat in the dark, unless you already know the decimal
representation. Why? It correctly designates the number, and it does so in a language you understand. Given any other
number in base 7 notation, you would be able to tell which of the two is larger, and the algorithms you know for
multidigit addition and multiplication work just as well in base 7 notation. So what is missing? What you lack is a sense
of how large this number is. Obviously, it is smaller than a million. But is it smaller than half a million, a quarter of a
million, a hundred thousand, ten thousand, one thousand? You probably cannot tell without going some way toward
calculating seven to the power of six in decimal notation.
Admittedly, this is a rather large number, and base 7 notation is rarely used. So consider a much smaller number
presented in base 2 notation: 101101. Is this smaller or bigger than forty? Again, in order to answer this, you will
probably have to go some way towards translating the digit string into decimal notation or natural language number
expressions. Why is this? Why is it that you have a good idea of how large 45 is but a poor idea of how large 101101
is? The reason is that a strong
association of number size representations with decimal numerals and with your natural language number expressions
has been established in your mind, while no such link has been established between number size representations and
multidigit numerals in other bases.
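The two numerals in this example can be checked mechanically, and the check also illustrates why comparison algorithms are indifferent to base (the small compare function below is mine, for illustration, not from the chapter):

```python
n = int('1000000', 7)    # "seven to the power of six" written in base 7
print(n, n == 7 ** 6)    # 117649 True

m = int('101101', 2)     # the base 2 puzzle from the text
print(m, m > 40)         # 45 True: bigger than forty

def compare(x, y):
    # Digit-by-digit comparison is base-independent: pad to equal length,
    # then the leftmost differing digit decides.
    x, y = x.zfill(len(y)), y.zfill(len(x))
    return (x > y) - (x < y)

print(compare('1000000', '666666'))  # 1: 7**6 beats the largest six-digit base 7 numeral
```

The algorithm works identically in base 2, 7, or 10; what it cannot supply is the felt sense of how large the designated number is, which is exactly what the text says is missing.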
The nature of this association is indicated by a phenomenon known as the Stroop effect for numbers, which is
explained in Butterworth (1999, pp. 2583). This effect provides evidence that, even when number size is irrelevant to
the task at hand, when presented with numerals in a familiar system, we automatically access our sense of the
numbers designated by those numerals and order them by number size. Further experiments by Dehaene and
colleagues (1998) reveal that even when a digit is presented too quickly for us to be aware of seeing it, our sense of
the corresponding number size is accessed. Yet other experiments show that automatic accessing of number sense is
not restricted to single digits. You have a sense of the size of 45 (which in binary notation is 101101) and perhaps
even a vague sense of the size of 117,649 (which in base 7 notation is 1,000,000). All this attests to the fact that in
normal adults, representations belonging to the innate sense of number size are mapped onto the numerals of the
most familiar numeral system.
But how important is the number sense? We best know how important some faculty is to us when we have some idea
what it is like to be without it. This is revealed by the case of a bright young man, Charles, reported in Butterworth
(1999, ch. 6). Charles can count and reason normally. He lacks nothing but number sense and those abilities that build
on it. Subtraction, division, and multidigit calculation are impossible for him, and single-digit sums and multiplications
are solved slowly, using finger counting. For this man, ordinary activities such as shopping are awkward, to say the
least. So it appears that we cannot acquire normal arithmetical abilities without number sense.
2 Number Comparison and Number Sense
The nature of our capacity for sensing magnitudes of one kind or another is often illuminated by comparison tasks. In
number comparison experiments, subjects may be asked to indicate which of two given numbers is the larger, and the
time taken to respond (or reaction time, RT) is measured. An alternative is to specify a reference number beforehand
and ask subjects to indicate whether a given number is larger or smaller than the reference number. Number
comparison experiments vary in the format of the given numbers (words, digits, sets of dots) and in the manner of
responding. Two examples of the number comparison tasks are shown in figure 8.1.
There are two robust findings for comparison of numbers with one or two digits, the distance effect and the magnitude
effect.
The distance effect: the smaller the difference between the numbers to be compared, the slower the response, for a
fixed larger number. So it takes longer to respond for {6, 8} than for {2, 8}.
The magnitude effect: the larger the numbers, the slower the response, for a fixed difference. So it takes longer to
respond for {9, 12} than for {2, 5}.
FIGURE 8.1 Number comparison tasks.
For single-digit number comparison, the reaction time data conform pretty well to a smooth logarithmic Welford
function:
RT = a + k log [L/(L − S)]
where L and S are the larger and smaller quantity, respectively, and a and k are constants. Even double-digit
comparison reaction times approximate to the Welford function. These phenomena are typical of response data for
comparison of physical quantities that are nondiscrete, such as line length, pitch, and duration.
2. Welford (1960). Moyer and Landauer (1967, 1973) first showed that the RT data for number comparison fit the Welford equation
pretty well.
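Plugging illustrative constants into the Welford function reproduces both effects (the a and k below are made-up values; the real ones are fitted to subjects' data):

```python
import math

def rt(larger, smaller, a=300.0, k=200.0):
    # Welford function: RT = a + k log[L/(L - S)]
    return a + k * math.log(larger / (larger - smaller))

# Distance effect: same larger number, smaller gap, slower response.
print(rt(8, 6) > rt(8, 2))    # True: {6, 8} is slower than {2, 8}

# Magnitude effect: same gap, larger numbers, slower response.
print(rt(12, 9) > rt(5, 2))   # True: {9, 12} is slower than {2, 5}
```

Note that the function depends only on the ratio L/(L − S), which is why the same fit serves for nondiscrete quantities such as line length, pitch, and duration.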
This has led researchers to conclude that the mental number representations used in these tasks are quantities of a
nondiscrete analogue magnitude: see for example Moyer and Landauer (1967). It is at this point that we hear of a
mental number line: "The digital code of numbers is converted into an internal magnitude code on an analogical
medium termed number line," says one article in a top cognitive science journal (Dehaene et al., 1990, p. 81).
The number comparison effects clearly do not justify the idea that number is mentally represented as line length
(the same effects are found with comparison of sound volume, but we are hardly tempted to talk of a mental volume line),
and in fact talk of a mental number line is often metaphorical.
3. Dehaene (1997, p. 81) calls it "a simple yet remarkably powerful metaphor."
But it is widely held that the number comparison effects do justify the claim that cardinal numbers are represented by
quantities of an internal analogue magnitude, where this is taken to imply that the representing magnitude is
nondiscrete.
This is too hasty. The reaction time data can be explained using a system of discrete representations of cardinal
numbers: specifically, each number n is represented by n activated nodes, and the representation of each number
includes the
representation of smaller numbers. Using this discrete thermometer model of number representation, together with a
certain computational model of number comparison, Marco Zorzi and Brian Butterworth found that the combined model
predicted RTs that showed the distance and magnitude effects and conformed to a Welford function (Zorzi &
Butterworth, 1999). To explain the distance effect on this model, consider, for example, the pairs {6, 8} and {2, 8}.
There is a difference of only two nodes in the representations of 6 and 8 and a difference of six nodes in the
representations of 2 and 8.
pair {2, 8} than to the response nodes for {6, 8}, and so the competition between the response nodes for {2, 8} is
resolved more quickly.
What about the magnitude effect? This is due to a feature of the decision process, namely, that the output level of a
response node is a sigmoidal function of the input level, as illustrated in the pseudograph shown in figure 8.2.
This means that outputs of nodes for numbers (above the first few) will increase with the numbers but at a falling
rate; so the difference in output for the pair {3, 4} will be larger than the difference in output for the pair {8, 9} even
though the input differences are the same for both pairs. Because the output difference is smaller for the pair of
greater numbers, the competition between the response nodes for the greater numbers is resolved more slowly. Hence
the magnitude effect. Clearly the representations need not be continuous (or nondiscrete) like an uninterrupted line.
The mistake is to assume that the RT data must be explained by the nature of the representations alone, as opposed
to the nature of the representations plus the nature of the processes involved in performing the tasks.
FIGURE 8.2 Sigmoidal function.
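The Zorzi–Butterworth explanation can be made concrete in a small sketch. The node counts, the sigmoid parameters, and the decision rule below are illustrative assumptions of mine, not the parameters of their actual model; the point is only that discrete "thermometer" codes plus a sigmoidal output stage reproduce both effects qualitatively.

```python
import math

def thermometer(n):
    """Discrete code: the number n is represented by n activated nodes,
    so the code for each number includes the codes for smaller numbers."""
    return set(range(1, n + 1))

def output(n, midpoint=2.0, slope=2.0):
    """Sigmoidal output of a response node as a function of its input
    (the number of activated nodes feeding it). The parameters are
    illustrative, chosen so that outputs above the first few numbers
    grow at a falling rate."""
    return 1.0 / (1.0 + math.exp(-(n - midpoint) / slope))

# Distance effect: the codes for 2 and 8 differ in six nodes, those for
# 6 and 8 in only two, so {2, 8} feeds a larger input difference to the
# competing response nodes, and the competition resolves faster.
assert len(thermometer(8) - thermometer(2)) == 6
assert len(thermometer(8) - thermometer(6)) == 2

# Magnitude effect: for a fixed input difference of one node, the
# *output* difference shrinks as the numbers grow, so competition
# between response nodes for larger numbers resolves more slowly.
assert output(4) - output(3) > output(9) - output(8)
```

The asserts make the moral of the passage explicit: both RT effects fall out of the decision process operating on discrete codes, with no nondiscrete representing magnitude anywhere in the sketch.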
What the data do rule out is that comparison of double-digit numbers is performed by a digit-by-digit algorithm, first
comparing the left digits and then, if the left digits are the same, comparing right digits. See Hinrichs and colleagues
(1981) and Dehaene and colleagues (1990). But for three-digit numbers, it seems we do use digit-by-digit comparison.
However, there is some evidence that even for double digits we also evaluate tens and units separately in addition to
using the number sense, but the effect is relatively insignificant.


4. Nuerk et al. (2001). Consider pairs that are closely matched for difference and for the size of the larger number. A pair whose larger number has both a larger tens digit and a larger units digit than the smaller number is said to be compatible; a pair whose larger number has a larger tens digit but a smaller units digit is said to be incompatible. We are slightly faster for compatible pairs than for incompatible pairs. For a review of recent work on this, see Nuerk and Willmes (2005).
For single- and double-digit number comparison, the pattern of RTs matches that for comparison of continuous
quantities, such as sound volume and duration. For three-digit comparison, we seem to use the digit-by-digit
algorithm; but this piggybacks on single-digit comparison.
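The digit-by-digit procedure just described can be sketched as follows. The function name and the loop structure are my own illustration, not a published model; the sketch assumes equal-length numerals, with the single-digit comparisons standing in for the number-sense comparisons on which the algorithm piggybacks.

```python
def compare_digitwise(a: str, b: str) -> str:
    """Compare two equal-length numerals digit by digit from the left:
    decide on the leftmost digits; only if they tie, move one place
    rightward. Returns the larger numeral."""
    assert len(a) == len(b)
    for da, db in zip(a, b):   # leftmost digits first
        if da != db:
            return a if da > db else b
    return a                   # the numerals are equal

# The leading digits already settle '271' vs '198'; the later digits
# are never consulted.
assert compare_digitwise("271", "198") == "271"
# A tie in the hundreds place forces comparison of the tens.
assert compare_digitwise("245", "263") == "263"
```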
To summarize these observations about the number sense: it is an innate faculty probably strengthened and refined
under the impact of cultural practices. For single- and double-digit numbers, an adult's number sense is like an adult's
sense of duration, pitch, or sound volume. There is no reason to think that the number sense consists of or depends
on visual or spatial representations. In particular, the RT effects do not justify taking the spectrum of number size
representations to constitute a mental number line.
3 Association of Number and Space: The SNARC Effect
But there is evidence of an association of number with space. In a number comparison experiment run by Stanislas
Dehaene and colleagues, subjects had to classify a number as larger or smaller than 65, using response keys, one
operated by the left hand and the other by the right. Half of the subjects had the key for responding smaller in their
left hand; the other half had the key for responding smaller in their right hand.
So the two groups can be classified as (1) smaller-left and larger-right (SL) and (2) larger-left and smaller-right (LS)
(see fig. 8.3). Dehaene noticed that the SL group responded faster (and with fewer errors) than the LS group. When
the presented number was smaller than 65, SL subjects pressed their left-hand key faster than LS subjects pressed
their right-hand key; when the presented number was larger than 65, the SL subjects pressed their right-hand key
faster than LS subjects pressed their left-hand key.
What could explain the reaction time superiority of smaller-left and larger-right subjects? If subjects mentally
associated smaller numbers with the left and larger with the right, correct responses would have to overcome an
obstructive incongruity for larger-left and smaller-right subjects: numbers associated with the left
would have to be classified by the right hand, and numbers associated with the right would have to be classified by the left hand. Hence larger-left and smaller-right subjects would be slower, as was in fact the case.
FIGURE 8.3 The SNARC effect.
But is the association with the hands? Or is it with the sides of space from the subject's viewpoint? In fact, the hands
are irrelevant. When subjects responded with hands crossed, subjects who had the smaller key on their left (but
operated by their right hand) and the larger key on their right (but operated by their left hand) responded faster. So
it is the left and right halves of egocentric space that are associated with smaller and larger numbers, respectively.
Dehaene named this association the SNARC effect.


5. This name, which is an acronym for "spatial-numerical association of response codes," is a deliberate allusion to Lewis Carroll's poem The Hunting of the Snark. See Dehaene (1993, 1997).
Another question: What determines whether a number is regarded as small or large? It depends on whether the
number falls into the lower or upper half of the test range, which subjects are made aware of prior to testing. When
the range is 0 to 5, responses for 4 and 5 are made faster with the key on the right; but if the range is 4 to 9,
responses for 4 and 5 are faster with the key on the left. This relativity to range excludes explanations of the SNARC
effect that are based on properties of the digits, such as visual appearance or frequency of usage (Dehaene, 1993).
A natural hypothesis is that for a number comparison task, the number-space association is activated and the task converted into one of finding relative positions on a left-to-right number line. But this is doubtful. The SNARC effect is
also found in number tasks for which the size of the number is irrelevant. In one experiment, subjects were asked to
judge the parity (odd or even) of the presented number. For
each subject, the assignment of odd and even response keys to left and right was changed so that for half of the
trials the odd key would be on the left, and for half on the right. Regardless of parity, responses to numbers in the
upper half of the test range were faster when the appropriate response key was on the right, and responses to lower
half numbers were faster when the appropriate response key was on the left (Dehaene et al., 1991). This suggests
that the number-space association is active even when it is not used to perform the current task; and this, in turn,
highlights the possibility that it is not used even in number comparison tasks, though it could be used for those tasks.
Present evidence, I believe, is insufficient for a verdict on this question.
3.1 Cultural Origin of the SNARC Effect
What causes this association of the left-right dimension of egocentric space with numbers in order of magnitude? This
was investigated by using as subjects some Iranian students living in France who had initially learned to read from
right to left, instead of left to right as Europeans do. Those who had lived in France for a long time showed a SNARC
effect just like native French students, while recent immigrants tended to show a reverse SNARC effect, associating
small numbers with the right and large numbers with the left. Thus all the subjects showed an association of number
size with the left-right dimension of egocentric space, but the direction of the association was determined by exposure
to cultural factors, such as direction of reading and of ruler calibration. The reverse SNARC effect has also been found
in Arabic monoliterates by Zebian (2005); the same study found a weakened reverse SNARC effect in Arabic biliterates
and no effect in illiterate Arabic speakers. Nonetheless, this deployment of a number-space association is unconscious
and task-independent, and clearly lies beyond anything we are explicitly taught to do.
3.2 Visual Imagery and the SNARC Effect
Moreover, this number-space association can easily be overridden by another one. Daniel Bächtold and colleagues obtained a SNARC reversal within subjects, by getting them to indicate as quickly as possible whether a given number between 1 and 11 (other than 6) is larger or smaller than 6, using right and left response keys under two different conditions (Bächtold et al., 1998). In the first condition, subjects were led to visualize the numbers on a 12-segment
ruler, and in the second condition, they were led to visualize the numbers on an hour-marked clock face; otherwise,
the conditions were identical (see fig. 8.4).
While on the ruler the larger numbers would be imagined on the right, on the clock face the larger numbers would be
imagined on the left. Sure enough, subjects showed the SNARC effect under the first condition, and the same subjects
showed the reverse SNARC effect under the second condition. This points to the operation of visual imagery. In the
first case, the effect was probably due to the use of a visualized horizontal number line calibrated from left to right; in
the second case the effect was probably due to a visualized number circle calibrated clockwise.
FIGURE 8.4 The reverse SNARC effect. Source: Bächtold, Baumüller, & Brugger (1998), fig. 1.
These findings raise a significant question about the nature of the number-space association revealed in the SNARC effect data from the standard number comparison experiments, that is, experiments without conscious deployment of visual imagery. Is this number-space association constituted by a representation in the visual imagery system?
4 Association of Number and Space: Bisection Shift
Some striking clinical data seem to point in this direction. The patients concerned suffer from a visual deficit known as
neglect.


6. It is also known as unilateral neglect, hemispatial neglect, or hemineglect.
Neglect patients fail to notice objects and events on one side of their visual field, usually the left, following
contralateral brain lesions, usually of the inferior parietal lobe. Neglect is not the same as hemianopia (left-field or
right-field blindness), as there is plenty of evidence of visual processing on the affected side.


7. See for example Cohen et al. (1995); Driver and Mattingley (1998); Schweiberger and Steif (2001).
Rather, it is usually regarded as a loss of visuo-spatial attentional control, an inability to attend to features on one side
of the visual field resulting in a loss of visual awareness on that side. Neglect patients may leave the food uneaten on
the left side of the plate, may shave only the right side of their face, or miss words on the left side of a page when
reading.


8. Robertson and Marshall (1993). Neglect patients are usually unaware of the deficit, unlike hemianopia patients, and so do not try to
compensate by motions of head or eyes.
When asked to draw a copy of a picture presented to them, for example, of a cat, they may draw just
the right half; if the picture is of a clock face, they may omit the numerals on the left side. These symptoms reveal a
deficit of perception. A parallel deficit of imagination has been found to accompany it. A remarkable example of neglect
in visual imagery was provided by two neglect patients from Milan (Bisiach & Luzzatti, 1978). They were asked to
visualize and describe Milan's Piazza del Duomo from the side of the square facing the cathedral. Both patients
described features that would have been on their right from that viewpoint but omitted those that would have been on
their left. Afterwards they were asked to visualize and describe the square from the opposite side, as if they were
standing just in front of the cathedral facing away from it. Then they described features that were previously omitted,
and they omitted features previously described; so they were reporting just those features that were on their right in
the currently imagined scene and omitting features on the left in the currently imagined scene.
A relevant symptom of neglect is that when asked to mark the midpoint of a horizontally presented line segment,
patients typically choose a point to the right of the actual midpoint (see fig. 8.5).
For a given line, the extent of shift to the right varies from patient to patient. For a given patient, longer lines mean greater
rightward shift (Halligan & Marshall, 1988). Corresponding to the line bisection test is a number-range bisection test:
subjects are presented with two numbers and are asked to state the number midway between them without
calculation. In a recent study, Zorzi and colleagues reasoned that if the mental number line were a fiction, neglect
patients would not show a shift above the midnumber corresponding to the rightward shift shown in line bisection
(Zorzi et al., 2002). They tested four patients with left-side neglect, four right brain-damaged patients without neglect,
and four healthy subjects, all with normal numerical and arithmetical abilities on standard tests. While the healthy
subjects and nonneglect patients showed no deviation from the midnumber in the number bisection tasks, the neglect
patients systematically overshot the midnumber.
FIGURE 8.5 Line bisection by left-neglect patients.
Moreover,
this shift above the midnumber almost always increased with the range, that is, with the extent of the difference
between the given numbers, thus replicating the pattern of line-bisection errors typical of neglect patients.
Although this is not an overtly visuo-spatial task, the data can be explained on the assumption that number
representations are integrated with a visually imagined horizontal line, or a horizontal row of evenly spaced numerals,
and we attempt to locate the midnumber of the given range by means of an unconscious internal line-bisection,
choosing the number represented closest to the bisection point. When patients are given a pair of numbers, say 2 and
9, with the task of choosing a midnumber, an image of the segment of the number line from 2 to 9 is activated
automatically and unconsciously. There is no loss of the leftward part of the line-segment image because attention is
not required to produce the image. But attention is required to use the image, even when the use is not conscious; so
in left-neglect patients, the left side of the image is not available for use. As there was no effect of the order in which
the given numbers were presented, for example {3, 7} or {7, 3}, the constant shift above the midnumber suggests
that the patients used a number line that was mentally calibrated from left to right, thus cohering with the SNARC
effect for people from Western cultures.
More recent data, however, weigh heavily against the hypothesis that number bisection involves the use of an
unconscious horizontal number line in the visual imagery system. Doricchi and colleagues have found a double
dissociation between rightward shift in line bisection and upward shift in number interval bisection (Doricchi et al.,
2005). By mapping and comparing the lesions in neglect subjects who showed upward shift in number bisection and
those who did not, Doricchi and colleagues found that the damage probably responsible for the upward shift lay
outside the areas most frequently damaged in neglect subjects. Moreover, brain imaging studies suggest that different
cerebral regions are activated in number comparison tasks and visual line bisection.
At this stage, then, it is likely that if the number-space association revealed in the standard SNARC effect data is used
in number bisection, it is independent of any visual number line representation.
5 Calibration
The expression "mental number line" may be applied to three things:
(1) the association of number magnitude representations of the number sense with number words or numerals;
(2) the association of number representations with positions in egocentric space;
(3) the association of number representations with positions on a visual line.
In each case there arises a question about the association, which we may think of as a question of calibration. Is it
regular as on an ordinary ruler? Or increasingly compressed as on a slide rule? What exactly this comes to varies in
each case; so let us look at each case in turn.
In the first case, consider two pairs of numbers, each pair differing by the same amount, such as {2, 5} and {9, 12}.
It is quite possible that although the numerical
difference is the same for both pairs, the cognitive difference between the magnitude representations onto which the
numerals "9" and "12" are mapped is smaller than the cognitive difference between the magnitude representations onto which the numerals "2" and "5" are mapped. The increasing compression hypothesis for the association of
numerals and number sense magnitude representations is that this holds in general: for a fixed numerical difference,
the greater the mean of a pair of numbers denoted by two numerals, the smaller the cognitive difference between the
magnitude representations associated with those numerals.
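One standard way to make the hypothesis precise, an assumption common in this literature rather than a commitment of the text, is logarithmic compression: the cognitive difference between the magnitudes for $m$ and $m+k$ behaves like the difference of their logarithms,

```latex
d(m,\, m+k) \;=\; \log(m+k) - \log m \;=\; \log\!\Bigl(1 + \tfrac{k}{m}\Bigr),
```

which, for fixed $k$, decreases as $m$ grows. For example, with $k = 3$ and natural logarithms, $d(2,5) = \log(5/2) \approx 0.92$ while $d(9,12) = \log(12/9) \approx 0.29$, so the pair $\{2, 5\}$ is the more separated cognitively even though the numerical differences are equal.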
Why would anyone hold this increasing compression hypothesis for the number sense? One reason is that it explains
the magnitude effect in number comparison tasks: when comparing two numbers, the larger the numbers, the slower
the response, for a fixed difference. So it takes longer to respond for {9, 12} than for {2, 5}. This is just what would
be expected if the cognitive difference between the number sense representations associated with "2" and "5" is larger than the cognitive difference between those associated with "9" and "12".
However, there is a rival explanation of the magnitude effect. On the rival view, the association between numerals and
number sense magnitude representations is not invariant. A numeral will not always get mapped to the same
magnitude representation; on the contrary over time there will be a distribution of magnitude representations to which
the numeral gets mapped, and the standard deviation of this distribution increases in proportion to the number
denoted by the numeral. This is known as the scalar variability hypothesis, since the variability increases in proportion to scale. Here then
is the rival explanation of the magnitude effect: the ease of discriminating two numbers falls as the numbers get
larger, not because corresponding number magnitude representations become more similar, but because the variability
in the mapping increases. This means that the mapping gets noisier for larger numbers. More specifically, the overlap
of distributions of magnitude representations to which "9" and "12" get mapped will be greater than the overlap of distributions of representations to which "2" and "5" get mapped.
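The rival explanation can be checked in a toy simulation. The Gaussian noise model, the variability coefficient, and the trial count below are illustrative assumptions of mine; the sketch only shows that scalar variability alone, a linear mapping with noise proportional to the number, yields more confusions for {9, 12} than for {2, 5}.

```python
import random

def misorder_rate(small, large, w=0.2, trials=20000, seed=1):
    """Map each numeral to a noisy magnitude whose standard deviation is
    proportional to the number (scalar variability), and count how often
    the sampled magnitudes come out in the wrong order."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        s = rng.gauss(small, w * small)
        l = rng.gauss(large, w * large)
        if s >= l:
            errors += 1
    return errors / trials

# Same numerical difference, but the larger pair is confused far more
# often, because the mapping is noisier for larger numbers.
assert misorder_rate(9, 12) > misorder_rate(2, 5)
```

Note that the mean magnitudes here are strictly linear in the number; all the difficulty with large pairs comes from the overlapping distributions, exactly as the rival explanation requires.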
Gallistel and Gelman (1992) give reasons to prefer the scalar variability hypothesis to the increasing compression
hypothesis. One reason is that the psychophysics of number and duration discrimination appear to be identical in
animals, and it has been shown that the mental representation of duration is a linear function of actual duration, rather
than a logarithmic (or other compressive) function of actual duration. Another reason is that if the mapping of numerals to number sense magnitudes is linear, plausible modeling of rough operations of addition and multiplication is possible; but if the mapping is logarithmic, it is difficult to model addition.
But Dehaene (1992, 1997) cites other results as reasons for preferring the increasing compression hypothesis. When
subjects are asked to produce random numbers in a given interval, they typically produce more small numbers than
large numbers (Baird & Noma, 1975). In a related experiment, subjects are presented with a series of numbers in a
given interval (not in order of numerical size) and asked to judge how evenly and randomly the numbers in the series
are drawn from the interval (Banks & Coleman, 1981). Here are a couple of such series drawn from
integers between 1 and 2000. Which one seems to you the more random and the more evenly spread?
A: 879 5 1,322 1,987 212 1,776 1,561 437 1,098 663
B: 238 5 689 1,987 16 1,446 1,018 58 421 117
Most people find series B the more random and more even sample; in series A, large numbers seem overrepresented.
In fact, series A is the more evenly spread, with intervals of just over 200. In B, the intervals decrease exponentially.
This does not seem to me to be strong evidence in favor of the compression hypothesis. Smaller numbers may be
overrepresented in our "mental urn," as Dehaene puts it, because of our greater use of and exposure to smaller
numbers. Another possibility is that we are unconsciously categorizing the numbers by digit length (number of digits in
the numeral), thereby giving equal weight to the set of 9 single-digit numbers, the set of 90 double-digit numbers, the set of 900 three-digit numbers, and the set of 9,000 four-digit numbers. With that weighting, series B is indeed more
evenly spread.
Dehaene (1992) also mentions a small informal number bisection test reported by Attneave (1962). Fourteen adults
were asked for a quick intuitive answer to the following: suppose that one is a very small number and a million is a
very large number; now give a good example of a medium-size number. Though 500,000 is midway, the median of the
bisections obtained was 100,000.


9. The mean was a little over 186,000. The standard deviation was not given. Attneave (1962).
On the assumption that the number sense magnitude representations were used in this task, this underestimation is
what would be expected on the increasing compression hypothesis. But there is an alternative explanation, which
dispenses with the somewhat implausible assumption that number sense magnitude representations are being used for
such a large range of numbers. Linguistic salience among the ordinary verbal number expressions may be the key to
explaining this result. Powers of ten are more salient than intervening numbers, and so 100,000 is likely to be the
largest mental pole of attraction short of one million. Scott and colleagues (2001) give indirect evidence of this.
Perhaps that is why the median of responses to Attneave's test was 100,000.
Dehaene (2003) cites a third piece of evidence as decisively favoring the increasing compression hypothesis. Neurons
have been found in the primate brain whose firing rate is tuned to specific cardinal numbers. A neuron for 3 would
respond optimally to a display of three visual objects, less to two or four objects, and not at all to one or five objects.
Nieder and Miller (2003) analysed the response curves of number neurons in two monkeys engaged in discriminating
the cardinality of two visually presented sets. Dehaene reports that these neural tuning curves, when plotted on a linear scale, are asymmetrical and assume different widths for each number (in the range 1 to 5), but become simpler when plotted on a logarithmic scale: they are fitted by a Gaussian with a fixed variance across the entire range of numbers tested. Thus, the neural code for number can be described in a more parsimonious way by a logarithmic scale than by a linear scale.
than by a linear scale. Without some
further assumption this is hardly decisive. The fact that curves look simpler when the data are plotted one way rather
than another is not enough to decide between competing hypotheses. As a criterion of theory choice, parsimony must
be applied globally to explanatory theories or models in the context of relevant background knowledge, rather than to
the presentation of data sets. It may in the future turn out that, all things considered, the most parsimonious
explanation of all the relevant data implies a logarithmic scale. But at present the question seems to me undecided.
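Dehaene's descriptive point can be illustrated with a toy tuning curve. Assume, purely for illustration, that a number neuron's response is Gaussian in log(n) with a fixed width; the function names and the width parameter are mine. Replotted on the linear number axis, such curves are asymmetric and broaden with the preferred number, which is the pattern Nieder and Miller report.

```python
import math

def tuning(n, preferred, sigma=0.25):
    """Response of a neuron tuned to `preferred`: a Gaussian on a log
    scale with fixed width sigma (an illustrative assumption)."""
    x = math.log(n) - math.log(preferred)
    return math.exp(-x * x / (2 * sigma * sigma))

def linear_halfwidth(preferred, sigma=0.25):
    """Width at half height measured on the linear number axis: the
    response drops to 0.5 at log distances +/- sigma*sqrt(2 ln 2)
    from the preferred number."""
    d = sigma * math.sqrt(2 * math.log(2))
    return preferred * (math.exp(d) - math.exp(-d))

# Fixed width on the log scale: doubling the number always gives the
# same response, whatever the preferred number ...
assert abs(tuning(6, 3) - tuning(10, 5)) < 1e-12
# ... so on the linear scale the curves broaden with the preferred number,
assert linear_halfwidth(5) > linear_halfwidth(3)
# ... and each is asymmetric: equidistant on the linear axis, the
# response falls faster below the preferred number than above it.
assert tuning(2, 3) < tuning(4, 3)
```

As the text goes on to say, the fact that the data look simpler in one coordinate system does not by itself decide between the hypotheses; the sketch only shows what the logarithmic description amounts to.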
What about the calibration of the two other kinds of mental number line, the association of number representations
with positions in egocentric space and the association of number representations with positions on a visual line? I do
not know of data that support the hypothesis that the number-space association is increasingly compressed. If the number-space association is in play in number bisection tasks, we should expect to find normal subjects showing a
leftward shift, one which increases systematically with the size of the number interval, on the increasing compression
hypothesis. For example, if log to the base 2 is the function which describes the mapping of number representations to
spatial positions, 3 would seem (just over) midway between 1 and 8, 4 would seem midway between 1 and 16, and 8
would seem midway between 1 and 64. But when we look at the performance of controls in the number bisection tasks
in the studies by Zorzi and by Doricchi and their colleagues mentioned earlier, we find no systematic leftward shift
increasing with the number interval. Conditional on the assumption, which may be false, that number bisection involves the number-space association, this would be evidence against the increasing compression hypothesis for the number-space association.
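The predictions used in this argument are just geometric means: on a logarithmic mapping, the number that appears midway between a and b is the one whose log is midway, i.e., the geometric mean, and the base of the logarithm drops out. A quick arithmetic check of the text's examples:

```python
import math

def apparent_midpoint(a, b):
    """On a logarithmic mapping, the number appearing midway between
    a and b is the geometric mean: the point whose log lies halfway
    between log(a) and log(b)."""
    return math.sqrt(a * b)

assert apparent_midpoint(1, 16) == 4.0   # 4 would seem midway between 1 and 16
assert apparent_midpoint(1, 64) == 8.0   # 8 would seem midway between 1 and 64
# Between 1 and 8 the apparent midpoint is about 2.83, so 3 would seem
# just over midway, as the text says.
assert 2.8 < apparent_midpoint(1, 8) < 2.9
```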
The next and final section considers the nature of visual number lines; calibration of visual number lines will be
considered in that context. What can be said now is that visual images of number lines can display either kind of
calibration, regular or compressive; so for images the question does not arise.
6 Number Lines in the Visual Imagery System
We typically visualize a number line as a graphical line with numbers represented as positions on the line ordered from
left to right for individuals in Western cultures. But there are many possible variations. Are numbers marked in the
image with little vertical line segments across the horizontal line? Does an image of the corresponding arabic numeral
appear just above (or below) each number position? Or are just some number positions, say multiples of 5, thus
labeled, or none at all? Probably such details vary from person to person, and perhaps from one occasion to another
for each of us. What seems likely to be constant is that each number is represented by a position on the line relative
to a unique origin, that is, a left end (or zero-marked position for lines extending endlessly in both directions), and
that the size of the number is represented by the relative distance between the origin and the number position.
Moreover, the line is conceived of as endless to the right to cater for the fact that the numbers are closed under
addition and addition of a positive number is strictly monotone increasing.
6.1 The Infinity of the Number Line
But a visual image of a line that is endless in one or both directions, an infinitely extended image, is surely impossible.
The field of visual imagery is bounded. This can be sensed in the following way. Imagine walking toward an adult
giraffe from the side; the visualized giraffe will seem to loom larger as you continue the mental walk until not all of it
can be visualized simultaneously; head and hoofs begin to overflow image capacity. This phenomenon has been
tested and confirmed: image size is constrained (Kosslyn, 1978).


10. Subjects had to visualize different sized objects one at a time, to imagine walking towards an object until apparent overflow, and
then to estimate their apparent distance from the object. It turns out that the larger the object, the farther away it seems at the point
of overflow. More precisely, the estimated distance at the point of apparent overflow matches the distance at which the object subtends
a visual angle just too large for its edges to be simultaneously visible. Kosslyn (1978).
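The relation reported in this footnote is simple trigonometry: if the imagery field admits a maximal visual angle, an object of a given size just fits at the distance where it subtends exactly that angle, so overflow distance is proportional to object size. The 40-degree bound below is a free parameter of my own; the actual bound is an empirical matter.

```python
import math

def overflow_distance(size, max_angle_deg=40.0):
    """Distance at which an object of the given size subtends exactly the
    maximal visual angle of the imagery field; any closer and its edges
    cannot all be imaged at once. The 40-degree bound is illustrative."""
    half_angle = math.radians(max_angle_deg) / 2
    return (size / 2) / math.tan(half_angle)

# Larger objects overflow at proportionally greater distances, the
# pattern Kosslyn's subjects reported: doubling the size doubles the
# distance at the point of apparent overflow.
assert abs(overflow_distance(4.0) - 2 * overflow_distance(2.0)) < 1e-9
```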
So there will be an upper bound on initial segments of the number line that we can visualize when number marks
appear clearly and evenly separated by a fixed distance. But when we talk and think of the number line, what we have
in mind is an infinitely extended line.
To best resolve this apparent conflict, we should note a distinction between two kinds of representation in the visual
system. Here I follow Kosslyn's (1994) account, but any account of the visual system will need some version of the
following distinction. There are on the one hand category patterns and on the other visual images. A category pattern
is a set of related feature descriptions stored more or less permanently; a visual image is a fleeting pattern of activity
in the visual buffer, produced by activation of a stored category pattern. The category pattern can include a
specification that a line continue in a certain direction endlessly; but for a single momentary image generated by
activation of that category pattern, only a finite line segment will be represented in the image.
From a given category pattern, more than one image can be generated; in fact, a sequence of continuously deforming
images can be generated over an interval of time that we can think of as a single continuously changing image. When
the category pattern is activated, the category pattern descriptions become input for a system I will call the image
generating function.


11. Kosslyn (1994) refers to this as the mapping function from the pattern activation system to the visual buffer.
That function has additional parameter input variables corresponding to viewpoint, distance, orientation, and others.
These variables can be continuously changed, and when that occurs, the result will be continuously changing imagery.
Imagine a regular mug as you take it from an eye-level shelf where it is stored upside down with its handle to your
right and bring it to a position and orientation that allow you to look into it from above. In that case, the image
generating function acts on the category pattern descriptions for a regular mug with continuously changing parameters
for viewpoint, distance, and orientation. The visual imagery system box, modified from Kosslyn (1994), shows the part
of the visual system relevant here, omitting all arrows other than those involved in generating and transforming
images (see fig. 8.6).
FIGURE 8.6 The visual imagery system. Part of Kosslyn's model of the system for generating and transforming images. Source: Adapted from Kosslyn (1994), fig. 11.1, p. 383.
The vertical downward arrows to the box for the image generating function indicate the inputs that constitute parameter values.
Certain continuous changes of parameter inputs to the image generating function provide important image-
transforming operations, namely, scanning, zooming (in or out), and rotating.


12. Mental rotation was discovered by Shepard and Metzler (1971). For its role in visual imagery see Kosslyn (1994). Scanning and zooming-in were investigated by Kosslyn. See the accounts of this work in Kosslyn (1980) and his recent theoretical account of these transformations in Kosslyn (1994).
For example, if we want to locate a large number, say 132, we might scan the line until the image covers, say, the
interval from 100 to 150, and then zoom in on just the decade of the 130s, and then attend to the 2 in that
decade. Of course, these imagery operations of scanning and zooming-in cannot be what they are in visual perception.
In particular, they are not operations on a fixed image. Imagistic zooming-in is transforming the image by smoothly
changing one or more egocentric distance parameter values for the image generating function. This continuous change
of image has the subjective visual effect of moving toward the imaged object, thereby increasing resolution. Imagistic
scanning, like imagistic zooming-in, results from smooth change of a parameter value for position. The description in
the category pattern that the line has no right end ensures that rightward imagistic scanning will never produce an
image of a right end-stopped line. (Similarly, the category pattern activated may have a description that the line has
no left end, ensuring that leftward imagistic scanning will never produce an image of a left end-stopped line.) This is
what constitutes the infinite
extension of the number line in the visual imagery system. There is no implication in this account that a visual image
can be infinite in extent.


13. The kinds of image transformation just mentioned are not exhaustive. Moreover, Kosslyn's system has another type of scanning,
one that does not result from continuous change of a parameter to the image generating function. This second kind of scanning
consists in shifting an attention window over the image, thereby enhancing different parts of the image over time. Shifting the attention
window to a certain part of the visual buffer increases the relative level of activity in that part, thereby raising the likelihood that the
information of that part will be further processed.
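The account of zooming and scanning as smooth changes to parameter inputs of an image generating function can be illustrated with a toy model. The sketch below is my own illustration, not Kosslyn's implementation; the names (CategoryPattern, generate_image) and the particular distance parameter are hypothetical.

```python
# Toy model (illustrative only, not Kosslyn's actual system) of image
# generation from a stored category pattern plus parameter values.
# Zooming in = smoothly decreasing an egocentric distance parameter;
# scanning = smoothly changing a position parameter. Because the
# category pattern describes the line as having no right end, ticks are
# generated on demand for any window, so rightward scanning can never
# produce an image of a right end-stopped line.
from dataclasses import dataclass

@dataclass
class CategoryPattern:
    tick_spacing: float = 1.0    # regular calibration
    has_right_end: bool = False  # the "no right end" description

def generate_image(pattern: CategoryPattern, center: float, distance: float):
    """Return the calibration ticks visible from a viewpoint centered at
    `center` at egocentric `distance`; a nearer viewpoint shows a narrower,
    finer-grained window (the subjective effect of zooming in)."""
    half_window = distance / 2.0
    left, right = center - half_window, center + half_window
    ticks = []
    t = left - (left % pattern.tick_spacing)
    while t <= right:
        ticks.append(t)
        t += pattern.tick_spacing
    return ticks

# Locating 132: scan until the image covers roughly 100-150,
# then zoom in on the decade of the 130s.
line = CategoryPattern()
coarse = generate_image(line, center=125, distance=50)  # interval 100-150
fine = generate_image(line, center=135, distance=10)    # the 130s decade
```

Continuously shrinking the distance parameter between the two calls would produce the smooth succession of images described above; no single image in the succession is infinite in extent, yet scanning rightward can continue indefinitely.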
6.2 The Calibration of a Visual Number Line
Now let us return to the question whether the calibration of a visual number line is regular or increasingly compressed.
One possibility is that the category pattern specification for a number line entails that the number line has regular
calibration. But not all of the number line images produced by activation of the category pattern have regular apparent
calibration. Some images will have apparent calibration that matches a perspectival projection; so the calibration will
appear increasingly compressed. To see how this is possible, recall that a current visual image depends not only on the
activated category pattern but also on parameter inputs to the image generating function. One of these will determine
whether the image viewpoint is perpendicular to the number line or oblique. If perpendicular, the calibration appears
regular in the image; if oblique, the calibration appears increasingly compressed in the image. Which of these kinds of
viewpoint (perpendicular or oblique) is selected may depend, at least in part, on the size of the numerical interval set
by the task at hand. For small number intervals, a perpendicular viewpoint is likely to be selected, dictating regular
image calibration. For large number intervals, the imagery system could first zoom out and then rotate, thereby
selecting an oblique viewpoint; the result would be an image in which the calibration appears increasingly compressed.
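The geometric point can be made with a simple pinhole-projection calculation. This is my own illustration (the function name and viewing parameters are hypothetical, and nothing here models the visual buffer itself): regularly spaced ticks project to regularly spaced image positions under a perpendicular viewpoint, but to increasingly compressed positions under an oblique one.

```python
import math

def apparent_positions(ticks, angle_deg, viewing_distance=10.0):
    """Project tick positions on a calibrated line onto an image plane.
    angle_deg = 0 means the viewpoint is perpendicular to the line;
    larger angles mean the line recedes obliquely from the viewer."""
    theta = math.radians(angle_deg)
    projected = []
    for x in ticks:
        depth = viewing_distance + x * math.sin(theta)  # farther ticks recede
        projected.append(x * math.cos(theta) / depth)   # pinhole projection
    return projected

ticks = [0, 1, 2, 3, 4, 5]
head_on = apparent_positions(ticks, angle_deg=0)   # spacing stays regular
oblique = apparent_positions(ticks, angle_deg=60)  # spacing shrinks tick by tick
```

With the perpendicular viewpoint every gap between adjacent projected ticks is identical; with the 60-degree viewpoint each gap is smaller than the one before, which is just the increasingly compressed apparent calibration described above.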
6.3 Idiosyncratic Visual Number Lines
A regular horizontal visual number line is, I conjecture, common to the overwhelming majority of people subject to
mathematical education similar to our own. But a small percentage of us form idiosyncratic spatial number system
representations. These may be calibrated curved lines with loops, strips with differently colored intervals for different
number intervals proceeding upward and rightward, complex spatial arrangements of the numerals in a combination of
lines and rectangles, and many more. It is not to be assumed that people with idiosyncratic number lines do not also
have a standard number line; in fact, some report having both. It is very likely that most of us have several spatial
representations of the numbers at our disposal: numbers up to 12 or 360 arranged on a circle, positive and negative
numbers on a vertical axis, as well as the standard number line. But it is possible that some who report a vivid and
durable idiosyncratic number line lack a
standard number line, and those who report having both may deploy their idiosyncratic line when others would deploy
a standard number line. Although the phenomenon has been known since Francis Galton's nineteenth-century
investigation, there has been little follow-up.


14. See Galton (1880) for the original studies. For excellent further investigation and discussion see Seron et al. (1992).
What we can say is that idiosyncratic number lines are not taught, and so the phenomenon attests to an innate
propensity to form a number line once a written numeral system is acquired.
7 Conclusion
According to the triple code model of Dehaene and Cohen (1995), the basic resources of numerical cognition in
educated adults are:
1. The natural language number expressions, written and spoken.
2. Our working system of visual numerals, such as the decimal place system.
3. An innate sense of cardinal size, the number sense.
In addition we have three kinds of resource that sometimes go under the heading of mental number lines. These are:
4. A mapping of numerals (or number words) to number sense representations.
5. A mapping of number sense representations to positions in egocentric space.
6. A mapping of number sense representations to a visual representation of a horizontal line.
The evidence for (4), the mapping of numerals to number sense representations, comes from number comparison
tasks using numerals. Comparison task data also show that the system of number sense representations has the
characteristics typical of quantity senses, namely the distance effect and the magnitude effect. The evidence for (5),
the mapping of number sense representations to positions in space, is the robust SNARC effect. Both kinds of mapping
depend on a cultural element (the numerals) and an innate element. The innate element includes number sense
representations, which we share with some other animals; it also includes a propensity to represent ordered sequences
of things in a spatial line. Apart from the number-space mapping, a left-to-right association has been found for familiar
ordered sets of nonnumerical items, namely, months and letters (Gevers et al., 2003). In this case the direction of the
mapping is determined by a cultural factor. But it has also been found that we have a mental mapping of aural pitch
height to spatial positions running vertically, with notes of higher pitch mapped to positions above notes of lower pitch
(Rusconi et al., 2006). Here the cultural training is the musical education of the subjects, but it is not at all clear that
musical education (in particular the practices of musical notation) is what determines that higher pitch goes with
higher spatial position rather than the reverse.
Mappings of kind (6), visual number lines, are cultural products; but I conjecture that there is an innate disposition
operating here too. We may get our first exposure to physical number lines from calibrated rulers and printed graphs.
But what gave people the idea for a ruler? What gave Descartes the idea for coordinate axes? Perhaps it rests on the
standard horizontal number-space association. A further indication that an innate propensity is operative in the
formation of visual number line representations is that a small percentage of us form idiosyncratic, hence untaught,
visual number lines. Our propensity to form a visual number line once we have acquired a written numeral system is
valuable in ways I do not have space to detail here. Such a propensity would be a special case of a disposition found in
highly innovative mathematicians to integrate numerical and spatial representations, a disposition whose fruitfulness is
beyond dispute.
9 Modularity in Language and Theory of Mind
What Is the Evidence?
Michael Siegal
Luca Surian
One essential characteristic of the human species that permits effective communication is possession of the grammar
of language. Grammar is a powerful system that is critical for reducing mistakes in communicating information about
potentially threatening events that are remote in time and space. A second essential human characteristic is possession
of a theory of mind (ToM) involving the ability to reason about the mental states of others: their beliefs, desires, and
intentions and how these differ from one's own. Such reasoning is vital for the appreciation and transmission of culture
in the form of novels, theatre, and song, and more generally for the maintenance of family and social life.
In this chapter, on the basis of evidence from cognitive developmental psychology, cognitive neuroscience, and
behavioral genetics, we discuss parallels in the development of grammar and ToM reasoning. Both grammar and ToM
are acquired spontaneously and employed effortlessly by all typically developing children. We view these processes as
the product of modular systems. Although acquisition in typically developing children is characterized by the poverty of
the environmental stimulus, the developmental trajectory is impaired when little or no environmental input is received
within an early critical period. We then consider proposals that language, particularly certain aspects of grammar,
serves to support ToM reasoning. We conclude that, to some considerable extent, dissociations between ToM reasoning
and grammar are present in both childhood and adulthood, but that in typically developing persons, these systems
interact to support word learning and the emergence of specific cultural beliefs.
1 Parallels between the Expression of Grammar and ToM Reasoning
1.1 Evidence for the Poverty of the Stimulus Argument in the Acquisition of Grammar
Language is acquired spontaneously without formal instruction. Indeed, newborns learn sounds for speech even while
they are sleeping (Cheour et al., 2002), and, in the first few years of human development, virtually all children acquire
the grammar of their native language. In fact, the grammar of language is mandatory in that we cannot stop ourselves
from acquiring it.
One influential view on language acquisition is that the manifestation of structure in children's language is triggered
only by exposure to a linguistic input that is highly limited and fragmented, an indication of the fundamental
innateness of grammar. According to the poverty of the stimulus account (Chomsky, 1980; Crain & Pietroski, 2001;
Laurence & Margolis, 2001; Newport, 1990; Pinker, 1994; Stromswold, 2000), the acquisition of grammar proceeds
automatically in a modular fashion, largely independently of nonverbal intelligence. Despite wide variations in their
language environment, children acquire aspects of grammar in a fixed order at about the same time in their
development. They make sense of a language input that is compatible not only with the grammar of their own native
language but with the grammar of many others. The errors that they do make are highly predictable and often reflect
what would be grammatical in another language (Crain & Pietroski, 2001). Further, it has long been established that
children are not corrected for the grammaticality of what they say but for its truth value (Brown & Hanlon, 1970).
That children's grammar unfolds largely in the absence of feedback on the grammaticality of their utterances is further
testament to its biological foundations.
Evidence from deaf children who are cut off from speech input corroborates this account. Petitto and Marentette
(1991) found that profoundly deaf infants of deaf parents display manual babbling using a reduced set of the phonetic
units in American Sign Language. These results support the view that babbling is tied to the abstract structure of
language rather than to input from the speech modality. Further analysis reveals that hearing babies exposed to the
sign language of their deaf parents produce low-frequency hand movements inside a tightly restricted space in front of
the body that corresponds to the signing space in signed languages (Petitto et al., 2001). Moreover, deaf children
display similarities in the linguistic structure of their gestural communication despite wide variations in the spoken
language of their hearing mothers. The spontaneous gestural communication of American deaf children resembles that
of Chinese deaf children rather than that of their own mothers (Goldin-Meadow, 2003; Goldin-Meadow & Mylander,
1998). It involves language-making skills that do not require a language model to be activated: segmenting words
into morphemes and sentences into words, setting up a system of contrasts in morphology, and building syntactic
structures. As shown in a recent study of sequential cohorts of Nicaraguan deaf children exposed initially to a highly
degraded language environment, children spontaneously create structures for
language acting largely independently of adults (Senghas et al., 2004). Moreover, there appears to be a critical period
for language acquisition as, in rare cases when children are not exposed to any language whatsoever, they do appear
irreparably impaired in their later language learning (Curtiss, 1977; Grimshaw et al., 1998; Lenneberg, 1967).
Of course the poverty of the stimulus account of language acquisition is not one to which researchers universally
subscribe. There are those who emphasize that the input plays a role more important than the role assigned to it by
nativist theories (e.g., Cowie, 1998; Tomasello, 2003). Still, additional recent evidence now exists to counteract this
opinion.
In a recent ingenious series of experiments, Lidz, Waxman, and Freedman (2003) demonstrated that infants aged
16-18 months comprehend the syntax of the pronoun "one" even though the language environment does not contain
sufficient information to guide unaided learning. The infants in their investigation were shown images of single objects
on a television monitor such as a yellow bottle. Then in a test phase, the infants saw two objects from the same
category such as a yellow bottle and a blue bottle. In the anaphoric condition, they heard the phrase "Now look. Do
you see another one?" whereas in the control condition, they heard "Now look. What do you see now?" As predicted,
infants looked significantly longer at the familiar item in the anaphoric than in the control condition, in which they
preferred to look at the novel item. Lidz and colleagues (2003, p. B72; see also Lidz et al., 2003) point out that such
results demonstrate that innate linguistic structure guides language acquisition since the linguistic input available to
the infants cannot unambiguously support anaphoric representations.
Consistent with the view that linguistic structure within the language learner is the main source of grammatical
knowledge, Gelman (2005) documents how young children have an understanding that nouns can be used to refer to
generic kinds rather than solely to specific instances and that this understanding is not guided by either perceptual or
linguistic cues. Rather, on available data, the expression of a system of generic terms appears to be driven by the
theoretical assumption on the part of the child that a noun is a generic term, unless the context dictates to the
contrary. Even deaf children from hearing homes who are without a language model express generic kinds in their
gestures (Goldin-Meadow, 2003).
1.2 Evidence for a Poverty of the Stimulus Argument in the Acquisition of a Theory of Mind
We propose that a poverty of the stimulus account extends to the course of ToM reasoning in young children.
Investigations of children's understanding that beliefs about the world may be true or false have centered on their
responses to tasks designed to determine whether they identify how a person with a false belief can initially be
deceived about the location of an object or the contents of a container. In particular, these tasks often take a format
similar to the Sally-Anne task, involving unexpected changes of locations for objects (Baron-Cohen et al., 1985), and
the Smarties task (Wimmer & Perner, 1983), involving misleading contents of containers. In the Sally-Anne task,
children are told about Sally, a story character
with a false belief about the location of a marble. They are told that Sally has placed the marble in a box but, when
she is away, another story character called Anne has moved it into a different location. The test question concerns
where Sally, who has not witnessed the deception and therefore has a false belief, will look for the marble. In the
Smarties task, children are shown a Smarties tube (M&M candies in the U.S.) that, when opened, is revealed to
contain pencils. The test question concerns what another person, who again has not witnessed the deception and
therefore has a false belief, will think is in the tube.
On such tasks, all typically developing children appear to develop ToM reasoning at about the same time, by about
four to five years of age, despite wide cultural variations (Avis & Harris, 1991) as well as variations in the extent to
which they are exposed to talk about mental states of others (Wellman et al., 2001). In this sense, the expression of
ToM reasoning is parallel to that of grammar that is largely independent of wide variations in the extent to which
children are exposed to language input, assuming that some language input does exist. As would be expected, within
this narrow age range, performance on ToM tasks is facilitated when the actual verb used implies that the actor might
have a false belief, as is the case in languages such as Mandarin Chinese, Greek, Turkish, and Puerto Rican Spanish
(Lee et al., 1999; Maridaki-Kassotaki et al., 2003; Shatz et al., 2003). However, children who speak English and are
younger than four also, for the most part, succeed if the tasks are made more pragmatically explicit. For example,
asking "Will Sally look first for her marble?" enables most (though not all) three-year-old children to inhibit the
interpretation that the question refers to where Sally will have to look or must look for the desired object and instead
to interpret the question as intended to refer to the consequences of Sally holding an initial false belief about the
location of an object (Joseph, 1998; Nelson et al., 2003; Siegal & Beattie, 1991; Surian & Leslie, 1999; Yazdi et al.,
2006). These findings remain a challenge to the position that children's ToM develops and undergoes a form of
theory revision that takes place broadly between the ages of three and four years (e.g., Gopnik & Meltzoff, 1997;
Perner, 1991; Wellman, 1990).
Just as children are not taught the concepts of noun, verb, and grammatical subject, they are not taught the concept
of belief. Instead, they receive information on the truth value of belief-desire propositions in discourse involving
mental state terms. This process may trigger the understanding that the beliefs of others may not correspond to
reality. With regard to deaf children, Marschark and colleagues (2000) have shown that late signers aged 8-13 years
have the ability to attribute mental states correctly in generating stories about others with whom they have interacted
hypothetically in story situations. However, the late signing deaf children of hearing parents have difficulties on ToM
reasoning tasks even in adolescence (Russell et al., 1998). These difficulties persist even on versions of the tasks in
which the test questions are made pragmatically explicit (Peterson & Siegal, 1995). In contrast to late signing deaf
children, both normal hearing and native signing deaf children appear to enjoy an early conversational access that
triggers the expression of belief-desire reasoning (Courtin & Melot, 2005; Woolfe et al., 2002).
This pattern of results supports a poverty of the stimulus argument for the acquisition of ToM reasoning in normal
children. This acquisition hinges on
receiving the requisite environmental input within a critical period in early development, a process that parallels the
acquisition of grammar. Just as children require some minimal access to language for grammar to develop, they
require at least some minimal access to conversational opportunities to display normal ToM reasoning skills.
Conversational experience alerts children to the concept that others' beliefs can differ from their own and be false.
Through this experience, they come to recognize that speakers are epistemic subjects who store and seek to provide
information about the world, allowing access to a world of referents and propositions about intangible objects and
creating the potential for imagining the past and future (Harris, 1996; Harris et al., 2005).
Therefore, as is the case for language, the developmental trajectory of ToM reasoning is affected when the requisite
input is not received within an early critical period, as is the case for late signing deaf children, and possibly as well
for some nonvocal children with cerebral palsy who also cannot easily engage in conversation that necessarily involves
mental states (Dahlgren et al., 2003).
A relevant finding is that many children with autism also display protracted difficulties on ToM tasks (Baron-Cohen et
al., 1985). This difficulty appears to derive from a deficit in forming and employing efficiently a metarepresentational
competence, rather than a difficulty in correctly interpreting the test questions (Surian & Leslie, 1999). However, the
metarepresentational deficit may not be primary in nature but may result from difficulties in more basic processes, such
as the ability to represent attentional states as conveyed by information in the visual (Baron-Cohen, 1995; Milne et al.,
2002) or auditory modalities (Kuhl et al., 2005; Siegal & Blades, 2003). On the latter account, autism in a significant
number of cases may reflect an impairment in processing or attending to auditory information that prevents full
engagement in conversational exchanges and contributes to the preferential interest of persons with autism in objects
and physical causality rather than in people and psychological processes. In such circumstances, children can be
persistently impaired in appreciating that the minds of others contain a store of epistemic mental states, including false
beliefs and beliefs that differ from their own.
In this sense, it is instructive to contrast instances of concrete objects, for example bees, with instances of mental
state concepts, such as beliefs and other mental states. As indicated by research on the late signing deaf, children
who are without early language access can point to bees in the external world, and to concrete referents in general, in
order to communicate messages. However, without the means to use language in conversation, late signing deaf
children cannot easily rely on such ostensive acts to point and communicate about false beliefs. To be able to grasp
the concept that beliefs can be true or false and thus to attribute accurately the content of a false belief, they need to
exercise their capacity to inhibit the prepotent response that arises from a very simple ToM, one that operates under
the premise that beliefs and reality truly correspond, as these very often do (Fodor, 1992). This process takes place
within conversational exchanges with others about the nature of the inner worlds of mental states. Children in
conversation are regularly faced with situations in which speakers may hold different beliefs or perspectives. Indeed, to
participate appropriately in conversation, children have to keep these differences in mind (Clark, 1997). Their full
expression, as shown on
ToM reasoning tasks, may require extensive exercise, and it is just the daily involvement in conversation that may give
children the opportunity to practice the inhibitory skills required in false belief tasks (Leslie, 2000).
2 Language and Theory of Mind: What Is the Relationship?
2.1 Does Grammar Provide the Representational Template Necessary for ToM Reasoning?
It has been widely reported that there is a correlation between ToM and language as shown on measures of grammar,
as well as knowledge of vocabulary and semantic word usage, both in typically developing children (Astington &
Jenkins, 1999) and in children with autism (Happé, 1995). However, despite this well-documented relationship, the
nature of the language-ToM relationship, and the extent to which language influences ToM reasoning, as shown in
performance on false belief tasks, remains controversial (Harris, 2005). Some have claimed that the grammar of
language enables children to entertain propositions that involve the simultaneous representation of alternative states of
affairs such as the consequences of behavior by individuals who hold true or false beliefs (Astington & Jenkins, 1999;
Perner, 1991; Smith et al., 2003). More specifically, others have maintained that it is the acquisition of sentence
complementation in the grammar of language that enables children to reason out solutions to false beliefs (de Villiers
& Pyers, 2002). By this account, ToM reasoning is dependent on the possession of syntactic structures such as those
that permit the embedding of false propositions within true statements ("Mary knows that John [falsely] thinks
chocolates are in the cupboard").
However, it is likely that neither of these hypotheses fully captures the contribution of language to theory of mind. It
may be that a certain level of syntax and semantics is necessary for ToM performance; nevertheless, many young
children are adept at syntax and semantics but still do poorly on ToM tasks. Although Hale and Tager-Flusberg (2003)
and Lohmann and Tomasello (2003) have reported success at training ToM performance with exposure to instruction
on sentence complementation, Ruffman and colleagues (2003) report evidence that ToM reasoning is related to general
language ability rather than to specific aspects of syntax or semantics. Moreover, as Lohmann and colleagues (2005)
recognize, training studies on sentence complementation may in fact involve exposure to discourse that may foster
conversational understanding, which in turn promotes success on false belief tasks.
As has been previously noted (Astington & Jenkins, 1999; Custer, 1996; Woolfe et al., 2002), three-year-olds who fail
ToM tasks spontaneously produce sentence complements in their speech. They correctly answer questions that involve
comprehension of sentence complementation if these take the structure [person]-[pretends]-[that x] (e.g., "He
pretends that his puppy is outside"). By contrast, three-year-olds do poorly when given sentences based on the form
[person]-[thinks]-[that x] (e.g., "He thinks that his puppy is outside"). Both use the same complements, yet children
only pass when "pretend" is used. More recently, evidence from both Cantonese- and German-speaking children has
yielded no support for a link
between understanding of sentence complements and ToM reasoning (Cheung et al., 2004; Perner et al., 2003). Given
these considerations, the syntax of sentence complementation falls short of providing a complete account of ToM
performance (Harris et al., 2005).
Converging evidence comes from studies of adults who have become aphasic following brain damage and have lost
grammar though retaining their ToM reasoning ability (Varley & Siegal, 2000; Varley et al., 2001). Though such
patients have a language-configured mind that could be seen to support ToM development, their performance is
consistent with the dissociation between grammar and ToM in childhood. Finally, there are many instances of sign
languages and spoken Aboriginal Australian languages in which there is no sentence complementation (M. A. Baker,
personal communication). Instead of clausal complements such as "John told everyone that Mary washed the car,"
users of such languages employ clausal adjunct forms such as "Mary having washed the car, John told
everyone (it)." If complementation were necessary to instantiate ToM reasoning, no ToM would be possible in these
language groups.
Grammar may thus be seen as a coopted system that can support the expression of ToM reasoning but whose
possession does not guarantee successful performance on ToM tasks (Siegal & Varley, 2002; Siegal et al., 2001).
Moreover, the albeit controversial success of 13- to 15-month-old infants on nonverbal ToM tasks involving measures
of visual attention suggests that, although grammar may be useful to support belief-desire reasoning processes once
language is acquired, it is not essential to ToM reasoning (Onishi & Baillargeon, 2005; Scott & Baillargeon, 2006; Song,
2006; Surian & Caldi, 2005). Rather than an ability dependent on grammar, ToM reasoning in young children is
triggered, tuned, and speeded up by engagement in conversation about mental state contents, such as what speakers
want, pretend, and believe.
2.2 Is Comprehension in Conversation Achieved by a Specialized Submodule of ToM?
Since access to opportunities to converse about mental states appears to be pivotal in the expression of ToM, it is
important to examine links between ToM and comprehension in conversation. To this end, it is useful to distinguish
between links at two levels: functional and ontogenetic. At the functional level, we discuss how and to what extent the
ToM module is involved in conversation and how it connects with other cognitive components to allow the successful
interpretations of communicative acts and the production of context-appropriate utterances. In relation to the
ontogenetic level, we aim to sketch how the acquisition of ToM relates to the development of communication skills.
2.2.1 Considerations at the Functional Level
There is a consensus that beliefs about beliefs, termed metarepresentations, are necessary to human communication,
particularly to inferential communication (Grice, 1989). If we posit that metarepresentations are the output of a
specialized mechanism, the ToM module, then the necessary and central role of the ToM module in conversation is
apparent,
and it is well recognized in current theories of human communication (Sperber, 1996). The claim that
metarepresentations are necessary for many aspects of human communication is in line with commonsense intuitions
about conversational processes, but there is also abundant empirical support for it that comes from normally
developing children and persons with autism. For example, persons with autism frequently show a deficit on ToM
reasoning, and this deficit is associated with their ability to detect violations of the Gricean conversational maxims
(Surian et al., 1996) and to understand figurative language such as metaphors and irony (Happé, 1993).
Therefore the ToM module is a necessary component of normal conversational competence in school-age children and
adults. Sperber and Wilson (2002) have recently proposed an interesting analysis of the relation between ToM and
conversation that goes beyond this view. According to their analysis, pragmatic inferences involved in communication
are computed by a specialized submodule that belongs to the human mind-reading system, rather than by the same
ToM mechanism that is used to make sense of, and to predict, actions in general. To support this proposal, they
emphasize that such inferences are usually processed in a fast and unconscious way, are drawn even by preverbal
infants, and concern a specific type of input. Moreover, the domain on which they operate is quite special.
Communicative acts exhibit a peculiar character: they can be about an infinite range of (informative) intentions. By
contrast, the intentions underlying others' actions, as portrayed in ToM tasks, seem comparatively limited, given that
real-world constraints apply to real actions, but not to the semantic content of communicative acts. There is so much
more that you can say than what you can do, or even try to do. In addition, inferences of this sort have certainly been
part of human social interaction for long enough to make it plausible for a specialized submodule to have been
selected, given its adaptive value. On this view, there is a dedicated module to retrieve a speaker's meaning that is
part of a larger ToM module. The proposal that there is dedicated submodule to retrieve a speaker's meaning is a
departure both from Grice and from Sperber and Wilson's (1986) own previous work.
We wish to discuss two main claims of the Sperber and Wilson (2002) model. The first is that human communication
is, at least in substantial part, inferential in nature rather than code-like as it involves a great deal of
metapsychological processing. The second claim is that such metapsychological processing is the job of a specialized
modular subsystem dedicated to pragmatic inferences.
We agree with the idea that human communication is in large part inferential, but we do not think that the evidence
for this comes from the research showing the involvement of ToM in communication. Metarepresentations are a
necessary requirement not only in inferential models of communication but also in many code models of
communication, such as the mutual knowledge model proposed by Clark and Marshall (1981; see Sperber & Wilson,
1986, pp. 15–21, for discussion). The critical difference concerns the presence of a shared coding-decoding system,
which is required only by the coding view. On this view, verbal and nonverbal communication are based on the shared
knowledge of verbal and nonverbal codes,
respectively. By looking at me and then looking at the door, the speaker sends me a coded message that I will decode as "She thinks it is time to go," and this interpretation may be achieved without many assumptions about the rationality of the speaker. By contrast, on the Gricean inferential view, no coding systems are necessarily involved in this exchange, and the correct interpretation is derived by drawing a series of contextual deductions based on the
assumption that the speaker is cooperative and rational and that she treats the addressee as a rational agent.
However, since both views require metarepresentation, the involvement of metarepresentation does not allow us to
decide between the two alternatives.
Rational processes play an important role in the Sperber and Wilson model of utterance interpretation. Hearers choose
the most accessible interpretation among those that are contextually plausible or available. They do this by using a
rational procedure that follows a path in which the effort required in constructing an appropriate interpretation is
minimized. The chosen interpretation may not necessarily be the correct one, but it is the most rational interpretation,
given an expectation of relevance. However, the claim that hearers choose a specific interpretation is somewhat at
odds with the claim that they often stop at the first interpretation they construct because this interpretation satisfies
the expectation of optimal relevance. If a hearer does not even access or represent other contextually possible
implications of the speaker's utterance (Sperber & Wilson, 2002, p. 20), then her choice of one interpretation is in the eye of the observer rather than in the hearer's head.
The evidence from the infant literature is, likewise, at best suggestive and certainly not conclusive. While preverbal infants
communicate rather efficiently in ways that seem to exploit metapsychological resources, children may achieve their
communicative success without the need to represent others' beliefs and desires (see Gergely et al., 1995; Gergely &
Csibra, 2003). Demonstrations that preverbal infants can use metarepresentations remain controversial (Onishi &
Baillargeon, 2005; Perner & Ruffman, 2005; Scott & Baillargeon, 2006; Song, 2006; Surian & Caldi, 2005). Even if we insist that infants are indeed capable of inferential communication, such early inferential communication might be achieved without the need for metarepresentational skills. This, of course, does not rule out that infants can evaluate
the rational grounds of some actions (Csibra et al., 1999; Gergely et al., 1995). But it is one thing to evaluate the
rationality of an action, given some biomechanical constraints, and it is another to evaluate the rationality of a
communicative act or to interpret it on the assumption that the speaker is rational or that her utterance is optimally relevant. In the latter case, since both costs and benefits are mostly defined in cognitive terms, it is hard to see how one could do this without
metarepresentations.
Turning to Sperber and Wilson's second claim, the idea that there is a submodule of the mind-reading system that is
dedicated to pragmatic inferences may sound somewhat peculiar, since modules are mechanisms that are relatively
context independent, and pragmatics concerns aspects of conversational competence that deal with the relationship
between utterances and the communicative context. Nevertheless, this idea is appealing to those who are seriously
concerned with the speed and accuracy with which utterances are constructed and interpreted in real-life conversation.
It is speed, automaticity, and domain specificity, rather than
informational encapsulation, that underscore the modular nature of the mechanism envisaged in Sperber and Wilson's
proposal. On the one hand, speed may just be the result of good automatization reached by means of practice (Bloom,
2002). On the other hand, speakers appear to be very fast even in odd situations quite unlike any they have encountered before. Such achievements point neither to a practice effect nor to complex inferential chains like
those envisaged by Grice as the basis for following the implications of conversations.
Consider, as a working example, the case of scalar implicatures (Carston, 1998). If a speaker says "I have two brothers," the hearer would usually interpret this as implying exactly two brothers. Or if one says "x might be y," adult speakers tend to rule out the interpretation "x must be y," despite the logical compatibility of such an interpretation. Noveck (2001) found that children did not show the same bias, suggesting that children have
difficulties in drawing conversational implicatures. An investigation of the sources of such difficulties showed, however, that when the goal of the experimenter is made clearer, via short training in the pragmatic evaluation of utterances, children's peculiarities are drastically reduced (Papafragou & Musolino, 2003). From a Gricean perspective, the hearer assumes the speaker's cooperativeness and rules out that she means "at least two brothers" or "x must be y," even if such interpretations are logically compatible with the speaker's utterance. Such interpretations are very
often explicitly expressed when subjects are asked to correct an infelicitous utterance. In Papafragou and Musolino's
experiments, six-year-olds corrected a speaker who said "Some horses jumped over the fence" when in fact all the horses had jumped over the fence. However, more evidence is needed to support the claim about the
psychological reality of conversational implicatures.
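The scalar reasoning just described can be sketched as a toy computation. This is our own illustration, not a model from the chapter or the experimental literature; the scales and the function name are invented for the purpose.

```python
# Toy sketch (invented, not from the chapter) of the Gricean scalar step:
# a cooperative speaker who chose a weaker term on a scale is taken not to
# mean any stronger alternative she could have used instead.

SCALES = {
    "some": ["some", "all"],          # <some, all>
    "might": ["might", "must"],       # <might, must>
    "two": ["two", "three", "four"],  # numerals treated as a scale here
}

def scalar_implicatures(term):
    """Return the stronger alternatives a hearer rules out, assuming the
    speaker is cooperative and as informative as required."""
    scale = SCALES.get(term, [term])
    stronger = scale[scale.index(term) + 1:]
    return ["not " + alt for alt in stronger]

print(scalar_implicatures("some"))   # rules out "all"
print(scalar_implicatures("two"))    # "two" strengthened toward "exactly two"
```

On this sketch, adult-like interpretation corresponds to running the exclusion step, while Noveck's children would be hearers who accept the logically weaker reading without it.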
Suppose that we present a six-year-old child with a set of six clowns; three are happy and three are sad. In each
subset, one has a blue flower, one has a red flower, and one has no flowers. If we ask the child "Give me the happy one," the child will choose the happy clown without flowers (Surian & Job, 1987). This choice could be the outcome of children's ability to draw a conversational implicature, roughly summarized as "He did not say anything about flowers. He would have said something about them if he intended to point to a clown with flowers. I can assume that he is cooperative and genuinely wants me to know which clown he intends to refer to. Therefore he must mean the clown with no flowers." In these settings, the nonverbal context ensures that the child readily accesses potential
alternatives and chooses between them. But is this really a choice based on the recognition of an implicature? An
alternative, more likely process may have simply involved the construction, in the child's head, of a happy clown face
or the representation of the feature "happy," and then a search for the best match among the contextually available objects.
This match is found with the object that exhibits the mentioned feature but no other salient features that were found
in the other alternatives. A computation of relevance may have indeed guided the child toward a specific (correct)
referent with no need for complex counterfactual reasoning concerning what the speaker would have said had he
wanted to refer to a different clown. The speed and accuracy with which even young children perform such a task do not support a long and reflective process but are instead consistent with Sperber and
Wilson's proposal about a dedicated pragmatics submodule. However, the pragmatic inferencing posited by relevance
theory is drastically reduced compared to the inferencing required in the traditional Gricean view (for further
discussion, see Siegal & Surian, in press).
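The matching process suggested above can be made concrete with a small sketch. The representation of the clowns and the tie-breaking rule are our own illustrative assumptions, not a description of the child's actual mechanism.

```python
# Hypothetical sketch of the feature-matching account of the clown task:
# represent the mentioned feature ("happy"), then pick the candidate that
# matches it while carrying the fewest unmentioned salient features.
# No reasoning about the speaker's alternative utterances is needed.

clowns = [
    {"mood": "happy", "flower": "blue"},
    {"mood": "happy", "flower": "red"},
    {"mood": "happy", "flower": None},
    {"mood": "sad", "flower": "blue"},
    {"mood": "sad", "flower": "red"},
    {"mood": "sad", "flower": None},
]

def pick_referent(mentioned_mood, objects):
    """Best match: mentioned feature present, extra salient features minimal."""
    candidates = [o for o in objects if o["mood"] == mentioned_mood]
    def extra_features(o):
        return sum(1 for k, v in o.items() if k != "mood" and v is not None)
    return min(candidates, key=extra_features)

print(pick_referent("happy", clowns))  # the happy clown with no flower
```

Notice that nothing in this procedure represents the speaker's beliefs or his alternative utterances: the counterfactual reasoning of the implicature account is replaced by a minimization over visible features.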
A similar process may happen when the child is tested for the use of the mutual exclusivity constraint in word learning
(Markman et al., 2003). The child is presented with a pair of objects, one familiar (e.g., a banana) and one unfamiliar (e.g., a whisk), and is asked "Show me the fendle." The child, even a preschooler, will readily point to the
unfamiliar object. One way of explaining such success is to assume that the child will reason in this way (Bloom, 2000,
p. 68):
I know that a banana is called a "banana."
If the speaker meant to refer to a banana, she would have asked me to show her the banana.
But she didn't; she used a strange word, "fendle."
So she must intend to refer to something other than the banana.
A plausible candidate is the whisk. "Fendle" must refer to the whisk.
Bloom maintains that this explanation is roughly what previous research on the use of the mutual exclusivity constraint
seemed to support, and that it renders unnecessary any premises specific to word learning. Consistent with this, children appear to perform similarly when they need to infer many of an object's properties, not just its name (Diesendruck & Markson, 2001). It is, however, possible that children use a matching strategy both when inferring the name and when inferring the function, or other properties, of an object. This matching strategy, as in the clown task, may not require the level of mental state attribution and pragmatic reasoning outlined by Bloom. It would be
more similar to the explanation based on the mutual exclusivity assumption, as summarized by Markman and
colleagues (2003): "That can't be a fendle, it's a banana."
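The two routes just contrasted, Markman-style exclusion and Bloom-style pragmatic inference, can be set side by side in a toy form. The lexicon, the scene, and the function names are invented for illustration; the point is only that both routes converge on the same referent.

```python
# Toy contrast (invented names) between two routes to the same referent
# for "Show me the fendle": a known-label exclusion step versus a fuller
# pragmatic inference about what the speaker would otherwise have said.

known_labels = {"banana": "banana"}   # object -> its familiar name
scene = ["banana", "whisk"]

def by_exclusion(novel_word, objects):
    """Markman-style shortcut: 'that can't be a fendle, it's a banana.'"""
    return [o for o in objects if o not in known_labels][0]

def by_pragmatic_inference(novel_word, objects):
    """Bloom-style route: if the speaker had meant the banana, she would
    have said 'banana'; she didn't, so she means something else."""
    for o in objects:
        if o in known_labels and known_labels[o] != novel_word:
            continue  # speaker would have used the familiar name instead
        return o

print(by_exclusion("fendle", scene))            # whisk
print(by_pragmatic_inference("fendle", scene))  # whisk
```

Since the two procedures are behaviorally equivalent in the standard task, choosing between them requires the kind of indirect evidence, from speed, from atypical populations, and from non-naming properties, discussed in the text.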
To summarize, the main reasons to be skeptical about the existence of an early emerging pragmatic module are that
(1) there is, at best, only preliminary evidence that preverbal infants can draw such metapsychological inferences; (2)
the adaptive value of such a putative module is obviously not a demonstration of its existence; and (3) the
unconscious or nonreflective nature of (some) pragmatic inferences is consistent not only with a modular cognitive component but also with an automatized process, based on experience, as suggested by Bloom (2002).
We suggest that, while some pragmatic inferences are drawn by more general inferential processes involved in ToM
reasoning, others are drawn by dedicated modules. In other words, we propose that there is not a single pragmatics
module but rather a set of highly specialized modules for some instances of pragmatic inferencing. For example, the
inferences involved in reference resolution when the direction of the speaker's gaze is available may be carried out by
a dedicated submodule that takes, as its input, the output of the eye direction detector postulated by Baron-Cohen
(1995) and the output of the language module. We see no convincing reason to think that the mechanism dedicated to
this kind of computation should also be the same mechanism that is exploited to interpret sarcasm and metaphors.
To make the submodule envisaged by Sperber and Wilson work, one also needs to endow it with access to a very wide range of information, since potentially any piece of information may be relevant to utterance interpretation. This,
however, is not compatible with domain specificity and informational encapsulation. A family of more specialized
submodules may turn out to be more efficient, especially at the beginning of intentional communication.
Selective impairment is one indispensable source of evidence for modularity (Fodor, 1983). We now have a substantial
body of evidence showing selective impairment of ToM, not only in children with autism but also in brain-damaged
populations. Following damage to the right hemisphere, many adult patients have difficulty on ToM reasoning tasks,
though they retain grammar in their language (Happé et al., 1999; Surian & Siegal, 2001). By contrast, following
damage to the left hemisphere language centers, many patients become aphasic, as shown by loss of grammar,
though they retain ToM skills (Varley & Siegal, 2000; Siegal et al., 2001). The known selective breakdowns, either in
impaired adults or in atypically developing children, do not provide clear evidence to support the idea of a dedicated
pragmatic module. To date, there are no reported cases in which difficulties in inferential communication are
accompanied by intact noncommunicative mind-reading skills such as those tested in false belief tasks. The absence of
relevant evidence, however, may simply be due to the fact that the methods used so far were not suitable to detect
such a selective impairment. Sperber and Wilson's (2002) proposal may foster future studies that are specifically
designed to assess the predicted dissociation, consistent with a modular model of ToM (Baron-Cohen et al., 1985;
Leslie, 1987).
2.2.2 Considerations at the Ontogenetic Level
The research on ToM reasoning in deaf children suggests that access to conversation is necessary for the development
of ToM. One way of portraying the role of conversation in the development of ToM is to see conversation as the
situation in which children are provided with crucial input to learn what a mental state is, what kinds of mental states
people can entertain, and how they come to entertain them. This process would lead them to abandon immature
theories of action that include teleological, but not mentalistic, concepts. The poverty-of-the-stimulus account of the expression of ToM reasoning runs contrary to such a view. We argue for an alternative hypothesis that recognizes both
the presence of a rich innate competence and the necessity for specific experiences during a critical period. In our
view, conversation is a powerful, perhaps the most powerful, source of exercise of metarepresentational skills. During
conversation, speakers are required to constantly update their representations of their interlocutors' minds and to infer
their informative and communicative intentions (Grice, 1989). In comparison, the mind reading required in the
interpretation of daily actions appears to be a much less demanding task. If children are deprived of this exercise, they
may be prevented from strengthening the links between ToM and central processing systems, or the pathways required to access ToM representational resources; this, in turn, would result in poor performance on ToM reasoning tasks;
however, such impairment would not be as severe as the metarepresentational deficit reported in most autistic
children. This view can account for the experimental
evidence of late signers' difficulties in ToM tasks, which are very similar to the difficulties reported in autistic children,
and also the naturalistic reports of deaf children's ability to establish friendships and enjoy social interactions, which
are in contrast with the poor social abilities reported in autistic children.
Evidence from infant communication (for example, referential pointing and shared attention activities) does not
necessarily reflect a metarepresentational understanding, but rather the ability to infer intentions, goals, and possibly
perceptual or attentional states. When involved in early forms of intentional communication, children display an actional, not necessarily a cognitive, understanding of agents (Leslie, 1994). Although early forms
of metarepresentation may not be necessarily tied to verbal communication (i.e., conversation), it is with verbal
conversation that communicative activity reaches a level of sophistication and complexity unmatched by any other form of animal communication. Competent participation in conversation requires continuous and
very fast computation of participants' mental states. It is therefore only with such activity that humans are required to
exploit and instantiate fully their metarepresentational skills.
3 The Genetics of Language and ToM
Relevant evidence for the dissociation between grammar and ToM comes from studies in early infancy on the
precursors of language. Holowka and Petitto (2002) videotaped 10 babies aged 5 to 12 months as they were acquiring
either English or French. They scored 150 randomly selected segments of babbles, nonbabbles, and smiles in terms of
left, right, or equal mouth opening and found that babbling was accompanied by right mouth asymmetry, whereas
smiling was accompanied by a left mouth asymmetry and nonbabbling by equal mouth opening. The left hemisphere cerebral lateralization for language indicated by the right mouth asymmetry in babbling shows that the left-lateralized language function found in adults is present very early in human development, before the actual onset of speech.
These findings are in keeping with neuroimaging studies that point to left hemisphere dominance for speech perception
in the first year of life (Dehaene-Lambertz et al., 2002), though greater plasticity in the maturing brains of infants may
allow for more right hemisphere substitution than is possible in adults following left hemisphere damage (Dehaene-Lambertz et
al., 2004).
Given the presence of this very early asymmetry, is there support for a genetic basis to the left hemisphere structures
that control language? Using a sample of 10 monozygotic and 10 dizygotic twin pairs, Thompson and colleagues (2001)
sought to investigate the influence of individual genetic differences on brain structure, as shown on three-dimensional
maps constructed from magnetic resonance images (MRI). Despite the underpowered sample size, they found highly
significant heritability in the asymmetry of Broca's and Wernicke's language areas in the left hemisphere.
Similarly, twin studies using large samples have revealed substantial nonoverlapping genetic influences on phenotypic
measures of language and nonverbal intelligence in infancy and early childhood (Dale et al., 2000; Price et al., 2000),
although the genetic overlap may be greater at later ages (Colledge et al., 2002).
In the case of specific language impairment, both behavioral and molecular genetic research indicate powerful and
enduring genetic influences on grammar in both children and adults that nonetheless can spare nonverbal intelligence
(Dale et al., 1998; Lai et al., 2001; Spinath et al., 2004; Van der Lely et al., 1998; for a perspective on the
significance and limitations of such findings, see Bishop, 2003). Neuroimaging experiments using functional MRI (fMRI)
have been carried out involving members of the KE family who have a language disorder caused by a mutation in the
FOXP2 gene. These have shown that affected family members display underactivation in Broca's area on a verb
generation task compared to unaffected members who have a typical left-dominant pattern of activity in this area,
pointing to the critical involvement of the FOXP2 gene in the neural substrate of language (Liégeois et al., 2003).
Few studies have directly investigated genetic influences on language and ToM. In a study of three-year-old twins,
Hughes and Cutting (1999) reported substantial nonoverlapping genetic influences on measures of verbal intelligence
and ToM reasoning such as the Sally-Anne task. By contrast, in a twin study of five-year-olds, Hughes and colleagues
(2004) found that environmental factors explained most of the variance on advanced ToM reasoning measures and
that the only genetic factors that influenced ToM were those that were shared with verbal ability. However, in the latter
investigation, children were given second-order ToM story tasks about a character's beliefs about another character's
beliefs. In the predominantly low socioeconomic status sample tested, children need to have attained sufficient verbal
ability to bear in mind the complex premises of each second-order task and to reason successfully. Such reasoning
ability may be more likely to be influenced by the social environment of family, peers, and schooling than is the case
of the simple first-order ToM reasoning of three-year-old preschoolers.
4 Mental Modularity and Cultural Diversity
The evidence that we have reviewed indicates that grammar and ToM reasoning are the product of mechanisms that
are modular to a significant degree. Dissociations between grammar and ToM performance on cognitive tasks, and
precursors in infant behavior, provide evidence for cognitive modularity. Neuroimaging and patient lesion studies that
demonstrate dissociations between grammar and ToM in brain activation and function provide evidence for neural
modularity. There is also evidence for genetic modularity, insofar as there is a strong genetic basis for the left
hemisphere language structures and performance on measures of verbal intelligence that do not overlap with
performance on measures of nonverbal intelligence. Genetic studies also highlight a largely nonoverlapping genetic
basis for grammar and ToM, as shown on false belief reasoning tasks. Thus, while there is no necessary connection among forms of modularity (a dissociation between grammar and ToM on cognitive tasks need not have a corresponding dissociation at the level of the neural or genetic substrate; Coltheart, 1999), data exist and continue to accumulate in support of the specialized, modular nature of grammar and ToM at all three levels.
On the basis of this research, our view is that there are commonalities in the capacity for, and emergence of, grammar and ToM, in that both represent the
elaboration of innate processes that are achieved automatically and effortlessly by typically developing children. In this
sense, grammar and ToM can be seen as parallel modular systems that come together to provide a foundation for the
transmission of culture (Sperber, 1996).
Drawing a parallel between the expression of grammar and the expression of ToM yields insight into the nature of
culture in relation to universals in cognition. Humans, regardless of culture, acquire a grammar. Culture determines the
specific nature of the native grammar to be acquired. Similarly, all humans, regardless of culture, acquire the concept
that beliefs may be true or false once this understanding is triggered by exposure to conversations that are, for the most part, about the mental states of others, including what they fear, want, know, and believe. Culture influences the
specific beliefs that people hold about the minds of others and shapes noncore aspects of ToM (Scholl & Leslie, 1999).
In this sense, ToM and grammar emerge as autonomous domain-specific systems that normally come online at set
times in development, despite wide variations in the environment. These systems interact to support word learning and
the acquisition of specific beliefs. In tandem with cues from the grammar of language, ToM, in the form of the ability
to interpret others' intentions, contributes substantially to how children learn the meanings of words (Bloom, 2000;
Diesendruck & Markson, 2001). For example, Gelman and Ebeling (1998) gave children aged two to three years
drawings of various nameable objects (e.g., a man). Each drawing was described as illustrating a shape that was created intentionally (e.g., someone painted a picture) or accidentally (e.g., someone spilled some paint). Participants
were simply asked to name each picture. Children used shape as the basis for their naming primarily when the shapes were created intentionally, and substance (paint) primarily when the shapes were created accidentally. In this way, they displayed evidence
of the sharing of the speaker's viewpoint in conversation that is vital for effective communication.
With the support of grammar and ToM, children acquire the specific lexicon and beliefs of their community. These
languages and beliefs are encrypted to be accessible to those within a culture, and function to protect it. As Baker
(2001) remarks (see also Sperber, 1996), the parameters of variation in language in particular and in culture more
generally have many of the same properties as engineered codes and ciphers (with a secret key), insofar as these
properties function to conceal a message, rearrange its parts, and replace its symbols at different levels of structure.
More generally, the factoring of language into a universal grammar available to everyone and parameters encrypted to
be accessible to the few suggests that language variation is not an evolutionary accident. Instead, it is part of the
inherent design specifications for communication that serves the function of producing messages that are easily
understandable by the intended audience but not by those outsiders who may attempt to listen. Modular systems in
theory of mind and grammar interact to form the basis of problem-solving resources children use to acquire words and
culture, but their autonomy is reflected in the domain-specific breakdown of function following brain lesions in
adulthood.
One of the hallmarks of human culture is that it specifies what people value, what they take seriously in their daily
lives, and what they will fight for. These considerations play a pivotal role in humans' decisions to include or exclude
others
in their groups (Premack & Hauser, 2001). Founded on the capacity for grammar and ToM, enculturation involves
specific languages and beliefs that are encrypted to be easily accessible only to those within a culture. The human
mind possesses the capacity to marshal a series of autonomous modular systems such as grammar and ToM. In this
way, as we have previously maintained (Siegal & Varley, 2002), the human processing system came to acquire an
unprecedented sensitivity to cultural variations and a functional architecture in which the whole is greater than the sum of its parts.
10 Culture and Modularity
Dan Sperber
Lawrence Hirschfeld
1 The Causal Chains of Culture
Members of a human group are bound to one another by multiple flows of information. (Here we use "information" in a broad sense that includes not only the content of people's knowledge but also that of their beliefs, assumptions,
fictions, rules, norms, skills, maps, images, and so on.) This information is materially realized in people's mental
representations, and in their public productions, that is, their cognitively guided behaviors and the enduring material
traces of these behaviors. Mentally represented information is transmitted from individual to individual through public
productions. Public representations such as speech, gestures, writing, and pictures are a special type of public
productions whose function is to communicate a content. Public representations play a major role in information
transmission. Much information, however, is communicated implicitly, that is, without being publicly represented.
Information can also be transmitted without being, properly speaking, communicated, even implicitly, as when one
individual acquires a skill by observing and imitating the behavior of others.
Most information that is transmitted among humans is about local and transient circumstances, and is not transmitted
beyond these. Some information of more general relevance, however, is repeatedly transmitted, and propagates
throughout the group. Talk of culture (whatever the preferred definition or theory of culture) is about this widely
distributed information and about its material realizations inside people's minds and in their common environment (see
Sperber, 1996).
One can study cultural phenomena in two main ways. One can interpret them, that is, try and make their contents
intelligible to people of another culture, or more intelligible to members of the culture in which these phenomena
occur, as do anthropologists and historians. One may also try and explain causally how
these cultural phenomena emerge, stabilize, and evolve. Both approaches are, of course, legitimate and
complementary. Can they be pursued independently of one another? Sperber (1985b) has argued that while it is
possible, and indeed common, to adopt an interpretive stance with little or no concern for causal explanation, it is
impossible to adopt a causal-explanatory stance that does not rely to some degree on interpretation: the
characterization of cultural phenomena cannot be achieved without interpreting them, that is, without attending to the
mental and public representations of the people involved. The same behavior, say, eating a certain meat, can be a
ritual action, a breach of religious prescriptions, or an ordinary meal, according to people's representations, and in each
case the causal explanation of the behavior should be different.
To interpret a cultural phenomenon, and in particular a cultural representation, it may be enough to study its contents
without paying much attention to its material realizations. Thus a religious dogma, a law, or a folk tale can be
paraphrased, summarized, or submitted to exegesis without studying the processes involved in its public
communication or in its mental representation. Not so, however, when the goal is to explain the causes and effects of
cultural phenomena, for only material realizations have causal powers. Different material realizations of the same
content (for instance, oral vs. written transmission of a folktale) go together with different patterns of social
distribution, hence different cultural status, and in the end tend to favor different evolutions of the content itself.
If one wants to explain, for example, why an oral tradition tale such as "Tom Thumb" has propagated throughout
Europe, generation after generation, while so many other stories (told, for instance, by one mother for the edification
of her children) have failed to generate any tradition, one must consider the very process of oral transmission, which is
made of a vast number of public and mental microevents. An oral tradition tale corresponds to a causal chaining of
public narratives and remembered mental stories, a fragment of which can be schematically represented as in figure
10.1. (In fig. 10.1, as in figs. 10.2–10.4, oval boxes represent mental episodes, rectangular boxes represent public episodes, and arrows represent cause–effect relationships among these episodes.) What makes "Tom Thumb" a folk tale
is the fact that, in a long and spread-out causal chain, almost every public representation of the tale has engendered
mental representations, and a sufficient proportion of these mental representations have in turn engendered public
representations; otherwise the tale would never have reached a cultural level of distribution.
In order better to understand the process, links in this causal chain can be magnified, as in figure 10.2. Every
individual who has played a role in the propagation of the tale (such as the individual represented by the dotted circle)
must have been able to understand and remember in a synthetic form the content of several narratives; she must
have been able to reformulate the memorized story in the form of a new but closely similar narrative, and, of course,
she must have been motivated to do so, for instance, by a request from her listeners. (Please, Granny, tell us the
story of Tom Thumb!)
To explain the success of a tale, at least during the period when this success exclusively depended on oral
transmission, one must describe what made it particularly easy to understand, to remember, and to tell.

FIGURE 10.1 Fragment of the causal chain of a tale.

FIGURE 10.2 A causal link in the causal chain of a tale.

FIGURE 10.3 Fragment of the causal chain of the mayonnaise recipe.

Different types of explanatory factors will have to be
invoked. Some pertain to the local conditions in which the tale was transmitted; others pertain to more general
cognitive or motivational dispositions of the human mind (see Rubin, 1995). Given the diversity of social and cultural
contexts in which, across countries and centuries, a tale like "Tom Thumb" prospered, one may surmise that general
factors will be of particular explanatory importance in this case. Other oral tradition narratives such as the founding
myths of particular dynasties have a distribution that is more closely linked to local factors.
A folktale is a particularly simple case of a cultural phenomenon, since the causal chains that distribute its versions consist simply of an alternation of mental and public representations of the tale itself. Few cultural phenomena are that
simple. The case of an elementary know-how, such as that involved in the domestic preparation of mayonnaise sauce, already involves a more complex causal chaining, a very simplified fragment of which is represented in figure 10.3. We have here two interconnected causal chains. One (in thick lines) transmits the mayonnaise know-how from cook to cook; the other (in thin lines) perpetuates the demand for mayonnaise that ordinary consumers address to
cooks (explicitly or implicitly, by showing their appreciation). On the mental side, there are at least three types of
representations: descriptive/normative representations of the mayonnaise itself (its composition, its taste, its texture,
its appearance); more or less explicit representations of the recipe; and representations of mayonnaise tokens (e.g.
intentions to prepare a mayonnaise, or appreciation of a mayonnaise). On the public side, there are actual
mayonnaises, requests for mayonnaise, and tokens of the recipe. Recipes can be transmitted orally or in writing, with
or without demonstration of the procedure. Each of these mental and public types of episodes is articulated with other
types (many of which are not included in the figure, e.g. public and mental representations of appropriate uses of the
mayonnaise) and contributes to the cultural success of the mayonnaise. Most cultural phenomena involve many more,
and more complex, causal chainings (and so does the mayonnaise itself,
when one takes into consideration not just its homemade but also its commercial versions).
Whatever its complexity, the causal explanation of any cultural phenomenon has to invoke, as in the cases of Tom
Thumb or the mayonnaise, two kinds of episodes, mental and public ones; it has to spell out how each kind of episode
triggers the episodes that follow in the causal chain; for this, the explanation will have to rely on a combination of
local and general factors. Local factors are involved in the explanation of cultural variations. General factors are
involved in the explanation of the very possibility of culture and of its variability.
2 The Microprocesses of Cultural Transmission
The basic structure of the causal chains of culture consists, as just illustrated, in an alternation of mental and public
episodes. How can such an alternation secure the stability of the contents transmitted? Two main types of processes
have been invoked: imitation and communication (see fig. 10.4). Imitation decomposes into a process of observation
and a process of reproduction of the behavior or of the artifact observed. In between these two processes, there must
be a third, mental one that converts observation into action. Communication decomposes into a process of public
expression of a mental representation and a process of mental interpretation of the public representation. Between
these two processes, there must be a third environmental process whereby the action of the communicator impinges
on the sensory organs of the interpreter. Ideally, imitation secures the reproduction of public productions (behaviors or
artifacts) while communication secures the reproduction of mental representations. Imitation and communication may
overlap or interlock when the imitator acquires a mental representation similar to the one that guided the behavior
imitated, or when the interpreter reproduces the public representation that is being interpreted.
Actually, recent work on imitation (e.g. Blackmore, 1998; Heyes & Galef, 1996; Hurley & Chater, 2005; Tomasello et
al., 1993; Whiten & Ham, 1992) and communication (e.g. Sperber & Wilson, 1995) tends to show that their power and
role, even if crucial, have been overestimated.
FIGURE 10.4 Imitation and communication.
To begin with, imitation and communication are not strict copying
mechanisms. Imitators or interpreters construct a version rather than a replica of what they imitate or interpret. They
do so not just because the mechanisms of imitation and communication are imperfect (which they are) but also, and
more important, because even if a strict copy could be produced, this is not what the imitator or interpreter is
generally aiming at: imitation or interpretation is a means to an end rather than an end in itself. With rare exceptions
(such as the forging of a banknote), the goal of imitators and interpreters is served well enough, or even better, by an
approximation or an adapted version of the model. Moreover, the production of behaviors and thoughts informed by
the behaviors and thoughts of others typically involves processes that are more constructive than is assumed by
common accounts of imitation or communication. An imitator often takes inspiration from the model rather than copies
it (and this is imitation only in a loose sense). An interpreter develops her own thoughts with the help of those of the
communicator without necessarily adopting these and, for that matter, without being concerned with the strict accuracy
of her interpretation.
To illustrate how imitation has been overestimated, let us make a detour (but is it really a detour?) through the case of animal cultures. Very often mentioned as an example of cultural transmission among nonhuman animals is the case
of the English tit and the milk bottle. At the time when, every morning, milk bottles with aluminum foil caps were
delivered in front of every English house, these birds had learned to peck a hole in the cap and to enjoy the cream at
the top of the bottle. In a matter of years, this skill had spread among tits throughout England. Unless one stipulates
that "cultural" applies only to humans, this is a clear case of cultural transmission: a skill shared by a whole population
and transmitted not genetically but through interactions among individuals.
If we mention this example, it is because it has undergone an interesting reinterpretation (see Sherry & Galef, 1984;
Galef, 1988). According to its classical description, each novice tit was observing the way expert tits procured cream by
piercing the milk bottle cap, and reproducing this action to achieve the same goal. According to the more
parsimonious, now generally accepted description, tits have an instinctive disposition to peck at objects made salient by
the pecking behavior of other tits. Hence a tit observing another tit pecking at a bottle cap will be inclined to do
likewise. It will then discover on its own the benefit to be gained from such a behavior and be reinforced to repeat it
when the occasion arises. According to this redescription, we are not, in fact, dealing with the imitation of a complex
action the structure and end-point of which would be understood by the imitator. The observation of other tits pecking
at bottle caps makes bottle caps more peckable objects, and the disposition to do what has proved beneficial
determines the adoption of pecking at milk bottle caps as a regular type of behavior. The acquisition of the skill is triggered by the observation of the behavior of others, but it consists not in an imitation but in a new individual acquisition of the routine. It draws mostly on psychomotor resources already present in the individual and on stable features of the environment. Rather than of imitation, one speaks in such cases of "stimulus enhancement." Other cases
of the spread of a type of behavior in a population involve emulation rather than imitation: one animal observing
another
animal achieving some result rediscovers a means, identical or not, for achieving the same goal.
One may, in such well-documented cases of the spreading of a skill in animal populations, speak of properly cultural
phenomena (see Whiten et al., 1999). Still, there is a major difference between these and human culture. Some
animals have cultural practices, but, apart from these, their social life is culture-free. Human life, on the other hand (and not just social activities but also individual activities and thought) is soaked in culture from infancy. It would be
mistaken, however, to infer from this that human cultural transmission relies more on strict copying and less on
processes of individual construction stimulated by the observation of others.
3 Explaining Both Cultural Diversity and Stability
Anthropologists have been justly fascinated by the richness and variety of the cultures they have described and tried to
explain. They have relied on an image of the human mind as a blank slate, or, less metaphorically, as a learning
system without limits or biases, equally open to any kind of cultural content (see Sperber, 1985b; Pinker, 2002). To
most developmental psychologists, this view has become unacceptable. They see rather the acquisition of knowledge
and competencies as a process guided by innate learning dispositions that allow the child to approach different
domains with schemas that are, at least in part, domain specific (Hirschfeld & Gelman, 1994; Sperber et al., 1995).
The issue, then, is to articulate the diversity of cultures as documented in anthropology with our best understanding of
cognitive development.
Not only the diversity of cultures but also their relative stability calls for an explanation. The contents of cultural
representations and practices must remain stable enough throughout a community for its members to see themselves
as performing the same ritual, sharing the same belief, eating the same dish, and understanding the same proverb in
the same way. We are not denying, of course (in fact, we are insisting) that culture is in constant flux and that its
stability is often exaggerated. Still, without some degree of stability, nothing cultural would be discernible in human
thought and behavior. In fact, a wide variety of representations, practices, and artifacts exhibit a sufficient degree of
stability at the population scale to be recognizably cultural. It is tempting then to assume that this stability is secured
by processes of faithful reproduction at the level of microtransmissions. Otherwise, it seems, the cumulative effect of
even small copy errors would jeopardize the stability and hence the properly cultural character of the contents
transmitted. Anthropologists (and, today, also memeticists developing the suggestions of Richard Dawkins, 1976,
1982) generally take for granted that human imitation, communication, and memory abilities are sufficiently reliable to secure a faithful enough reproduction of contents through communities and generations. "Faithful enough" does not mean absolutely faithful, of course; it means faithful enough at the micro level to explain the relative stability we
observe at the macro level.
This a priori argument to show that cultural items are truly replicated in the microepisodes of their transmission does
not withstand even a cursory examination
of the facts of the matter. Variations are the norm rather than the exception at the level of individual episodes of
imitation, communication, and memory storage and retrieval. Neither memory nor the micromechanisms of transmission come near the level of reliability that would explain cultural macro stability. But how, then, can this
relative stability be explained at all?
Just as we must articulate cultural diversity as evidenced by anthropology with the complexity of innate cognitive
dispositions discovered by developmental psychologists, we must articulate the relative stability demonstrated by the
very existence of culture with the observation of the transformations in content involved in most microtransmissions.
Our claim is that these two tasks not only can but must be carried out together.
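The contrast between unreliable micro-transmission and macro-level stability can be made concrete with a toy simulation (ours, not the chapter's: the one-dimensional "version" parameter, the noise level, and the attractor values are all illustrative assumptions). Pure noisy copying lets versions drift apart without limit, whereas a reconstruction biased toward an easy-to-remember form keeps versions clustered despite exactly the same per-step noise:

```python
import random

def copy_noisy(x, sigma=0.05):
    # Unbiased copying: each transmission adds an independent error.
    return x + random.gauss(0, sigma)

def copy_with_attractor(x, attractors=(0.2, 0.8), pull=0.5, sigma=0.05):
    # Biased reconstruction: each transmission pulls the version toward
    # the nearest "attractor" (a form that is easy to grasp and recall),
    # with the same per-step noise as above.
    target = min(attractors, key=lambda a: abs(a - x))
    return x + pull * (target - x) + random.gauss(0, sigma)

def chain(step, start=0.5, length=200):
    # One lineage of 200 successive transmissions of a version x.
    x = start
    for _ in range(length):
        x = step(x)
    return x

def spread(xs):
    # Standard deviation of final versions across independent lineages.
    m = sum(xs) / len(xs)
    return (sum((v - m) ** 2 for v in xs) / len(xs)) ** 0.5

random.seed(1)
drift = [chain(copy_noisy) for _ in range(100)]
stable = [chain(copy_with_attractor) for _ in range(100)]

print(f"spread after pure copying:      {spread(drift):.2f}")
print(f"spread with attractor dynamics: {spread(stable):.2f}")
```

On this sketch, population-scale stability emerges not from faithful replication at each step but from transformations biased toward stable forms, which is precisely the shape of the argument above.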
To try to explain the diversity of cultures by assuming, as anthropologists have done, that the human mind is indefinitely malleable (inasmuch as the idea makes any psychological sense) is to deprive oneself of the means to explain cultural stability. Beings with an indefinitely malleable mind would, at every turn, adopt the last opinion, the
last practice, the last goal encountered. They could never achieve the deep and largely unconscious allegiance to the
ways of the cultural group that is so characteristic of human existence. Cultural ways themselves would not stabilize in
such conditions. If one imagines that, like drying clay, the malleable mind rigidifies as soon as it has acquired a given shape (let us forget for a moment the psychological poverty of these metaphors), then it is the adaptability that humans individually demonstrate throughout their lives that becomes inexplicable.
One might be tempted to explain cultural stability by a human predisposition to acquire culture, a generalization, so to
speak, of the language faculty as seen by Chomsky. It is from such a perspective, for instance, that Susan Blackmore
(1999) attributes to humans a disposition to imitate that transforms them into "meme machines." Nothing, however, in
developmental psychology or in neuropsychology confirms the existence of such a general culture faculty (based on
imitation or anything else). The acquisition of different types of cultural competencies (language, mathematics, dancing,
the sense of honor, for example) follows quite different patterns. Cultural competencies can be selectively impaired through brain damage. What this suggests is that cultural information is based not on an integrated and specific culture acquisition mechanism, but rather on the interaction of several cognitive mechanisms with different specializations.
Incidentally, postulating a culture faculty would raise the following problem. While modern humans emerged some
200,000 years ago, the existence of an omnipresent, richly symbolic culture is well evidenced only in the last 40,000
years or so. It is likely, of course, that currently available archeological data fail to do justice to the cultural wealth of
earlier Homo sapiens, but, even so, it is quite possible that, for a large part of its history, Homo sapiens had only
rudiments of culture, a richer version of what is found among other primates rather than a simpler version of the all-
encompassing culture we are familiar with. More generally, there is nothing implausible in the idea that there could be
an intelligent species with high communicative abilities but communicating only about local and transient states of
affairs and stabilizing only rudiments of culture. It could be that, for much of its history, Homo sapiens was such a
species.
4 Modules and Their Domains
One hypothesis we would like to invoke here to help explain both cultural diversity and cultural stability is that of a modular organization of the mind/brain; and we stress "help explain," since this is not meant to be more than an important component of the overall explanation (with other important components involving, for instance, history and ecology). According to the massive modularity hypothesis (see Carruthers, 2003a; Cosmides & Tooby, 1994; Samuels,
1998, 2000; Sperber, 1996, 2002), the mind is to a large extent made up of a variety of domain- or task-specific
cognitive mechanisms, or modules. It might seem that massive modularity would imply a level of cognitive rigidity
that is hardly compatible with cultural diversity. We want to argue, on the contrary, that massive modularity, properly
understood, is a crucial component in the explanation of this diversity.
A cognitive module is an autonomous mind-brain device characterized by specific inputs from which it derives specific
outputs through its own procedures. A module is autonomous not only in the way it functions but also in its
phylogenetic and ontogenetic development, which are distinct from those of other modules, and also in its failures,
which can be quite diagnostic.
Most innate human modules are learning modules (in the broad common sense of "learning," not in that of learning theory). Most modules in the mature human cognitive system are generated by these learning modules through an epigenetic process and hence are not innate but do have an innate basis. While infants show fear of heights without any previous experience (presumably a truly innate module), the capacities of the face-recognition module develop with each face the child learns to recognize: here the module requires the acquisition of, at least, a dedicated database.
Linguistic competence in a given language such as Tagalog or English has, we would argue, the form of a language-
specific module produced by a language acquisition module through a process where not just specific data but also
specific procedures have to be acquired.
There is, then, a continuum of cases between properly innate modules and more or less structured dispositions to
modularize specific types of cognitive and motor competencies (including some cultural competencies, such as reading,
that are too recent to have had a noticeable effect on the evolution of the genome; see Dehaene, 2003). What we are
suggesting, in other terms, is that we both distinguish and closely connect the notion of a module and that of a direct
biological adaptation. Innate learning modules are biological adaptations that perform their functions by drawing on
cognitive inputs to generate acquired modules. Acquired modules have an innate basis and have derived biological
functions (in the sense of Millikan, 1984) and direct cultural functions (Origgi & Sperber, 2000). With cognitive
adaptations and modules articulated in this manner rather than equated, the massive modularity thesis should become
much more plausible and acceptable.
To explain the role played by modules in cultural diversity and stability, Sperber (1996) introduced the notion of the
domain of a module. A cognitive module (for instance, a snake detector, a face-recognition device, a language acquisition device) has as its function to process a given type of stimuli or inputs: for instance, snakes, human faces, or linguistic utterances. These inputs are the proper domain of the module.
FIGURE 10.5 Proper and actual domains of a module. (a) Proper domain (full line) and actual domain (dotted line) of a venomous snake detector; (b) proper domain (full line) and actual domain (dotted line) of a berry detector.
To recognize inputs belonging to its proper domain, a module uses formal conditions that an input has to
meet in order to be accepted and processed. All inputs meeting the input conditions of a module make up its actual
domain. These input conditions can never be perfectly adequate. Some items belonging to the proper domain of the
module may fail to satisfy them: a snake may look like a piece of wood. Some items not belonging to the proper domain of a module may nevertheless satisfy its input conditions: a piece of wood may look like a snake. If only
because cognition is a probabilistic activity, the actual and the proper domain of a module are unlikely ever to be
strictly coextensive. There will be false negatives, that is, items belonging to the proper domain but not to the actual
domain, and false positives, that is, items belonging to the actual but not to the proper domain. When false negatives
are much more costly than false positives, as in the case of a snake detector (better mistake a piece of wood for a snake than a snake for a piece of wood), it can be expected that the actual domain will be larger than the proper domain and will almost entirely include it (fig. 10.5a). When false positives are much more costly than false negatives, as in the case of a berry detector (better miss a few berries than swallow a poisoned fruit), it can be expected that the actual domain will be much smaller than the proper domain and will almost entirely be included in it (fig. 10.5b).
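This cost asymmetry can be sketched as a minimal signal-detection toy model (our illustration; the feature scores and cost values are made-up numbers, not anything from the chapter). A detector accepts any input whose feature score clears a threshold, and the threshold that minimizes expected cost shifts down when misses are costly (the snake case) and up when false alarms are costly (the berry case):

```python
def best_threshold(cost_miss, cost_false_alarm, scores_target, scores_other):
    # Pick the acceptance threshold that minimizes expected cost:
    # a "miss" is a target item scored below threshold (false negative),
    # a "false alarm" is a non-target item at or above it (false positive).
    candidates = sorted(set(scores_target) | set(scores_other))
    def cost(t):
        misses = sum(1 for s in scores_target if s < t)
        false_alarms = sum(1 for s in scores_other if s >= t)
        return cost_miss * misses + cost_false_alarm * false_alarms
    return min(candidates, key=cost)

# Made-up, overlapping "snake-likeness" scores: real snakes score high
# on average, snake-like sticks lower, but the two ranges overlap.
snakes = [4, 5, 6, 7, 8, 9]
sticks = [1, 2, 3, 4, 5, 6]

# Snake-detector case: a miss is far costlier than a false alarm, so the
# threshold drops and the actual domain swells beyond the proper domain.
low = best_threshold(10, 1, snakes, sticks)

# Berry-detector case: a false alarm (poisonous fruit) is far costlier,
# so the threshold rises and the actual domain shrinks within the proper one.
high = best_threshold(1, 10, snakes, sticks)

print(low, high)  # prints 4 7
```

With misses ten times costlier, the optimal threshold drops to 4 and most snake-like sticks are accepted (an inflated actual domain); with false alarms ten times costlier, it rises to 7 and several genuine targets are rejected (a shrunken actual domain).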
The way the proper and the actual domain of a module overlap may also depend on the history of the environment in
which the module has been operating. Imagine, for instance, a venomous-snake detector selected at a time in the history of the species when most snakes present on some island were venomous. The function of this module is to
help organisms endowed with it avoid these venomous snakes. However, its input conditions are met by all snakes and
not just venomous ones. In such conditions, the actual domain, which contains all perceptible snakes, was, from the
start, significantly larger than its proper domain, which contains only venomous snakes, even if the latter was large
enough to cause the evolution of the detector (fig. 10.6a). In a later period, the environment had changed. There were
still as many snakes activating the module, but most were harmless. In other terms, the proper domain dwindled,
while the actual domain remained as large as before (fig. 10.6b).
FIGURE 10.6 What happens to the proper and actual domain of a venomous snake detector when
these snakes become extinct. (a) Initially, there are plenty of venomous snakes belonging to the
proper domain (full line); (b) later, there are many fewer venomous snakes belonging to the
proper domain (full line); (c) finally, venomous snakes are gone; the proper domain is empty; the
actual domain (dotted line) is reduced.
Today, there are fewer snakes on the island, and they are all harmless. The perception of a snake still activates the
venomous-snake detector, but its proper domain is now empty (fig. 10.6c). Given that there is no benefit to
compensate for the costs of this activation, there is selective pressure for the elimination of the module.
In some cases of particular interest here, the mismatch between the proper and the actual domain of a module results
in part from the exploitation of the module by other organisms. Striking illustrations are provided by animal mimicry.
Many insectivorous birds, for instance, have the ability to detect wasps, which are dangerous to eat. Hover flies, which
are good food for these birds, have evolved black and yellow stripes on their abdomen that mimic the appearance of
wasps and activate the birds' wasp-detecting module. These hover flies have invaded the actual domain of the birds'
wasp detector, where they trigger false positives to their own advantage. Camouflage is another form of exploitation of
the relative rigidity of modular detectors. While mimicry consists in the invasion of the actual domain of a detector by
organisms that don't belong to its proper domain, camouflage consists (on the part of organisms belonging to the proper domain of a detector) in eliminating, or at least attenuating, the features that would make them belong to its actual domain, resulting in false negatives that are advantageous to them.
The manipulation of the cognitive modules of another organism can occur not only in interspecific relationships (as in
animal mimicry and camouflage) but also in intraspecific interaction (for instance, in cases involving sexual selection).
This takes place to a unique extent among humans. Humans seek to influence one another in many ways and hence
need to both attract and direct the attention of others. A reliable way to attract attention is to produce information
that falls within the actual domain of modules, whether or not it also falls within their proper domain. Moreover, given
the rigid patterns of modular processing, the direction in which such information is likely to be processed is relatively
easy to predict.
A great variety of cultural artifacts are aimed at specific modules. For instance, face-recognition modules found in
primates accept as input simple visual patterns that in a natural environment are almost exclusively produced by actual
faces. In the human cultural environment, many artifacts are aimed at the face-recognition module. They include
portraits, caricatures, masks, and made-up faces. The effectiveness of these cultural artifacts is in part to be explained
by the fact that they rely on and exploit a natural disposition. Often they exaggerate crucial features, as in caricature
or in makeup, and are what ethologists call "superstimuli." The effectiveness of these artifacts in turn helps explain
their cultural recurrence. More generally, the actual domain of human mental modules is invaded and inflated by
culturally produced information. When some specific type of information is culturally produced in order to activate a module, it can be described as a "cultural domain" of the module. For instance, portraits, caricatures, masks, and made-
up faces are cultural domains of the actual domain of the face-recognition module (fig. 10.7). Cultural domains are
likely to be outside of the proper domain of the module, as is the case with portraits, caricatures, or masks. They may
also fall within the proper domain, as in the case of made-up faces: these are genuine faces, and therefore it is the function of the face-recognition module to analyze them; however, they are faces that have been artificially transformed so as to be interpreted, for instance, as younger or healthier than they really are.
FIGURE 10.7 Proper domain (full line), actual domain (dotted line), and three cultural domains (shaded) of the face-recognition module.
We illustrate this approach with three types of cultural phenomena: folk biology, folk sociology, and supernaturalism.
4.1 The Case of Folk Biology
All animals interact with a variety of other animals and plants and must organize knowledge about them to guide their
own behavior and interpret the properties and behaviors of other species (e.g., aggression from predators or sweet
taste from ripe fruits). In the human case, categorization of living kinds is complex, comprehensive, and cultural (see
Berlin, 1992). In different cultural traditions, plants and animals play diverse roles (e.g., in activities ranging from
foraging and agriculture to totemism). Nevertheless, folk taxonomies the world over are remarkable in the degree to
which they structurally resemble each other and in the extent to which they match scientific taxonomies.
Sorting plants and animals into categories is largely guided by perceptual discontinuities in the morphology of organisms in local ecologies. However, reasoning about living things is not principally based on inductive processes.
Developmental findings provide evidence for a special-purpose module for folk or naive biology. Despite often
fragmentary and limited experience, young children's inferences and expectations about the nature of living things are
like adults': they are based on the fact that category membership supports very rich and varied inferences (Atran,
1995). These inferences obey a naive form of essentialism, according to which each living kind has an unseen essence.
These implicit species-specific essences are treated as having causal effects on the appearance and behavior of
members of the kind (Gelman & Hirschfeld, 1999). Young children, for example, privilege common folk category
identity over similarity in appearance when inferring whether different living things share biologically relevant
properties. Young children also understand that a living thing's category membership is fixed, both with respect to developmental changes organisms may naturally undergo and with respect to the imperviousness of species-typical properties. Cross-cultural evidence is scant, but what little exists indicates that neither expectation varies culturally (e.g. Atran et al., 2001; Sousa et al., 2002).
The unique importance of animals and plants in ancestral environments and the fact that they afford domain-specific
patterns of classification and inference suggest that a dedicated module might have evolved that governed the
categorization of living kinds and reasoning about them. The similarities of folk taxonomies across cultures and the
regularities in the acquisition and deployment of these taxonomies confirm this hypothesis (see Atran, 1990). The
proper domain of the living-kinds module would have been the local plants and animals with which the individual had
to interact. However, the fact that inputs to this module come not just from direct experience of the living creatures to
be categorized but also, and crucially, from communication with other people allows the actual domain of the module to expand well beyond its proper domain and the limits of local ecology. Using verbal descriptions and pictures as
inputs, the module may build representations of many
species with which the individual is unlikely ever to interact, including extinct species such as dinosaurs, or imaginary species such as dragons.
The module may enrich its categories with information about both familiar and unfamiliar species, information the
relevance of which is often cultural rather than practical. Indeed, folk biology strikingly illustrates how the existence of
evolved modular dispositions to attend to and organize information in a domain-specific way lends itself to a massive
cultural exploitation. For example, in modern societies, wolves are encountered, if at all, only in zoos. However, a
culturally transmitted representation of wolves as dangerous predators of humans (which they are not) is among
children's earliest acquisitions. This representation is a strong attention-catcher, a source of recurrent metaphors, and
it has played an important role in folklore and children's literature (see Zipes, 1993) and, recently, in an American
presidential election campaign. Culturally reinterpreted wolves have become superstimuli. Modular processing of
information about living kinds is similarly the basis for the variety of cultural exploitations lumped together in classical
anthropological theory under the label of "totemism" (Lévi-Strauss, 1963).
4.2 The Case of Folk Sociology
All social animals face the challenge of coordinating behavior with members of their own and other social groups. They
are likely to have, for this, dedicated cognitive abilities involving, in particular, the ability to categorize conspecifics as
members of different social categories or groups. Among primates, it has been argued that the increasingly complex
forms of group living have triggered the evolution of a higher-order cognitive capacity to attribute mental states to
others. Such a naive psychology capacity may play a major role in cooperation, communication, deception, and its
detection, coalition formation, and social competence generally (see Whiten & Byrne, 1997). However, there is no
reason to assume that, in some primates, and particularly in humans, it replaces, rather than complements, forms of
social competence found in social species without naive psychology. Primates (human and nonhuman) simultaneously
belong to many social groupings (based on territory, intragroup status, sex, biological relatedness, and transient or
opportunistic coalitions), membership in any of which provides a basis for predicting and interpreting the behavior of
others (Hirschfeld, 2001). The cognitive demands of such inference are sufficiently specific and complex to suggest the
possibility of a special-purpose modular competence in naive or folk sociology quite distinct from folk psychology, and
of probably much greater ancestry.
Unlike the social lives of nonhuman primates, human social life is thoroughly cultural. All forms of social organization,
from biological-sounding kinship to such artificial groupings as monastic orders and political parties, vary culturally
and rely on culturally transmitted, partly explicit institutional rules. The distinction between the proper and actual
domains of a cognitive module makes it possible to understand this cultural diversity as a function of the evolution of
abilities found in other primates. The proper domain of primate and ancestral naive sociology modules consisted in the
group affiliation of conspecifics. The actual domain of these modules was determined by whatever (in an individual's
bodily appearance,
behavior, or the reaction of others to him or her) provided evidence of an individual's group memberships (e.g., chimpanzee strategies of facial phenotypic matching used in kin recognition; see Parr & de Waal, 1999).
The culturalization of social groupings must initially have consisted in the elaboration of these cues of group
membership. For instance, to natural sexual dimorphism was added a cultural gender dimorphism. Thus existing
mechanisms for social cognition were presented with culturally contrived superstimuli (just as in the case of face recognition superstimulated with makeup). Cognitively, groups are characterized by whatever cues make it possible to
identify their members and by the inferences this identification affords. In an ancestral environment, these cues were natural, whereas in modern human environments they are typically culturally enhanced or even culturally constructed.
Indeed, just as living kinds are categorized not only on the basis of direct experience but also, and crucially, on the
basis of communication, the recognition of social groups draws heavily on verbal labels, clichés, and other expressions of group membership and of attitudes toward other groups. The displacement of natural signs of group membership by more salient cultural signs, together with communication about the consequences of group membership, made possible the construction of novel social groupings, a process that has a self-realizing character (see
Hacking, 1995). If a culture recognizes, say, castes as genuine social categories with distinctive consequences for their
members, then they are genuine social categories (although their actual sociological character may be misrepresented
in the folk sociology). Whatever culturally constructed social groupings happen, at a given time and place, to fill the actual domain of a social competence module also fall within its proper domain.
4.3 The Case of Supernaturalism
Folk biology and folk sociology are cultural systems of representations that, we argued, may each be grounded in a
specific evolved cognitive mechanism. However, not every system of cultural representations matches a distinct
cognitive disposition. It is implausible, for instance, that representations of supernatural beings and events of the type
found in all religions (and also in folklore, art, and literature) are grounded in an ad hoc cognitive mechanism. After all,
supernatural beings, unlike living kinds or social groups, were not part of the environment in which humans evolved. It
has, nevertheless, often been argued that religion responds to a basic human need, be it a need for answers to
fundamental questions, a need for transcendence, a need for comfort and reassurance, or a need for superior
authority. From a point of view informed both by cognitive science and evolutionary biology, the existence of such
needs and the ability of religion to satisfy them are quite questionable. Typically, religious beliefs raise more questions
than they answer, and cause anxiety as much as they comfort (there is, say, a promise of eternal life after death, but
it might be spent in hell). Explaining religion by a religious disposition lacks insight and plausibility (see Boyer, 2003).
The ubiquity and salience of cultural representations of supernatural beings may be accounted for in terms of a
modular cognitive architecture without
assuming that there is a modular disposition to represent such beings or to look for supernatural explanations.
Representations of supernatural beings do not just depart from what is taken to be natural or ordinary. A zebra with red and blue stripes, or a person who, like Borges's character Funes, remembers everything, however out of the ordinary and in practice impossible, is unlikely ever to become a culturally recognized supernatural being. Supernatural
beings are not just impossible in nature. They blatantly violate the kind of basic expectations that are delivered by
domain-specific cognitive mechanisms. In direct clash with naive physics, some are able to be in several places at the
same time or to pass through solid objects. In direct clash with naive biology, some belong to several species at the
same time or can change from one species into another. In direct clash with naive psychology, some can literally see
all past and future events. Despite these striking departures from intuitive knowledge, the appearance and behavior of supernatural beings are otherwise what intuition would expect of natural beings. That is, they have enough of the
characteristic features of plants, animals, people, topographic entities, or celestial bodies to fall squarely in the actual
domain of cognitive modules. Supernatural animals have, apart from their supernatural features, a regular biology.
Supernatural agents have a belief-desire psychology. As argued by Boyer, it is this combination of a few striking violations with conformity to ordinary expectations in all other respects that makes supernatural beings attention-arresting and memorable, and rich in inferential potential (see Boyer, 2001).
Representations of supernatural beings, we suggest, spread and stabilize in different cultures because they act for one
or several cognitive modules as superstimuli. Unlike other superstimuli, which have some features exaggerated while
essential features are maintained, these cultural superstimuli typically combine exaggerated and paradoxical features
with ordinary and essential ones. One way they may be paradoxical is in falling simultaneously in the actual domain of
two different modules. For instance, a sacred tree may be attributed agency: its appearance activates a naive botany
module, whereas what is said of it and the way it is treated activates a mind-reading module. Representations
belonging to a complex system such as a religion (which involves not only representations but also practices, artifacts,
and institutions with a much more complex epidemiology) need not be all anchored in one and the same cognitive
module. On the contrary, multiple anchoring in several cognitive mechanisms may contribute to the cultural system's
stability (Atran, 2002).
5 Conclusion
The propagation, stabilization, and evolution of cultural representations have a variety of causes. They are helped or
hindered by demographic and other ecological conditions, in particular by human-made features of the environment,
and by educational, political, and religious institutions. We agree with standard social science that culture is not human
psychology writ large and that it would make little sense to seek a psychological reductionist explanation of culture. We
believe, however, that psychological factors play an essential role in culture. Among these psychological factors, the
modular organization of human cognitive abilities favors the recurrence, cross-cultural variability, and local stability of a
wide range of cultural representations.
11 Shaping Social Environments with Simple Recognition Heuristics
Peter M. Todd
Annerieke Heuvelink
Imagine walking through the dark streets of Berlin on a cold night, looking for a place to get a good hot chocolate.
You've been to this neighborhood only a couple of times before, so while you have experienced a few of the bars, you
don't know much about them. You can't see in through the steamy windows, so you just have to make a choice and go
in, hoping it won't be one of those places where the music screeches to a halt and all the locals look up from their hot
chocolates to glare at you as you step inside. How can you decide which to try? You could risk a choice at random, or
choose one of the places you already recognize; or you could call up a friend or two and ask for recommendations. But
if your phone is broken and you can't communicate with anyone who has more knowledge, you could also think back
to your previous experience and recall how many other people were in the bars when you were there, or even how
many acquaintances had been there at that time. What would happen to the popularity of the bars if you and
everyone else used one of these methods to choose where to drink chocolate? Would all bars be equally visited, or
would some become very popular while others foundered? Will your decisions create hotspots and dead zones, shaping
the social environment of Berlin bars?
There is good reason to believe that your choices will indeed shape the fate of the chocolate-purveying scene, rather
than just maintaining the status quo. The combined decisions of a population of agents can powerfully shape their
environment, often leading some things (people, novels, cities, and the like) to become much more well known, and more widely preferred or chosen, than others. This can be seen in the J-shaped function relating the popularity or success of items in some domain to their rank in that domain: for instance, best-selling authors sell vastly more books than the great majority of little-known authors, and the most popular consumer brands, from soft drinks to soap, sell much more than their lesser competitors (Hertwig et al., 1999).
But how does such agreement, in the form of a great number of people making the same choices, come about? What
psychological mechanisms underlie this
cultural structure? Much research in anthropology and human behavioral ecology has gone into showing that some
simple psychological mechanisms can evolve and help people to find and converge on beneficial cultural innovations in
a variety of settings (see Henrich & McElreath, 2003, for an overview, and Boyd & Richerson, 1985, for details). In
particular, prestige-based mechanisms direct people to copy the behaviors and choices of successful individuals, while conformity-based mechanisms direct them to determine the most common behaviors and choices in a population and follow those. Both types of biased transmission mechanisms can, for example, lead all the hunters in a group to
adopt bows rather than blow-darts, or the farmers in a region to plant potatoes rather than corn.
However, even simpler cognitive mechanisms, which neither seek to identify successful models nor keep track of
frequencies of behaviors in a population, can enable a population of interacting individuals to coalesce strongly on a
few cultural options, as seen in modern environments, despite the vast number of choices available. As we will
demonstrate in this chapter, just making choices based on the options one recognizes can lead to population
convergence of the same sort seen in the anthropological models with two options of differing quality, even when there are many options, and when they all have the same underlying quality (as, for instance, if the many bars in Berlin all got their hot chocolate from the same central source). This convergence relies on the recognition knowledge
of individuals arising through their interactions with others, through either communication or indirect observation. Thus
we argue that simple recognition-based decision mechanisms operating in a social setting may achieve some of the
same culture-shaping effects as the biased transmission mechanisms explored previously.
1 Making Decisions Using Recognition
While we humans may pride ourselves on our ability to make intelligent choices in a challenging world, we are limited
in the amount of information we can process, the amount of time we can process it in, and the amount of computation
our minds are able to carry out. For most of our decisions, we rely on simple cognitive heuristics, shortcuts that enable
us to make good-enough choices quickly and cheaply. The surprising finding of a growing body of psychological
research is that such fast and frugal heuristics can exploit the structure of information in the task environment to
make decisions that are as good as, and in some cases better than, what more complex and information-hungry
mechanisms would produce (Payne et al., 1993; Gigerenzer et al., 1999). These simple components of our mind's
"adaptive toolbox" (Gigerenzer, 2001), some of which are evolved and some of which are learned, are thus, by virtue
of their fit to the environment, often the best tool available for a particular inferential job.
Perhaps the simplest decision heuristic is the recognition heuristic (Goldstein & Gigerenzer, 1999, 2002), which actually
makes use of an individual's lack of knowledge. It is based on the deep-rooted cognitive capacity to store and
recognize (rather than recall) particular names, faces, locations, and objects. The recognition heuristic can be used by
agents who do not know anything about a set of options they must choose between, other than whether or not they
have encountered each
particular option before. The heuristic then simply says to select one of those options that are recognized (in a binary
fashion, yes or no) over those that are not. If there is more than one available option that is recognized, then the
recognition heuristic chooses randomly among them; if none of the options are recognized, then a completely random
choice is made. Thus, the recognition heuristic can only be used when the decision-maker knows about some of the
objects in a particular set but is ignorant of others.
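As a concrete illustration, the decision rule just described can be sketched in a few lines of Python. This is a sketch of the verbal description only, not the authors' implementation; the function name and data structures are ours.

```python
import random

def recognition_heuristic(options, recognized):
    """Choose among options using only binary recognition knowledge.

    `options` is the list of candidates; `recognized` is the set of
    items the agent has encountered before (illustrative names).
    """
    known = [o for o in options if o in recognized]
    if known:
        # More than one recognized option: choose randomly among them.
        return random.choice(known)
    # Nothing recognized: fall back to a completely random choice.
    return random.choice(options)
```

For example, an agent who recognizes only one bar among those on offer will always pick that bar; an agent who recognizes none picks at random.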
People and other animals use the recognition heuristic in a variety of settings. For instance, Norway rats use
recognition knowledge gained by smelling the breath of their nestmates to guide their food choice on subsequent
foraging trips, preferring to sample recognized foods (Galef, 1996). In laboratory settings, people use the recognition
heuristic to decide which of two cities is larger, or which of two rivers is longer, or which of two sports teams wins
more often (Goldstein & Gigerenzer, 2002). Furthermore, these recognition-based decisions are highly accurate when
the heuristic is used in a domain where recognition and ignorance are appropriately structured, that is, where objects
higher on the criterion (e.g., length of river) are more often recognized. This is likely to be the case whenever objects
that are extreme on some criterion dimension are more often talked about among individuals or mentioned in the
media. Goldstein and Gigerenzer (1999, 2002) showed how this holds for the city-size dimension: large cities are more
often mentioned in newspaper headlines than small ones, which in turn can drive the greater recognition for larger
cities that makes the recognition heuristic ecologically rational to use in this task environment.
Recognition knowledge is often a highly valid cue to the structure of the environment, giving the recognition heuristic
high rates of inferential accuracy. And people are sensitive to this power of recognition: in small-group settings, when
the goal is to agree upon a particular decision, such as which of two cities is larger, those individuals who can use the
recognition heuristic (that is, those who recognize one city but not the other) are often given more influence than others who know both cities (Reimer & Katsikopoulos, 2004).
When all of the options to be decided among are recognized by an individual, then the recognition heuristic as just
presented cannot be used to choose between them. However, there can still be informative differences in the
recognition knowledge for each option: some things may have been encountered more recently, or more often, than
others, and so may have a higher overall activation in memory. Something like this memory activation probably
underlies the recognition judgment in the first place: as long as it is above a particular threshold, the object is judged
recognized, and if the activation is below the threshold, the object is unrecognized (Schooler & Hertwig, 2005). The
recognition heuristic throws away any differences in activation values that are above the recognized threshold, but
another heuristic, Schooler and Hertwig's fluency heuristic, capitalizes on those differences to choose the highest
activated option. This strategy works well for selecting options that have been more frequently (and recently)
encountered in the environment, and thus it is ecologically rational when objects that are higher on the choice criterion
are also more often experienced. Schooler and Hertwig have furthermore shown that both the recognition and fluency
heuristics benefit from a particular
amount of forgetting, so that recognition memory does not become clogged with every object ever encountered, no
matter how far back in time. (See also Todd & Kirby, 2001, for the importance of forgetting in agent-based recognition
models of the sort investigated here.)
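A minimal Python sketch of the fluency heuristic as just described may help; the zero threshold echoes the rudimentary memory model the chapter uses, but the code, names, and signature are our illustration, not the authors'.

```python
import random

RECOGNITION_THRESHOLD = 0.0  # activations above this count as "recognized"

def fluency_heuristic(options, activation):
    """Choose the option with the strongest memory activation.

    `activation` maps previously encountered items to real-valued
    memory traces; items absent from it are treated as unrecognized.
    """
    best = max(activation.get(o, 0.0) for o in options)
    if best <= RECOGNITION_THRESHOLD:
        # No option is recognized at all: choose at random.
        return random.choice(options)
    # Break (unlikely) ties among equally activated options at random.
    top = [o for o in options if activation.get(o, 0.0) == best]
    return random.choice(top)
```

When all options are recognized, the heuristic still discriminates by trace strength, which is exactly what the binary recognition heuristic cannot do.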
Where do we get the knowledge stored in recognition memory? It can come from individual experience, encountering
different objects or behavioral options as we move about in the world. In this case, the more commonly encountered
things will have stronger traces in memory because they receive more frequent and recent updates, and these strength
differences are what the fluency heuristic exploits. But our recognition knowledge can also come from others. We can
directly hear about things that our conspecifics recognize, as when a friend tells us about a great new bar she's just
been to, and again, the more we hear about some thing, the stronger is its activation trace in memory. The strength of
recognition memory could in addition be influenced indirectly through social interaction, without communication of
knowledge between individuals. Particular options or behaviors may be activated more highly if we see that others
have made the same choice. Much work has been done on social conformity to explore how information about the
decisions of others can sway one's own decisions. Asch (1956) showed that people would change their judgments of
the length of a line segment shown to them when others around them made obviously wrong judgments. This social
conformity increased as the number of others increased up to five; bigger groups did not increase the conformity effect
much further. Another study done by Milgram, Bickman, and Berkowitz (1969) also showed this effect of group size.
Milgram had confederates look up into the sky in the streets of New York City; the greater the number of
confederates, the more people passing by would also stop and look up, ranging from 4 percent of passers-by with 1
confederate up to 84 percent with 15 confederates.
Latané (1981) summarized such results in his law of social impact to explain the effect of groups of people on single
individuals. This law says that the total impact of a group on a single target person will increase with the strength of
the group members, their number, and their proximity to the focal individual in time and space. Strength can be
authority, but it can also be familiarity: you are more likely to conform with people you are close to socially than with strangers. Such factors underlie some of the model-based cultural learning mechanisms discussed by Henrich and
McElreath (2003), such as focusing on social models similar to oneself. Here we will concentrate on the effect of the
number of other individuals who have made the same choice as oneself. (Similar effects of the influence of other
conspecifics on an individual's behavior can be seen in other species; see Noble & Todd, 2002, for connections.)
The spreading of ideas (whether in the form of knowledge, memes, fads, products, etc.) through societies has also
been studied from other perspectives over the years, ranging from conditional decision models (in which the decisions
of individuals are based on the decisions made by others; see Granovetter, 1978) in sociology to the use of statistical
mechanics for modeling socioeconomic interactions (Durlauf, 1997). Economists have developed models to explore
what determines the eventual share of a product in a certain market. Arthur (1988) proposed that when self-reinforcing spreading mechanisms are present in an economic
system, common features will arise. These features include the existence of multiple equilibria, in terms of what ideas
or products will ultimately be adopted (different asymptotic market-share solutions are possible, so that the outcome
is not uniquely predictable); possible inefficiency (if one idea is inherently better than others but has bad luck in
gaining early adherents, the eventual outcome may not be of maximum possible benefit); lock-in (once an equilibrium
is reached, it is difficult to exit from); and path-dependence (the early history of market shares, in part the consequence of small events and chance circumstances, can determine which idea prevails). More recently, the
increased interest in networks and their structure has led to new research on the spreading of ideas or products from a
more sociological perspective, taking into account the structure of the social networks that individuals find themselves
in. This work addresses questions such as in which network structures ideas spread fastest, or which nodes should be
targeted in order to get an idea adopted (Grönlund & Holme, 2005).
In the models we present here, some of these features can arise, but others are currently not present because of our
simplifying assumptions; for instance, because we assume equal fitness of the spreading items, the phenomenon of
inefficiency cannot arise. In our minimalist approach, we also do not incorporate preexisting social networks (though
networks can be observed as emergent aspects of the agents' interactions in our models). Furthermore, we do not
include more complex features such as varying expectations or personalities of the modeled individuals, as we aim to
show that even much simpler processes can give rise to strong spreading patterns. (For a more complex model that
incorporates such aspects, see Lane, 1997.)
2 Methods: Agent-Based Models for Simulating Social Decisions
To investigate how decision-making agents can shape their environment in a coordinated fashion without direct
communication, we built a family of agent-based simulation models in NetLogo. In these models, agents inhabit a world
full of locations they can choose to visit, and each agent maintains a memory of locations it has seen, as well as, in
some cases, of other agents it has seen. As agents build up knowledge about their world and use it to decide where to
go, we watch for whether their decisions combine to create new structure in their environment: hotspots and dead zones in how agents are spread across locations. Note that in these simulations, we assign all agents to use a particular
decision mechanism and see how that affects the structure of the environment they help create, rather than looking at
the evolution and spread of a particular decision or learning mechanism through the population, as has been done by
other modelers (e.g., Boyd & Richerson, 1985; Henrich & McElreath, 2003).
We look for the emergence of environment structure in these simulations in two main ways, as follows. The distribution
of how many patches or locations are chosen by different numbers of agents can vary from (1) a Poisson distribution in
which most patches are chosen by only one or two agents (the unstructured environment in which our models start, shown in figure 11.1) to (2) a situation where a few patches are currently chosen by many agents (e.g., 9 or 10) and are known (recognized) by nearly all of the agents: a clumpy world where knowledge and choices are focused on a small subset of the possibilities (shown for example in fig. 11.4).

FIGURE 11.1 Histogram of the number of patches (y-axis) that are known by a certain number of agents (x-axis), showing the near-Poisson distribution of agents randomly scattered in the unstructured environment at time step 1. (Most patches contain a single agent.)

We also track the correlation between how often the patches are visited, or chosen, by agents and how well they are known; in other words, the correlation between choice and
recognition, or behavior and knowledge. If there is indeed coevolution of the knowledge about the environment in
terms of who knows what, and the structure of the environment in terms of who decides to go where, we expect this
correlation to rise.
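The choice-recognition correlation can be measured as an ordinary Pearson correlation over patches. A dependency-free sketch (the function name and the idea of passing two parallel lists are our assumptions, not details from the chapter):

```python
def choice_recognition_correlation(visits, known_by):
    """Pearson correlation between how often each patch is currently
    chosen (`visits`) and how many agents recognize it (`known_by`).
    Both arguments are parallel lists indexed by patch.
    """
    n = len(visits)
    mv, mk = sum(visits) / n, sum(known_by) / n
    cov = sum((v - mv) * (k - mk) for v, k in zip(visits, known_by))
    var_v = sum((v - mv) ** 2 for v in visits)
    var_k = sum((k - mk) ** 2 for k in known_by)
    return cov / (var_v * var_k) ** 0.5
```

A value near 1.0 would indicate the coevolution of knowledge and behavior described above: heavily visited patches are also widely recognized.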
For the sake of speed, we started off exploring the different models with 121 patches forming the 11 × 11 environment and with 200 agents forming the population. Incorporated in all our models are a memory for patches and
a memory for other agents for every agent. The program starts by randomly scattering the agents into the
environment. At the beginning, all the patches and agents are homogeneous. This is important, because we want to
explore how one patch can be known more and consequently visited more than another without there being any underlying difference between them (e.g., the difference in bar attendance should not be explainable by some secret ingredient that a particular bartender puts in her bar's hot chocolate).

FIGURE 11.2 Another view of the distribution of agents (now on y-axis) among the patches (on x-axis) at time step 1, with patches rank ordered from left to right by number of agents present. Note the absence of a strong J-shaped distribution, indicating an unstructured (clumpless) social environment.
As the simulation runs, at every time step, each agent is presented with a choice of four patches it can go to. Each
agent makes a decision among these somehow, using a rule or heuristic applied to its current knowledge. As
mentioned at the beginning of this chapter, humans can make decisions simply by looking at how well they recognize
the options. We use the same recognition heuristic to let our agents decide which of the presented options they want
to go to. They recognize a patch if it is in their memory. They can pay attention to that information in two different
ways. The first is strictly binary: Do I recognize this option, yes or no? This binary recognition knowledge gets used by
the recognition heuristic described earlier: agents always go to a patch (i.e., choose an option) they recognize. If they
recognize more than one patch, a decision is made at random between the recognized options. When none of the
options is recognized, the agent selects one at random. The second way to use recognition knowledge is as a
continuous variable: How well do I recognize this option? This real-valued knowledge is used by the fluency heuristic:
agents always go to the patch they recognize best. If no options are recognized or multiple options are recognized with
the same value (which is unlikely), the decision is made at random between the tying options.
After the agents decide where they want to go, every agent moves to its selected patch and increases the activation (if
any) of that patch in its memory with a value
of 1.0. (Note that this and most of the other parameter values are arbitrary, with the rough differences between them
being more important than their precise values. The goal is to see whether any reasonable settings of the parameters
will lead to the emergence of social environment structure.) As indicated earlier, forgetting is also an important
component of this memory model; here, the memory trace of each patch simply decays (falls) by a fixed value with
every time step. This memory trace decay rate defaults to 0.1 in the models here. If the memory trace for a certain
patch falls below zero, that patch is no longer remembered. (Thus the recognition threshold for this rudimentary
memory model is 0.0; any positive memory trace results in the object being judged as recognized.)
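The memory model just described can be sketched as a single update function per time step. The parameter defaults follow the values quoted in the text (boost of 1.0, decay of 0.1, recognition threshold of 0.0), but the function itself is our illustration, not the authors' code.

```python
def update_patch_memory(memory, chosen_patch, boost=1.0, decay=0.1):
    """One time step of the rudimentary memory model: strengthen the
    trace of the patch just visited, then decay every trace; traces
    that fall to zero or below are forgotten entirely.
    """
    memory[chosen_patch] = memory.get(chosen_patch, 0.0) + boost
    for patch in list(memory):
        # Round to keep repeated 0.1 decrements free of float drift.
        memory[patch] = round(memory[patch] - decay, 10)
        if memory[patch] <= 0.0:
            del memory[patch]  # below the recognition threshold of 0.0
    return memory
```

With these defaults, a patch visited once and never revisited is forgotten after ten time steps, which is what limits each agent's recognition set to about ten patches.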
A second important aspect of our models is the attention the agents pay to other agents around them. As indicated
earlier, people are readily influenced by others, the more so the more familiar they are with those others. In our family
of models, we explore different ways agents pay attention to other agents and various amounts of attention that
agents pay to other agents. The attention weight given to the presence of other agents defaults to 1.0. All the agents
have a memory for other agents they meet, updated on each encounter with a default value of 1.0, just as in the
memory for locations. This memory trace also decays every time step with a default value of 0.1.
We begin with a default model in which agents do not pay any attention to other agents, only to the patches they visit.
Next, we look briefly at the effect of allowing individuals to communicate with each other about the patches they
recognize. Finally, we consider two models with indirect social influence in which agents pay attention to the other
agents they encounter, in the following ways. First, individuals can pay attention to how many other agents are on the
current patch. In this case, this patch is stored in their memory with the default value plus a certain value for every
other agent on that patch. Think about a person walking into a bar and finding a lot of people inside. That person will
deduce that this is a quite popular bar and remember it as a good place to go. Second, individuals can notice the
agents they recognize in the current patch and can use this agent knowledge to modify how strongly they store their
experience of the current location. In particular, they remember the patch they are on with the default value plus a
certain increment for every other agent on that patch that they recognize. Imagine, again, the person walking into a
bar and seeing a few others she recognizes from other popular bars she goes to; that's an indication that the current bar is also a happening spot to frequent.
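The two indirect-influence variants differ only in which of the co-present agents are counted when the current patch is stored in memory. A combined sketch (the signature, names, and `use_recognition` switch are ours; the default base value and attention weight of 1.0 follow the text):

```python
def stored_value(others_here, recognized, base=1.0, weight=1.0,
                 use_recognition=False):
    """Strength with which the current patch is written to memory.

    Crowd variant: add `weight` for every other agent present.
    Acquaintance variant (`use_recognition=True`): add `weight` only
    for the other agents the focal agent recognizes.
    """
    if use_recognition:
        count = sum(1 for a in others_here if a in recognized)
    else:
        count = len(others_here)
    return base + weight * count
```

An empty bar is thus stored with the default value of 1.0, while a bar with three other patrons, one of them familiar, is stored with 4.0 under the crowd variant but only 2.0 under the acquaintance variant.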
3 Results: When Does Environment Structure Emerge?
To look for the emergence of environment structure with various direct and indirect forms of social influence and
sharing of knowledge between agents, we ran a number of models according to the variations just described, with
agents using either the recognition heuristic with binary memory values (models marked "bin") or the fluency heuristic with continuous memory values (models marked "con"). All the results shown here are the average results for 10 runs
after 20,000 time steps (to allow each model to reach a more or less steady state). For detailed presentation and
discussion of these results, see Heuvelink (2004).
3.1 Model 1: Agents on Their Own
The two models in which the agents pay no attention to the other agents in their environment do not produce
emergent environment structure; instead, the distribution of agents over locations (chosen options) remains much the
same as at the beginning of the simulation, still creating a Poisson distribution. The correlation between the number of
agents at a location and the number of agents that recognize that location is about 0.3. This is in part because agents
often do not have the opportunity to use their choice mechanism. On average, agents know about 10 out of 121
patches, because they store each patch they visit with a value of 1.0 and this value decays by 0.1 every time step.
Thus, in (1 − 9.9/121)^4 ≈ 71 percent of their choices, agents recognize none of the four options and have to choose between them randomly. Furthermore, all the patches end up being known by similar numbers of agents on average.
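The 71 percent figure follows directly from the chance that none of the four offered patches is among the roughly 9.9 an agent remembers (treating the four offers as independent draws, as the calculation in the text does):

```python
# Probability that one offered patch is unrecognized: 1 - 9.9/121.
# Four patches are offered, so raise to the fourth power.
p_all_unknown = (1 - 9.9 / 121) ** 4
print(round(p_all_unknown, 2))  # 0.71
```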
We can change this by giving the agents a longer memory (lower decay) and letting them remember more locations.
But even when they recognize 50 locations, and so can use the recognition heuristic about 90 percent of the time, still
no environment structure emerges. Why not? After all, the agents must be more likely to end up at certain patches (the ones they know) compared to other patches. The problem here is that all the agents have their own set of options
that they recognize. This set is personal, local to each individual, and there is no mechanism here that allows for this
knowledge to spread through the population and become global. This is the situation in which everybody knows some
bars and goes to one of those again and again without paying attention to whether other people, strangers or
acquaintances, also go to those bars. To see structure emerge in the environment, so that some patches are more
visited than others, knowledge about options must spread through the population and become correlated among
individuals. For the knowledge to spread, we need some form of information sharing between agents.
3.2 Model 2: Agents Listening to Others
In the previous model, agents acting independently on the basis of their own individual experience, choosing to go to
locations that they personally recognize by having visited before, did not suffice to create emergent environment
structure. It seems more likely, and more realistic, that transmission of information between agents will enhance any
clustering of choices in the space of options (here, locations): the social computation enabled by a communicating
population of simple decision-making agents should lead to greater environmental impact (as has been found in
simulations where the interactions of many generations of simple language learners enable syntax to emerge; see
Kirby, 2001). This information transmission can be accomplished either directly, through communication in which
agents tell each other about locations that they recognize, or indirectly, through agents observing the actions of
others. We first consider the former situation before turning to models of indirect communication in later sections. To
add direct communications to our models, we must specify who can talk to whom, how often, and about what.
In earlier models (Todd & Kirby, 2001) we found that when individuals could hear from one other agent at the same
location (i.e., an agent that has currently made the same option choice) about one location that the agent recognized,
this could foster the emergence of clustered or J-shaped distributions of agents over choices. Such environment
structure did not emerge particularly readily, though; if individuals told others about any randomly chosen location they
currently recognized, there was no effect. This was because many of the locations that a given individual recognized
were known because they had been heard about from others, who may also have heard about them from others,
which means it could have been a long time since any of the agents in this communication chain had actually
personally been to (chosen) that location. This time delay meant that the agents' recognition knowledge could be out
of step with the actual choices currently being made by others in the population (also indicated by a low
choice/recognition knowledge correlation), keeping choice clusters from appearing. When we restricted individuals to
talking only about locations they had actually been to recently, and thus recognized from personal choice, the temporal
lag in communication was reduced, and agents did indeed begin to cluster more strongly on particular locations.
In our new models, we relax and simplify the communication somewhat. Instead of only listening to other agents that
have made the same current choice (are on the same location), now individuals can hear from all the agents in the
population. And instead of announcing a location that they recognize (at all, or only recently), agents now mention just
the location they have currently chosen. Thus, on each time step, each individual hears from one randomly selected
other agent about the location that agent is currently on. When one location happens to have more visitors than
average, it will be heard about more than average, and stored in the recognition memory of a number of individuals,
influencing their later choices. What happens when this form of direct communication is used along with the recognition
and fluency heuristics?
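The two choice rules, the decaying recognition memory, and the hearing step just described can be sketched as follows. This is a minimal illustration with hypothetical function names and the decay value of 0.1 used in the text; the original simulations (Heuvelink, 2004) may differ in detail:

```python
import random

DECAY = 0.1  # memory-trace decay per time step (value used in the text)

def decay_memory(memory):
    """Decay all traces; forget any location whose trace drops to zero or below."""
    for loc in list(memory):
        memory[loc] -= DECAY
        if memory[loc] <= 0:
            del memory[loc]

def hear_about(memory, location):
    """Direct communication: store a location heard about from another agent."""
    memory[location] = memory.get(location, 0.0) + 1.0

def recognition_choice(memory, options):
    """Binary recognition heuristic: choose randomly among recognized options,
    or among all options if none is recognized."""
    recognized = [o for o in options if o in memory]
    return random.choice(recognized or options)

def fluency_choice(memory, options):
    """Fluency heuristic (continuous recognition): choose the option with the
    strongest memory trace, falling back to random choice if none is recognized."""
    recognized = [o for o in options if o in memory]
    if not recognized:
        return random.choice(options)
    return max(recognized, key=lambda o: memory[o])
```

The contrast between the two heuristics is visible immediately: if an agent has heard about one bar twice and another once, binary recognition treats them as interchangeable, while fluency sends the agent to the more-heard-about bar.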
When we look at the distribution of agents across locations in the last 100 time steps of a 20,000-time-step run, we
see (fig. 11.3) that the recognition heuristic does not create population clusters any more than the random distribution
of agents did (as shown in fig. 11.2). However, when agents can use continuous recognition memory to distinguish
between locations they have been to and possibly heard about more often or more recently, a strong J-shaped
distribution does emerge (fig. 11.4). This means that directly learning about a location where there are currently other
agents (or in other words, hearing about an option that others have currently chosen for themselves), particularly
when more popular options are more likely to be learned about, can allow agents to coordinate their knowledge and
their choices sufficiently to produce a degree of conformity and thereby shape their environment appreciably. This is
certainly what we would generally expect from observing cultural conformity in the real world, where people do talk
about their choices with each other all the time; the interesting aspect of this model is that this structure can appear
through the use of so simple a choice mechanism, one that relies only on recognition knowledge.
FIGURE 11.3 The distribution of agents among the patches averaged over the last 100 time steps of a
20,000-time-step run, with patches rank ordered from left to right by number of agents present. Agents
used the binary recognition heuristic, and no clumpy patch structure emerged.
3.3 Model 3: Agents Counting Others
Direct sharing of information between agents allows them to coordinate in such a fashion that general agreement and
conformity of choices develops at the population level. But is this direct communication a necessary component for
such environment structure to emerge? What happens when agents can only indirectly influence each other's choices?
When agents share their knowledge and behavior by simply paying attention to how many other agents have made
the same choice and strengthening their recognition memory according to this count, this proves to be enough to allow
coordination once again. As shown in figure 11.5, both the recognition and fluency heuristics lead to some locations
becoming known by all the agents in the population, which in turn creates a J-shaped distribution of agent choices and
a high choice-recognition correlation (.44 and .75, respectively).
The structure found in the environment stems from an inequality in how well the patches are known by the agents. To
understand how this inequality arises, it is important to remember how the social information-sharing rule of this
model works. When an agent goes to a certain patch, it stores this patch in its recognition memory with a value of 1.0,
plus an extra value of 1.0 for every other agent that is on that patch at the same time. When many agents are on a
certain patch, these agents all store the patch with a high value in their memory. Those agents now recognize this
patch for quite some time. Since agents, if they can, go to patches they recognize, it is likely that those agents end
up returning to that patch again in the future. Agents arriving later on that patch for the first time will probably meet
more than an average number of agents there, and will also remember the patch well. Thus, as soon as a patch has many visitors, which early on may happen accidentally, a self-reinforcing mechanism is kicked off that eventually can lead to the situation in which that patch is known by all the agents.
FIGURE 11.4 The distribution of agents among the patches averaged over the last 100 time steps of a 20,000-time-step run, with patches rank ordered from left to right by number of agents present. Agents used continuous recognition in the fluency heuristic, allowing a clumpy J-shaped distribution to emerge.
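Model 3's counting rule can be sketched in a few lines. The function name is hypothetical; the strengths (1.0 for the visit, 1.0 per other agent present) are those stated in the text:

```python
def visit_patch(memory, patch, n_others, weight=1.0):
    """Model 3's storage rule (sketch): store a visited patch with strength 1.0,
    plus `weight` for every other agent currently on that patch.
    With a decay of 0.1 per step, a patch visited alongside five others
    (strength 6.0) stays in memory six times as long as a patch visited alone."""
    memory[patch] = memory.get(patch, 0.0) + 1.0 + weight * n_others
```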
For the memory-decay settings used for the simulations presented in figure 11.5, there are more locations known by
the entire population when continuous-valued recognition is used (with the fluency heuristic) than with binary-valued
recognition. This can arise because the fluency heuristic allows more discriminating choices between options than does
the recognition heuristic: the former allows the most recognized of all of the recognized options in a choice set to be
chosen, whereas the latter leads to random selection from those recognized options. Thus the fluency heuristic enables
agents to return to more-recognized locations preferentially and thus to build up even more recognition of those
locations. This, in turn, means more chance to return to that location again in the future, hence more agents at that
location at any point in time, which also leads any other agent that ends up visiting that location to note the increased
number of others and hence to store that location with greater memory strength so that it, too, is likely to return
there. In this way, more locations will become known by all agents more quickly than if they used binary recognition.
However, when the presence of other agents making the same choice is given greater influence (i.e., the count of
other agents increases the strength of the recognition memory activation even more), this pattern can reverse, with
fewer locations known by everyone in the continuous case than in the binary case. The reason for this reversal is that
greater influence can cause some locations to become widely known even more quickly, and once this happens, the
agents are likely to choose to go to that small set of locations exclusively (if given the choice), so that those initially
popular locations alone become more and more visited and known. In other words, the stronger feedback process created by greater social influence can result in rapid convergence onto a smaller set of options, effectively shutting out the competition.
FIGURE 11.5 Histograms of the number of patches (y-axis) that are known by a certain number of agents (x-axis) after 20,000 time steps, for runs with agents using the recognition heuristic (top) and the fluency heuristic (bottom), with a weight-given-to-other-agents of 1.0 and a memory-trace-decay-rate of 0.1.
3.4 Model 4: Agents Recognizing Others
In the fourth model, the simulated individuals are a bit more discriminating. They no longer pay attention to every
stranger but instead only pay attention to their acquaintances. This means that instead of counting the number of
other agents on their current patch, agents just count the number of agents they recognize that are on the patch.
Under these circumstances, different patterns arise, depending on how strongly agents attend to the presence of their
friends.
When individuals pay only a little attention to the other agents they recognize, and when they forget about patches and
agents rather quickly (after 10 time steps), no environment structure emerges. In this model, every agent an
individual meets is stored or updated in the individual's memory, with the standard value of 1.0, and this memory
trace decays every time step (the same as for the location memory). Looking at that memory for agents, it can be
seen that the individuals in this model only know (recognize) about 16 other agents on average. Since at the beginning
of each run, no patches are more known than any others and all agents are equally likely to be on any of the 121
patches, the chance that an individual meets a friend (recognized agent) again before forgetting about that friend is
consequently very small. So while agents in this model do pay attention to other agents they know, since they almost
never meet again, there is effectively no influence of this agent recognition, and so no structure will emerge.
What if agents were more impressed by seeing someone they recognize? When we increase the weight of attention
paid to every other agent to 4 (rather than 1), friends will now be remembered four times as long, and any patch
where two friends meet is stored with an extra value of 4 in both their memories. Now, again, we see structure
emerge, with fewer universally known locations when continuous recognition is used, again because of the feedback
processes operating. In that fluency-use case, because there are fewer patches that are very well known and thus well
visited, agents are likelier to meet, simply because there are fewer meeting places and so the population is less
spread out. On the other hand, agents using binary recognition might select and end up at the less well known patch
from the choice set because they do not take into account how well they recognize a patch, which could also indicate how well a
patch is known. In the latter situation, when agents end up at the less known patch, they are also less likely to meet
many other agents there, which means they will store that location less strongly in memory and return to it with lower
probability, making that patch less likely to become widely known.
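Model 4's friend-based rule can be sketched as follows. The function name is hypothetical, the weight of 4 is the value discussed above, and whether meeting an agent resets or adds to its memory trace is an assumption here (the text says agents met are "stored or updated" with the standard value):

```python
def meet_and_store(patch_memory, agent_memory, patch, others, weight=4.0):
    """Model 4 sketch: only *recognized* agents ('friends') boost patch memory.
    Every agent met is (re)stored with strength `weight`, so with a higher
    weight, friends stay remembered longer; the current patch is stored with
    1.0 plus `weight` for each friend present."""
    n_friends = sum(1 for a in others if a in agent_memory)  # count before updating
    for a in others:
        agent_memory[a] = weight  # assumption: re-storing resets the trace
    patch_memory[patch] = patch_memory.get(patch, 0.0) + 1.0 + weight * n_friends
```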
What can be concluded from these models is that the emergence of environment structure, in the form of J-shaped
distributions of agents across chosen options and universal recognition of a few options, is enhanced by a slower
memory decay rate for recognized patches, a slower memory decay rate for recognized agents, and a greater weight
or influence given to the presence of other agents on the same patch when storing patch recognition. Furthermore,
how that influence of other agents is distributed makes a difference: when much attention is paid to a small
group of agents (e.g., only those recognized), structures emerge less easily than when less influence is spread out
across more agents, even when the total amount of influence is made equal. Furthermore, inequality in how well
patches are known is no guarantee that structure will emerge in the environment in terms of choices made (i.e., the distribution of agents across options). In the models using binary recognition, where agents do not preferentially go to the patches they know best, there must be a large difference in how well patches are known for choice structure to show up in the
environment. However, when there is inequality in how well patches are known when continuous fluency is used, these
knowledge differences will almost immediately influence the choice structure, because of the stronger feedback loop
enabled by the more discriminating fluency-based decisions.
4 What We Have Learned, and Where To Next
Agreement can be useful. Even when there is no independent advantage of choosing one option or course of action
over another, it can still be advantageous if most people settle on the same option. Individuals can share the
knowledge they gain about this common option with others (e.g., how to fix the latest wormhole in a Microsoft
product), allowing them to get more use out of it. Individuals can coordinate with each other for different purposes
through selecting the common option (e.g., planning a spontaneous weekend trip with friends after meeting up by
chance at the favorite hot-chocolate bar). And social cohesion can increase from the shared knowledge about the
common option (e.g., more conversations around the water cooler after everyone watched the same episode of Iron
Chef). If everyone made their own independent choice (if conformity disappeared), these social advantages would be
greatly reduced.
What we have demonstrated in this chapter is that it does not take much cognitive machinery to make decisions that
will have a conformity-producing impact on the environment. Just using recognition knowledge, whether and how often
or how recently particular options have been encountered, to distinguish and choose between available options is
enough to enable clustered choices to emerge, provided that the recognition knowledge is at least partly coordinated
between individuals. This coordination can come about either through direct communication, in which individuals tell
each other about options they recognize, or through indirect observation, in which individuals store how many others
they have seen making a particular choice. And while having a more precise memory of experienced options, in the
form of continuous rather than binary recognition, helped speed the emergence of environment structure, adding extra
information in the form of a memory for other agents (model 4) did not strengthen this effect.
Several other factors should be examined in more detail to fill out this story. First, we need to explore the impact of
environment size, in terms of the number of options available for individuals to choose among. In the models
presented here, we saw two major different types of structure emerge, one in which all options were known
(recognized) by some medium number of agents, and another in which some patches ended up being known by all
agents. Could a larger environment lead to the emergence of multiple clusters of locations that are highly known and
chosen
by separate subsets of agents, as we see, for instance, in consumers split into different brand-loyal clans?
A related question is how stable the clustered choices are that agents make in these models. That is, will a group of
agents that have all converged on one option dissipate over time and be replaced by another group of similar size
clustered on another option? So far, we have not analyzed where the clusters are, only the degree of clustering (in
part because all clusters have had the same quality up until now), but we expect that in the situations where only a
few locations are known by nearly all of the agents, these clusters will be very stable for long periods. Patterns of
chosen-option change over time in these models need to be related to similar patterns observed among consumer
choices, for instance.
Another interesting avenue to explore is to make the nature of communication more realistic in our simulations. In the
cases we have investigated here, each individual has an equal chance of serving as a model, or communicating, to all
other individuals in the population. In reality, some models are more influential or prestigious than others (leading to a
prestige bias; see Henrich & McElreath, 2003), and some objects or ideas dominate media channels. Such effects may
create biased recognition without initially biased frequencies of the objects or ideas in the population. For example,
teachers, political leaders, and celebrities can potentially spread recognition of their opinions and behaviors to many
individuals (whether for good or ill). Cavalli-Sforza and Feldman (1981) term this one-to-many or few-to-many
transmission, which they argue sometimes has important consequences for the evolution of knowledge.
One-to-many transmission processes, whatever the details of their functioning, can alter the frequencies of behaviors,
objects, and ideas by creating greater recognition for a smaller number of behaviors, objects, and ideas. Essentially,
some people or organizations drown out other potential models. A recognition-based mechanism would then use this
familiarity to make choices about what to imitate, consume, or trust. In this way, one-to-many transmission may lead
some things to come to be both more recognized and more chosen more quickly, even when the initial frequencies and
underlying qualities of the items are similar.
We would also like to make the influence of other agents more psychologically plausible, for instance by having a
decreasing impact of greater numbers of other agents, as Asch (1956) found. Even more important, we need to include
a consideration of network structure in our analyses. We have begun looking at how this environment can be
considered a bipartite network, with agents and locations forming the two sets of interconnected nodes and the
recognition of the latter by the former forming the connections between those sets of nodes. Can such an analysis help
us understand how knowledge spreads through the population in different environment structures?
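The bipartite view just mentioned can be made concrete with a toy representation (the agent and patch names are hypothetical):

```python
from collections import Counter

# Agents and locations form the two node sets of a bipartite network;
# an edge (agent, location) means that the agent currently recognizes it.
recognition_edges = {
    ("agent_1", "patch_a"), ("agent_1", "patch_b"),
    ("agent_2", "patch_a"), ("agent_3", "patch_a"),
}

# The degree of a location node counts how many agents recognize it;
# a J-shaped degree distribution over locations is the "environment
# structure" discussed in this chapter.
location_degree = Counter(loc for _, loc in recognition_edges)
print(location_degree["patch_a"])  # 3
```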
We have shown here that individuals can use simple decision mechanisms based on innate recognition abilities, along
with direct or indirect sharing of knowledge, to link their behaviors in a way that strongly impacts the environment.
Just deciding where to get your next cup of hot chocolate, on the basis of what bars you recognize, have heard of, or have observed strangers and acquaintances sipping in on your last visit, can suffice to get everyone coordinated in a
world-shaping way.
12 Simple Heuristics Meet Massive Modularity
Peter Carruthers
This chapter investigates the extent to which claims of massive modular organization of the mind (espoused by some
members of the evolutionary psychology research program) are consistent with the main elements of the simple
heuristics research program. A number of potential sources of conflict between the two programs are investigated and
defused. However, the simple heuristics program turns out to undermine one of the main arguments offered in support
of massive modularity, at least as the latter is generally understood by philosophers. So one result of the argument will
be to force us to reexamine the way the notion of modularity in cognitive science should best be characterized, if the
thesis of massive modularity isn't to be abandoned altogether. What is at stake in this discussion is whether there is a
well-motivated notion of module such that we have good reason to think that the human mind must be massively
modular in its organization. I shall be arguing (in the end) that there is.
1 Introduction: The Two Programs
Two distinct research programs in cognitive science have developed and burgeoned over the last couple of decades,
each of which is broadly evolutionary or adaptationist in orientation, and each of which is nativist in the claims it
makes about the architecture and much of the contents of the human mind. One is the evolutionary psychology
program and its associated claim of massive mental modularity (Gallistel, 1990, 2000; Pinker, 1997; Sperber, 1996;
Tooby & Cosmides, 1992). The other is the simple heuristics movement and its associated claim of an adaptive
toolbox
of cognitive procedures (Gigerenzer et al., 1999). Each is, in addition, committed to explaining the variability of culture
in terms of the flexible application of modules/heuristics in local conditions.
My question is this: Are these competing research programs or do they complement one another? The proponents of
each of these programs don't often mention the existence of the other. Yet both are in the business of constructing
explanations that are plausible, not only in evolutionary terms but in relation to data from comparative psychology.
And it would seem that both are in the business of explaining (or trying to explain) how cognition can be realized in
processes that are computationally tractable. However, there is some reason to think that these programs offer
explanations of human cognition that are inconsistent with one another, or that otherwise undermine each other, as
we shall see.
I shall begin by briefly elaborating and elucidating the twin theses that cognition is massively modular, and that it is
constructed out of an adaptive toolbox of simple heuristics. I shall then turn to the question of consistency, arguing (in
the end) that the two research programs should best be seen as natural bedfellows and mutual supporters rather than
as competitors. But in order for this to be the case, the thesis of massive mental modularity will have to undergo a
(well-motivated) transformation.
2 Massive Modularity
Modularists claim that evolutionary thinking about the mind delivers specific predictions about the mind's architecture,
the most important of which is that the mind is massively modular in its organization. This conclusion can be reached
via a number of distinct (but mutually supporting) lines of reasoning. I shall sketch two of them here. (For more
extensive discussion, see Carruthers, 2005.)
2.1 The Argument from Biology
The first argument derives from Simon (1962) and concerns the evolution of complex functional systems quite
generally, and in biology in particular. According to this line of thought, we should expect such systems to be
constructed out of dissociable subsystems, in such a way that the whole assembly could be built up gradually, adding
subsystem to subsystem; and in such a way that the functionality of the whole should be buffered, to some extent,
from damage to the parts.
Simon (1962) uses the famous analogy of the two watchmakers to illustrate the point. One watchmaker assembles one
watch at a time, adding microcomponent to microcomponent one at a time. This makes it easy for him to forget the
proper ordering of parts, and if he is interrupted, he may have to start again from the beginning. The second
watchmaker first builds a set of subcomponents out of the given microcomponent parts, and then combines those into
larger subcomponents, until eventually the watch is complete. This helps organize and sequence the whole process,
and makes it much less vulnerable to interruption.
Consistent with such an account, there is a very great deal of evidence from across many different levels in biology to
the effect that complex functional systems
are built up out of assemblies of subcomponents, each of which is constructed out of further subcomponents and has a
distinctive role to play in the functioning of the whole, and many of which can be damaged or lost while leaving the
functionality of the remainder at least partially intact. This is true for the operations of cells, of cellular assemblies, of
whole organs, and of multiorganism units like a bee colony (Seeley, 1995; West-Eberhard, 2003). And, by extension,
we should expect it to be true of cognition also.
The prediction of this line of reasoning, then, is that cognition will be structured out of dissociable systems, each of
which has a distinctive function, or set of functions, to perform. (We should expect many cognitive systems to have a
set of functions, rather than a unique function, since multifunctionality is rife in the biological world. Once a component
has been selected, it can be coopted, and partly maintained and shaped, in the service of other tasks.) This gives us a
notion of a cognitive module that is pretty close to the everyday sense in which one can talk about a hi-fi system as
modular, provided that the tape deck can be purchased, and can function, independently of the CD player, and so
forth. Roughly, a module is just a dissociable component.
Consistent with the foregoing prediction, there is now a great deal of evidence of a neuropsychological sort that
something like massive modularity (in the everyday sense of module) is indeed true of the human mind. People can
have their language system damaged while leaving much of the remainder of cognition intact (aphasia); people can
lack the ability to reason about mental states while still being capable of much else (autism); people can lose their
capacity to recognize just human faces; someone can lose the capacity to reason about cheating in a social exchange
while retaining otherwise parallel capacities to reason about risks and dangers; and so on and so forth (Sacks, 1985;
Shallice, 1988; Stone et al., 2002; Tager-Flusberg, 1999; Varley, 2002).
2.2 The Argument from Computational Tractability
The second line of reasoning supporting massive modularity begins from the assumption that mental processes must
be realized computationally in the brain.


1. This assumption is common to all of classical cognitive science. It has been challenged more recently by proponents of distributed
connectionism. But whatever the successes of connectionist networks in respect of pattern recognition, there are principled reasons for
thinking that such networks are incapable of the kind of one-shot learning and updating of variables that humans and other animals are
manifestly capable of. See Gallistel, 2000; Marcus, 2001.
And the argument, deriving from Fodor (1983, 2000), is that computational processes need to be encapsulated if they
are to be tractable.
The argument goes something like this.


2. I don't mean to endorse this argument exactly as stated. Some of its assumptions will get unpacked and challenged, and the
argument will get rebuilt, as the discussion proceeds.
If a processing system can look at any arbitrary item of information in the course of its processing, then the algorithms
on which that system runs will have to be arbitrarily complex also. For those algorithms
will have to specify, in respect of each item of information that the system could access, what step should be taken
next, presumably different for each such item of information, if the system is to be a context-sensitive one. So the
more items of information a program can look at while processing, the more complex its algorithms will need to be.
So, conversely, if a system's algorithms are to be computationally tractable, limits will need to be placed on the set of
information items it can look at.
Consistent with Fodor's argument, what more than a quarter century of research in artificial intelligence appears to
have taught us is that computational processes need to be divided up among a suite of modular subsystems if they are
to be tractable (Bryson, 2000; McDermott, 2001). Note that this line of argument doesn't start from commitment to
some particular agent architecture (e.g., Brooks's 1986 subsumption architecture) and say, "Hey, this system is modular; therefore cognition is modular." Rather, the argument is a metainduction across recent trends in artificial
intelligence (AI). The claim is something like this: over the last half-dozen years, virtually everyone in AI has
converged on modular architectures of one sort or another. This has been forced on them by the experience of trying
to design systems that actually work. So this gives us good reason to think that any real agent must have a modular
internal organization.
Now it may well be, as we shall see, that the notion of module at work in AI isn't quite the same as Fodor's. But the
main premise in the foregoing metainduction can be challenged more directly. For there are some agent-architectures
on the market that are avowedly amodular in character, such as Newell's (1990) SOAR architecture. However, it can
be claimed that there is a confusion over terminology underlying these avowals. Although everyone can agree that a
system is a module only if it is domain specific (hence that an architecture is amodular if the systems within it aren't
domain specific), different research traditions mean different things by domain. So when someone coming from one
tradition says that their architecture is an amodular one, it might actually be modular in the sense deployed within the
other tradition. Let me elaborate.
Developmental and cognitive psychologists tend to think of domains in terms of kinds of content, or kinds of subject
matter. When they say that development is a domain-specific process, what they mean is that it proceeds at a
different pace and follows a different trajectory in the different areas of cognitive competence that adult humans
display (folk psychology, folk physics, folk biology, and so on). Call this a content-domain. Evolutionary psychologists
and massive modularity theorists, in contrast, think of domains as characterized by a function. In this sense, the
domain of a module is what it is supposed to do within the overall architecture of the cognitive system. Call this a
task-domain. The confusion arises quite naturally, and may easily pass unnoticed, because many of the task-specific
modules postulated by evolutionary psychology are also content specific in nature. (The folk psychology module is
targeted on mental states; the folk physics module is about physical movements of inanimate objects; the cheater-
detection module is about costs and benefits in exchange; and so on.) But there is nothing in the notion of a module
per se to require this, from an evolutionary-psychology perspective.
When someone coming out of the cognitive psychology tradition says, "I've built a system that is amodular in its architecture," what that person probably means is, "I've built a system that doesn't operate on any specific type of subject matter." And it is true that Newell's SOAR, for example, which is designed for decision-making, can acquire the
ability to make decisions concerning many different kinds of subject matter. But it may still be a modular system from
the perspective of the evolutionary psychologist (it may consist of isolable systems with specific functions whose
internal operations are encapsulated). You have to actually look and see. And when you do look at SOAR, it does seem
to be modularly organized (despite advertising itself as amodular). For different goals and subgoals come with
frames of relevant information attached. When figuring out what to do in pursuit of a goal, the program is only
allowed to look at what is in the relevant frame. So its operations would seem to be encapsulated.[3]

3. Quite what sense of "encapsulated" is involved here will loom large in later sections, especially section 8.
It should be noted that the information contained in a given frame can change with time, however. This requires us
to distinguish between weakly modal and strongly modal construals of encapsulation. In the strong sense, to say that
a given system is encapsulated from all but the information in its proprietary database is to say that it cannot access
any other information at any time during its existence, no matter what else happens. This is one sense in which
SOAR's frames aren't encapsulated, since they can and do alter with time. In a weaker sense, however, we can say
that a system is encapsulated provided it can only access whatever information is contained in its proprietary database
at that time.
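The weak, time-indexed sense of encapsulation can be illustrated with a minimal sketch (the goal names and facts below are invented for illustration, not drawn from SOAR itself): at any moment the system can consult only the frame attached to its current goal, even though frames may be updated over time.

```python
class FrameEncapsulatedSystem:
    """A toy system that is weakly encapsulated: at any given time it can
    consult only the frame attached to the current goal, although the
    frames themselves may be updated as the system learns."""

    def __init__(self):
        self.frames = {}  # goal -> set of facts relevant to that goal

    def learn(self, goal, fact):
        # Frames change over time, so encapsulation here is not "strongly
        # modal": what is accessible now need not be accessible forever.
        self.frames.setdefault(goal, set()).add(fact)

    def accessible(self, goal):
        # Processing in pursuit of `goal` may look only at this frame;
        # facts stored under other goals are invisible to it.
        return self.frames.get(goal, set())

sys_ = FrameEncapsulatedSystem()
sys_.learn("make-coffee", "kettle is in the kitchen")
sys_.learn("catch-train", "station is north of here")
# While pursuing "make-coffee", the train fact cannot influence processing:
assert "station is north of here" not in sys_.accessible("make-coffee")
```

The point of the sketch is that encapsulation is indexed to a time: a later call to `learn` can change what is accessible, without the system thereby ceasing to be encapsulated in the weak sense.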
There seems to be no good reason to insist on the strongly modal construal of modularity. For the weaker construal
gives us all that we need from modularity, which is that the system's operations should be computationally tractable.
And think of the language faculty, for example, which Fodor (1983) considers to be one of the archetypal modules. The
processing database for this system surely isn't frozen for all time. New irregular verbs can always be learned, for
instance, and these would surely then be counted as belonging to the system's processing database.
Putting together the foregoing two lines of reasoning, then (from biology and from computational tractability), what we
get is the prediction that the human mind should consist of a whole host of functional and multifunctional systems and
subsystems, which are to some degree dissociable from one another, and whose internal computational processes are
encapsulated from most of the information held elsewhere in the mind at that time. This is the thesis of massive
mental modularity, as generally conceived.[4]

4. There are, of course, many objections to the thesis of massive modularity, too. Most of them have to do with the apparent holism of human central cognitive processes of inference and belief-formation (Fodor, 1983, 2000), and with the distinctive flexibility and creativity of the human mind. These facts make it hard to see how the mind can consist entirely (or even largely) of modules. This is not the place to pursue and reply to these difficulties. See Carruthers, 2002a, 2002b, 2002c, 2003b, 2004.
3 Simple Heuristics
Whereas evolutionary psychology starts from reflection upon biological systems generally, and proposes a research
program for uncovering the elements of a modular mind, the initial focus of the simple heuristics movement is more
limited. It starts from consideration of the rationality or irrationality of human reasoning.
Psychologists have been claiming since the 1970s that humans are extremely bad at many kinds of reasoning. For
example, numerous studies involving the Wason conditional selection task suggest that people are quite poor at
discerning under what circumstances a simple conditional statement would be true or false (Wason & Evans, 1975;
Evans & Over, 1996). And human reasoners commit frequent fallacies, especially when reasoning about probability,
where they commit the conjunction fallacy, base-rate neglect, and so on (Kahneman et al., 1982). But it is evident
that for us to move beyond these factual claims to make a judgment about human irrationality may well require us to
make some assumptions about the nature of rationality.
In fact, the question "What is rationality?" is the same as the question "How should we reason?" Philosophers and
psychologists alike have traditionally assumed that we should reason validly, where possible; and in reliable ways more
generally (e.g., in domains of nondemonstrative inference, such as science). But in fact truth cannot be our only goal.
We also need enough truths in a short enough time-frame to enable decision-making and action. Moreover, reasoning
and decision-making have to be realized in computationally tractable processes, if they are to be computationally
realized at all.
For example, it has traditionally been assumed that any candidate new belief should be checked for consistency with
existing beliefs before being accepted. But, in fact, consistency-checking is demonstrably intractable, if attempted on
an exhaustive basis. Consider how one might check the consistency of a set of beliefs via a truth-table. Even if each
line could be checked in the time that it takes a photon of light to travel the diameter of a proton, then even after 20 billion years the truth-table for a set of just 138 beliefs (2^138 lines) still would not be completed (Cherniak, 1986).
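Cherniak's figure is easy to verify with rough arithmetic (the physical constants below are standard round values):

```python
# Time for light to cross a proton: diameter / speed of light
proton_diameter = 1.7e-15      # metres (approximate)
speed_of_light = 3.0e8         # metres per second
time_per_line = proton_diameter / speed_of_light   # a few times 1e-24 s

lines = 2 ** 138               # truth-table lines for 138 atomic beliefs
seconds_per_year = 3.15e7
total_years = lines * time_per_line / seconds_per_year
# total_years comes out on the order of tens of billions of years,
# comfortably more than the 20 billion years in Cherniak's example.
assert total_years > 20e9
```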
There is a whole field of computer science devoted to the study of such problems, called complexity theory. But it is
important to realize that computational intractability, for the purposes of cognitive science, can include problems that
wouldn't be characterized as intractable by computer scientists. This is because our goal as cognitive scientists is to
explain processes that happen in real time (in seconds or minutes rather than millennia), and because we need to
operate with assumptions about the speed of processing of brains (significantly slower than modern computers, albeit
with much of that processing being conducted in parallel), as well as making assumptions about memory power. In
effect, this means that the idea of computational intractability, for the purposes of cognitive science, doesn't admit of a
formal definition. But that is just as it should be, since we are dealing here with an empirical science.
This line of thinking leads to the idea of naturalized rationality. We need reasoning processes that are reliable enough,
but also fast and frugal enough (in terms
of the temporal and computational resources required), given the demands of a normal human life. And, of course,
what counts as fast or frugal isn't something that can be specified by philosophers a priori. Rather, these things will
depend on the properties of the computational systems employed by mammalian brains generally, and by the human
brain in particular; and on the task demands our ancestors regularly faced.
This is the background against which the simple heuristics research program has been developed. The goal is to find
computational procedures that are fast and frugal but are reliable enough, in a given environment, to be worthwhile
having. For example, one heuristic explored by Gigerenzer and colleagues (1999) is recognitionif asked which of two
German cities is the larger, the heuristic tells you to select the one that you recognize, if you recognize only one. This
heuristic proves remarkably successful, even when pitted against much fancier (and much more computationally and
informationally demanding) choice procedures like Bayes' rule, multiple regression, and so on; and it proves successful
across a wide range of decision types (including the selection of companies that are most likely to do well in the stock
market).
Note that there is one important point of similarity between the simple heuristics movement and the evolutionary
psychology program, then. This is that each lays a similar emphasis on computational tractability among cognitive
mechanisms. But each then appears to follow a different strategy in pursuit of computationally tractable processes. One
postulates a set of simple heuristics; the other postulates a set of encapsulated modules. These seem like distinct, perhaps inconsistent, approaches to the same problem. I will pursue these issues in the sections that follow.
4 An Inconsistent Pair?
Are the massive modularity and simple heuristics research programs inconsistent, then? At the very least, it would
seem that each can incorporate elements from the other without inconsistency, and perhaps to their mutual benefit.
Thus a massive modularist might believe that some of the processes that occur internally within a module are heuristic
in character. For example, rather than searching exhaustively through all the information in its proprietary database, a
module might adopt the satisficing heuristic of stopping search when it has found an item of information that is good
enough for use in its current task. Likewise, a modularist might accept that simple heuristics play a role in
orchestrating the interactions among modules and their influence on behavior. Similarly, believers in simple heuristics
could surely accept that at least some of the processes that issue in belief or that lead to a decision are modular in
character.
Moreover, massive modularity theorists emphasize that part of the point of a modular architecture is that different
modules can be structured in such a way as to embody information about the content-domains that they concern and
can deploy algorithms that are tailored to a specific set of task demands. A similar idea seems to be at work within the
simple heuristics framework, in the notion of ecological rationality. The idea is that, in connection with any given
heuristic, there will be a range of different environments and environment types within which that heuristic
will operate with a significant degree of reliability. And we can think of the heuristic as having been selected (by
evolution, by individual learning, or through the success of a particular culture) to operate in those environments,
thereby (in a sense) embodying information about them.
Nevertheless, an impression of inconsistency between the two research programs might remain. For it might appear
that they offer competing models of the overall innate architecture of the mind. Much of what goes on within the
simple heuristics program is an attempt to model various aspects of decision-making; and many people assume that
the decision-making system has to be an amodular one. (It certainly can't be domain specific in the content-specific
sense of domain, anyway.) Moreover, many of the heuristics discussed by those working within the simple heuristics
program would seem to apply in widely diverse and evolutionarily distinct domains; and some of them might be
learned, too. By contrast, the hypothesis of massive modularity is generally thought to suppose that the mind consists
of a set of evolved modular systems, targeted on domains of special evolutionary significance.
Practical reasoning can actually be thought of as modular, however, in the relevant sense of module. For recall that
modularity is about encapsulation, and not necessarily about domain (in the sense of content) specificity. A practical
reasoning module would be a system that could take any belief or desire as input, but which was nevertheless
encapsulated in respect of its processing of that input. As sketched in Carruthers (2002a), such a system might be set
up (in animals, if not in us) to take whatever is the currently strongest desire, for P, as initial input, and then to query
other modular belief-generating systems and initiate a memory search for beliefs of the form Q → P. When a
conditional of this form is received as input, it checks Q against a database of action-schemata to see if it is something
doable; if so, it goes ahead and does it; if not, it initiates a further search for beliefs of the form R → Q. And so on.
Perhaps it also has a simple stopping rule: if you have to go more than n conditionals deep, or if more than time t has
elapsed without success, stop and move on to the next strongest desire. This looks like it would be an encapsulated
system, all right; but not a content-specific one.
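The procedure just described might be sketched as follows (the belief contents and the depth bound are invented for illustration; the time-based stopping rule is omitted for simplicity):

```python
def practical_reason(desire, conditionals, doable, max_depth=5):
    """Backward-chain from a desired outcome P through beliefs of the
    form 'Q -> P' until a doable action is found or the depth bound is
    hit. `conditionals` maps each outcome to the antecedents believed to
    bring it about; `doable` is the database of action-schemata."""
    frontier = [desire]
    for depth in range(max_depth):
        next_frontier = []
        for goal in frontier:
            for antecedent in conditionals.get(goal, []):
                if antecedent in doable:
                    return antecedent   # found something doable: do it
                next_frontier.append(antecedent)
        frontier = next_frontier
    return None  # stopping rule fired: give up, move to the next desire

# Illustrative beliefs: being fed comes from eating; eating from reaching food.
beliefs = {"fed": ["eat"], "eat": ["reach-food"]}
assert practical_reason("fed", beliefs, doable={"reach-food"}) == "reach-food"
```

Note that the search is encapsulated in the relevant sense: the procedure consults only the conditionals returned by its queries and its own database of action-schemata, however any belief whatsoever could in principle arrive as input.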
One can easily imagine, then, that the operations of such a system might be supplemented by a set of heuristics, such
as: if you want something, first approach it. Or, for another example, much of the literature on navigation suggests
that children and other animals operate with a nested set of heuristics when disoriented (Shusterman & Spelke, 2005).
The sequence appears to be something like this: if you don't know where you are, seek a directional beacon (e.g., a
distant landmark or the position of the sun); if there is no directional beacon, seek to match the geometric properties
of the environment; if geometric information is of no help, seek a local landmark. Likewise, it is plausible that the
practical reasoning system might employ a variety of heuristics for ending search (e.g., when deciding whom to marry;
Gigerenzer et al., 1999). And so on. None of this seems inconsistent with the claim that the practical reasoning system
is modular.
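The nested navigation heuristics can be rendered as a simple fallback cascade (the cue names below are invented for illustration):

```python
def reorient(cues):
    """Try each navigation strategy in turn, falling back to the next
    only when the current one yields nothing (after the sequence
    suggested by Shusterman & Spelke, 2005)."""
    strategies = [
        ("beacon",   lambda c: c.get("distant_landmark") or c.get("sun")),
        ("geometry", lambda c: c.get("room_shape")),
        ("landmark", lambda c: c.get("local_landmark")),
    ]
    for name, strategy in strategies:
        cue = strategy(cues)
        if cue is not None:
            return name, cue
    return None  # no cue available: remain disoriented

# No beacon is available here, so the geometric cue is used:
assert reorient({"room_shape": "long-wall-left"}) == ("geometry", "long-wall-left")
```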
As for the fact that heuristics seem to apply across what are evolutionarily distinct domains, recall the metaphor of an "adaptive toolbox," which appears central to the way of thinking about the mind adopted by the simple heuristics
program. One way
to spell this out would be to propose the existence of heuristic procedures that can be multiply instantiated within a
range of distinct modular systems. (So on this account, a given heuristic is a type of processing rule, which can be
instantiated many times over within different systems in the brain, rather than a processing system in its own right.)
For there is no reason to think that each module has to deploy a unique set of algorithms for processing items in its
domain. There might be a range of algorithm types/heuristic strategies that have been adopted again and again by
different modular systems.[5]

5. Marcus (2004) explains how evolution often operates by splicing and copying, followed by adaptation. First, the genes that result in a given microstructure (a particular bank of neurons, say, with a given set of processing properties) are copied, yielding two or more instances of such structures. Then, second, some of the copies can be adapted to novel tasks. Sometimes this will involve tweaking the processing algorithm that is implemented in one or more of the copies. But often it will just involve provision of novel input and/or output connections for the new system.
Nor is there any reason to think that modular systems should use maximizing or exhaustive algorithms. On the
contrary, pressures of speed and time should lead to the evolution of "quick and dirty" intramodular decision-rules,
just as they lead to such rules for cognition as a whole.
Equally (and in addition), massive modularity certainly isn't inconsistent with learning. Many modules are best
characterized as learning modules, indeed. And some modules are likely to be built by other learning mechanisms,
rather than being innate. (The example of reading comes to mind.) Moreover, while some of these mechanisms might
be targeted on just a single content-domain, some might involve the interactions of a variety of different modules and
modular learning mechanisms (hence giving the appearance of domain generality). And then it may well be that there
exists a suite of heuristic operating principles that can be selected from among some sort of preexisting toolbox for
implementation in one of these modules, if appropriate. The learning process would partly involve the selection of the
appropriate tool from the toolbox.
5 Is the Argument from Computational Tractability Undermined?
We have seen that the massive modularity hypothesis seems to be fully consistent with the claims made by the simple
heuristics program. It appears, nevertheless, that the successes of this program must undermine one of the main
arguments in support of massive modularity: specifically, the argument from computational tractability. For it appears
that heuristic-based computational mechanisms offer a way for computations to be rendered tractable without the need
for informational encapsulation. If so, then cognitive processes can be computationally tractable (because heuristic
based) without being structured along modular lines, and it will turn out that the argument "Cognition must be modular in order that it should be realized in a computationally tractable form" collapses.
In order to evaluate this objection, we will need to undertake a closer examination of the notion of encapsulation. But I
propose to approach this initially by
going back one step further: to the considerations of computational tractability that supposedly give rise to the demand
that cognition should be constructed out of encapsulated systems.
We can distinguish two different ways a system might fail to be computationally tractable. One is that its algorithms
might require it to consult too much information to reach a solution in real time. For example, consider a very simple
consistency-checking device. Given a candidate new belief, the system crawls through the total set of the subject's
beliefs, looking for an explicit contradiction. Although the algorithm being executed here might be an extremely simple
one (essentially, it just looks for any pair of beliefs of the form P, ¬P), in attempting to take every belief as input
(whether sequentially or in parallel), it almost certainly wouldn't be feasible for mind-brains like ours. Call the
corresponding demand on computationally tractable systems information-frugality. We can say that cognition needs to
be realized in information-frugal systems if it is to be tractable.
The other way a system might fail to be computationally tractable is if the algorithms that it runs are too complex to
be feasibly executed in real time. Consider, for example, a consistency-checker that operates using the following crude
heuristic, which only requires it to consider relatively small sets of beliefs. For any candidate new belief, it randomly
selects a smallish set of a hundred or so other beliefs and generates a truth-table for the total set of propositions,
checking each line to see if there is a complete set of F's on any line. It is easy to see that the amount of time and
working memory that this system would need in order to complete its task would go up exponentially with the size of
its input-set. As we noted earlier, even if it checks each line of the truth-table in the time that it takes light to travel
the diameter of a proton, it would take the system longer than the time that has now elapsed since the beginning of
the universe to check the consistency of just 138 beliefs; and note that this doesn't include the time needed to
generate the truth-table in the first place! Call the corresponding demand on computationally tractable systems
processing-frugality. We can say that cognition needs to be realized in processing-frugal systems if it is to be
tractable.[6]

6. It should be stressed that the notions of "too much information," and of processing that is "too complex," as deployed here, remain highly indeterminate. Commonsense reflection on the circumstances of human life can get us some sense of the sorts of time-scales within which cognitive systems have to perform their tasks, of course. But most of the other parameters that would need to be taken account of are lacking. We don't know much about the speed of processing of brain-systems, when described at a cognitive as opposed to a neurological level. Nor do we know very much about the memory capacity of the various systems that might be involved. So any judgment that we make to the effect that a given system is or isn't sufficiently frugal will have to be tentative, and hostage to future discoveries in cognitive science.
The argument from computational tractability, then, leads us to think that cognition should consist of systems that are
both information-frugal and processing-frugal. Now one way of making a system information-frugal would be to deny it
access to any stored information at all. This gives us the archetype of an input-module, as explored by Fodor (1983).
This would be a system that can receive and process sensorily transduced information but can't access any of the
stored information held elsewhere in the mind. But, of course, this can't be a general model of
what a module should look like, if we are seeking to extend the notion of modularity to central systems that operate
on beliefs (and desires) as input. Once we shift away from considering input-modules to putative central-modules, then
we can no longer think of encapsulation as a matter of isolating the system from stored information. For central
modules will often need to operate on stored information when processing their input.
The natural way forward, at this point, involves distinguishing between the input to a module and the processing
database of a module (Carruthers, 2003b; Sperber, 2002). A non-content-specific central module would be a system
that could take any stored information as input but would be encapsulated in respect of its processing: either it can access no stored information in executing its algorithms (in which case the system is wholly encapsulated) or it can only access a limited database of information that is relevant to the execution of those algorithms (in which case the system is encapsulated to a degree inversely proportional to the size of the database).
With this rough suggestion on the table, the issue comes down to this. If computational tractability (hence frugality)
requires informational encapsulation, then for each computational system and subsystem (1) we must be able to
identify its input, and distinguish this from its processing database (if any), and (2) its processing database must be a
small subset of the total amount of information available. If the simple heuristics program suggests a way one can
have frugality in the absence of either (1) or (2), in contrast, then the argument from computational tractability to
massive modularity would seem to be undermined.
6 Heuristics and Processing Databases
In order to see whether or not the simple heuristics program undermines the argument for processing encapsulation,
then, we need to examine whether particular applications of that research program (such as the recognition heuristic, Take the Best, Take the Last, and so on) can support a suitable division between a system's input and its processing database.
The recognition heuristic operates somewhat as follows. When required to decide which of two items scores higher
along some dimension (e.g., which of two German cities is the larger), if you only recognize one of the two items, then
select that one. (If both items are recognized, then some other heuristic must be employed.) For my purposes here,
the important point to notice is that the recognition heuristic is fully encapsulated in its operation. No other information
in the mind either does or can influence the outcome, except perhaps information that is somehow implicated in the
recognition process itself. Once the system has received a judgment-task to process, it just has to look to determine
which of the objects presented to it evokes recognition.[7]

7. Hence the processing database for the system would consist in the set of concepts possessed, together with any information required for object recognition. This is likely to be a small subset of the total information contained within a mind.
No other information needs to be consulted
(nor can it be, indeed, or at least not internally to the operation of the recognition heuristic itself), and the inferential
procedure involved is a very simple one. So it would appear that instantiations of the recognition heuristic deserve to
be counted as modules in the traditional sense.
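A minimal rendering of the recognition heuristic makes its encapsulation vivid: the only information consulted is whether each item is recognized (the city names below are invented for illustration).

```python
def recognition_heuristic(a, b, recognized):
    """Pick the recognized item when exactly one of the two is
    recognized; otherwise signal that another heuristic must take over.
    Nothing beyond the recognition facts themselves is consulted."""
    if (a in recognized) != (b in recognized):
        return a if a in recognized else b
    return None  # both or neither recognized: defer to another heuristic

known = {"Munich"}
assert recognition_heuristic("Munich", "Bielefeld", known) == "Munich"
assert recognition_heuristic("Essen", "Bielefeld", known) is None
```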
Now consider the Take the Best heuristic. Unlike the recognition heuristic, this heuristic does require the system to
search for and consult some further information concerning the items in question. But it doesn't look at all the
information concerning those items. Specifically, it searches for the item of information concerning the two target
items that has most often been found in the past to discriminate between items of that type along the required
dimension. Gigerenzer and colleagues (1999) have shown that this heuristic can perform almost as well as a bunch of
fancier processing algorithms, but it can do so while being much more frugal in the information it uses and the
demands it places on the computational resources of the system.
In this case, it isn't easy to see how the distinction between input and processing database should be drawn. One
might try saying that the relevant subset of total information that a system instantiating Take the Best can consult
during processing consists of the system's beliefs about relative cue validity together with its further beliefs concerning
the cues in question. When it gets a query about the relative size of two German cities, for example, it must look first
at its beliefs about which properties of cities have correlated best with size in the past. If having a top-division soccer team was the best predictor, then it will query the wider database: does either of these cities have a top-division soccer team? If it receives back the information that just one of them does, it selects that one as the larger. If neither
or both do, it moves on to the next best predictor of size listed in its processing database. And so on.
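The lexicographic search just described might be sketched as follows (the cue names and city facts are invented for illustration):

```python
def take_the_best(a, b, cues, knowledge):
    """Walk through cues in descending order of past validity; decide on
    the first cue that discriminates between the two items.
    `knowledge[item][cue]` is True, False, or absent (value unknown)."""
    for cue in cues:  # cues assumed already ordered best-first by validity
        va = knowledge.get(a, {}).get(cue)
        vb = knowledge.get(b, {}).get(cue)
        if va is True and vb is not True:
            return a
        if vb is True and va is not True:
            return b
        # Cue fails to discriminate: fall through to the next-best cue.
    return None  # no cue discriminates: guess or use another heuristic

cues = ["top_division_team", "is_state_capital"]
knowledge = {
    "CityA": {"top_division_team": True},
    "CityB": {"top_division_team": False, "is_state_capital": True},
}
assert take_the_best("CityA", "CityB", cues, knowledge) == "CityA"
```

Notice how frugal the search is: only one cue is ever examined here, because the first cue already discriminates; the later cue is never consulted at all.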
Note, however, that which beliefs such a system can consult in the course of its processing is a function of what its
beliefs actually are. If the system believes that having a high crime rate is the best predictor of city size, then that is
the information it would seek out. And, in principle, any belief could have had an impact on processing. So it seems
that our best hope of finding a place for the notion of encapsulation, here, would be to adopt an idea from our
discussion of the SOAR planning architecture. We could regard the specific beliefs that the system instantiating Take
the Best happens to acquire as carving out a functionally individuated processing database from the wider set of stored
information in relation to each dimension of comparison, such that the system can only consider that narrower set in
answer to a given question. But it has to be admitted that this looks pretty forced and unnatural.
Now consider heuristic processes that rely on such phenomena as the salience of a piece of information in a context, or
the accessibility of that information, given the recent history of its activation. (The latter is closely related to the
heuristic that Gigerenzer et al., 1999, call Take the Last.) Consider language comprehension, for example, on the
sort of model provided by Sperber and Wilson (1996), in which accessibility of beliefs plays a major role. On their
account, one of the factors in interpretation is saliency in the present environment, and another is relative recency
(e.g., whether or not an item of information has been activated earlier in the conversation).
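A toy version of "Take the Last" shows how recency of success can carve out what gets consulted first (the cue names are invented for illustration):

```python
class TakeTheLast:
    """Try first the cue that discriminated most recently: recency of
    activation determines which information gets consulted first."""

    def __init__(self, cues):
        self.order = list(cues)  # front of list = most recently successful

    def decide(self, discriminates):
        for i, cue in enumerate(self.order):
            if discriminates(cue):
                # Move the winning cue to the front for next time.
                self.order.insert(0, self.order.pop(i))
                return cue
        return None  # no cue discriminates on this occasion

ttl = TakeTheLast(["soccer_team", "capital", "exposition_site"])
ttl.decide(lambda cue: cue == "capital")
# "capital" discriminated last time, so it is tried first on the next query:
assert ttl.order[0] == "capital"
```

The encapsulation-conditions of such a system are, as the text goes on to note, continually modified by its recent history: which information is most accessible now depends on what succeeded a moment ago.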
Might the comprehension process nevertheless count as an encapsulated one, although in principle any belief might be
made salient by the present environment, or might have been activated previously? If so, we shall have to think of the
comprehension process, as it unfolds in the course of a set of linguistic exchanges, as creating a sort of local
comprehension module on the fly, whose encapsulation-conditions are continually modified as the conversation
continues. But what becomes of the idea that there is some subset of the total information available that the
comprehension system can look at, if any item of information could have been salient?
It might be replied, however, that we are dealing here with a briefly existing encapsulated system, created out of the
resources of a longer lasting comprehension system by facts about the recent environment. Given the previous history
of the conversation, then some items of information are much more accessible than others. So a search process that
operates on principles of accessibility can only look at that information, and other information in the mind can't
influence the comprehension process. Granted, if the earlier facts about the conversation had been different, then
other information could have had an influence on the comprehension of the sentence in question. But this doesn't alter
the fact that, the previous history of the conversation having been what it was, that information cannot now have an
influence.
Although there is a sense in which this reply works, the victory is a Pyrrhic one. For the resulting notion of modularity
is highly problematic. Cognitive science, like any other science, is in the business, inter alia, of discovering and
studying the properties of the set of natural kinds within its domain. And a natural kind, in order to be a worthwhile
object of study, must have a certain sort of stability, or regular recurrence. In contrast, the state of a comprehension
system that has undergone a specific conversational history, hence that has a particular distribution of degrees of
accessibility among its representations, is something that might exist just once in the history of the universe. That
particular combination of processing principles and accessibility (yielding the processing database of an on-the-fly
module) might never recur again.
If cognitive science is to attain the sort of generality one expects of a science, it needs to carve its natural kinds at
recurring joints. This requires us to think of the comprehension system as a single system over time, operating partly
on principles of accessibility that help to make its operations information-frugal. Likewise, even in the case of SOAR (to
return to an example discussed earlier; similar things could be said about Take the Best, discussed more recently): we
should probably think of this as being the same system that is employed in pursuit of a variety of different goals, in
which information-frugality is ensured by organizing the system's database into frames linked to each type of goal.
We shouldn't think of it as a whole set of overlapping encapsulated systems (one for each processing-system-and-
frame pair) that share a common set of algorithms.
7 Input Information versus Processing Database
Some examples of processing drawn from the simple heuristics program appear to put severe pressure on the notion
of an encapsulated system, then, where the latter
is explicated in terms of an intuitive distinction between input and processing database. It is worth asking directly how
this distinction is to be explicated in turn, however. The foregoing arguments attempt to use the distinction between
input and processing database, without saying what that distinction amounts to. But how is this distinction to be
drawn?[8]

8. Note that the distinction is only problematic in respect of central modules, whose input can include beliefs or other stored propositional states. Where the module in question is an input-module, the distinction is straightforward: the input to the system is the information that reaches it from the sensory transducers, and any other information that gets used in processing can be counted as belonging to the processing database.
One way to do it would be to think of the input to a system as whatever turns the system on. But this doesn't seem a
very plausible proposal. Surely we would want to allow that, once turned on by something (e.g., by one desire
winning out over others in the competition to control the resources of the practical reasoning system), a system might
query other systems for information, without all such information thereby being counted as belonging to the processing
database of the system in question. As currently practiced, AI is full of networks of distinct systems that can query
each other for information after they have been turned on. But if we were to adopt the foregoing proposal, then we
would have to say that there was a sense in which they were all really one big system, since information produced by
each one would form part of the processing database of each of the others.
Another way we might go would be to say that the processing database of a system, to count as such, must be a
dedicated database, whose contents aren't available to other systems. This fits quite well with the way people think
about the language module. One might regard the processing database for the language faculty as a set of acquired language-specific items of information (concerning grammatical and phonological rules, for example) that isn't available to any other system to make use of. It doesn't seem well motivated, however, to insist that memory systems
and processing systems should line up one-for-one in the way this suggestion postulates. For modularity is a thesis
about processing, not a thesis about storage. It is quite unclear why there shouldn't be multipurpose information
storage systems that can be accessed by any number of processing systems in the course of their processing. Nor is it
clear why the modular status of those processing systems would have to be compromised as a result.
Another alternative is to think of the processing database of a system as the body of information it must consult in
order to execute its algorithms. The input to the system (if the system isn't a content-specific one) could in principle
come from anywhere. But once the system is turned on, it would be required to start consulting some part of its
processing database in order to handle its input. (The system needn't attempt to consult all of the information in its
processing database, of course. This is one of those places where it is helpful to imagine various heuristics and search-rules operating within a given module.)
This proposal seems to fit quite neatly with the ways in which people tend to think about a language module, or a theory of
mind (ToM) module, for example. When
linguistic input is received, the language module must start consulting its database of grammatical and phonological
rules, its lexicon, and so forth. In the course of processing, it might also send out requests to other systems for
information concerning the speaker's likely interests or knowledge, for example; and the replies to these queries can
be thought of as further inputs to the system. Likewise, when a description of an item of behavior is received by the
ToM system, it must start consulting its database of acquired information (e.g., concerning the conventional
significance of a handshake in the surrounding culture, or concerning the mental states previously attributed to the
actor). And in the course of its processing, it, too, might send out requests to other systems for information
concerning the likely physical consequences of the subject's observed behavior, for instance.
There are good reasons why one can't explain encapsulation in terms of the information that the system must consult,
however. Consider the practical reason module sketched earlier. Once it has been turned on by a desire, its algorithms
require it to seek information of a certain conditional form. But these conditional beliefs can, in principle, come from
anywhere and be about anything. So if we said that the processing database for a system is the set of beliefs that it is
required to consult, then almost all beliefs might belong to this database, and practical reason would turn out to be
radically unencapsulated and amodular after all.


9. Am I begging the question by assuming here that practical reason is modular? No. For what is in question is whether there is a
notion of module encapsulation that can be put to the service of a massive modularity thesis; and the latter must surely maintain
that practical reason is modular. Moreover, the practical reason system, as initially sketched here, did seem like it might be
computationally tractable. So if tractability is supposed to require encapsulation, then, again, we need a notion of encapsulation that
can fit the operations of such a system.
A final option is to make use of the distinction between conducting an information search oneself and sending out a
query for information from elsewhere. We could say that the processing database for a module is the stored
information that it (the module) searches, rather than the search being devolved to other systems. But now, focusing
on that aspect of practical reason's requirement for conditional beliefs that involves a search among stored conditional
information (as opposed to requests to other systems to see if they can generate such information in the
circumstances)why shouldn't this be conducted by the practical reason system itself? If memory is content-
addressable, or addressable by syntactic form, one might be able to search under the description "conditional with P as a consequent." And is there any reason to think that conducting such a search would render practical reason
intractable? (Or any more intractable than if there were some other system to which this search was devolved?)
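The kind of search by syntactic form imagined here can be given a toy sketch (an illustration only, not a claim about actual cognitive implementation; the belief representation and the sample beliefs are invented for the example):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Conditional:
    """A stored belief of the form 'if antecedent then consequent'."""
    antecedent: str
    consequent: str

def search_by_consequent(beliefs, p):
    """Retrieve conditionals whose consequent is p, by syntactic form alone.

    The search inspects only the syntactic shape of stored items, never
    their subject matter, so its cost scales with the number of stored
    conditionals rather than with everything the system knows.
    """
    return [b for b in beliefs
            if isinstance(b, Conditional) and b.consequent == p]

beliefs = [
    Conditional("I take the umbrella", "I stay dry"),
    Conditional("I leave now", "I catch the train"),
    "it is raining",  # a non-conditional belief: ignored by the search
]

matches = search_by_consequent(beliefs, "I stay dry")
```

The point of the sketch is that such a search is cheap and bounded even though, in principle, any belief in the store could have matched.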
It appears that the distinction between input and processing database can't do the work required of it in the context of
a thesis of massive mental modularity, then; at least, not if we want to allow for modular systems that can take unrestricted input, can query other systems for information, can conduct searches for information themselves at various points during their processing, and so forth. At this point, the notion of encapsulation, and with it the notion of modularity, appears to be under severe pressure.
8 Whither Modularity?
The simple heuristics program places considerable pressure on the claim that cognition must be constructed out of
encapsulated systems, then, if an encapsulated system is one that might be capable of receiving anything as input but
can only access a limited database of information in the course of its processing. But is this how we must understand
the notion of encapsulation? Are there any alternatives open to us?
Put as neutrally as possible, we can say that the idea of an encapsulated system is the idea of a system whose
operations can't be affected by most or all of the information held elsewhere in the mind. But there is a scope
ambiguity here. We can have the modal operator take narrow scope with respect to the quantifier, or we can have it
take wide scope. In its narrow-scope form, an encapsulated system would be this: concerning most of the information
held in the mind, the system in question can't be affected by that information in the course of its processing. Call this
narrow-scope encapsulation. In its wide-scope form, on the other hand, an encapsulated system would be this: the
system is such that it can't be affected by most of the information held in the mind in the course of its processing. Call
this wide-scope encapsulation.
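One way to regiment the two readings (a notational sketch only: let $A(i)$ say that information item $i$ affects the system's processing, let $\mathsf{Most}$ be a generalized quantifier over the information held in the mind, and let $\Diamond$ be the relevant possibility operator):

```latex
% Narrow-scope encapsulation: the quantifier outscopes the modal operator.
% Concerning most items of information i, each is such that it cannot affect processing.
\text{Narrow scope:}\qquad (\mathsf{Most}\; i)\,\neg\Diamond A(i)

% Wide-scope encapsulation: the modal operator outscopes the quantifier.
% It cannot happen that most of the information affects processing.
\text{Wide scope:}\qquad \neg\Diamond\,(\mathsf{Most}\; i)\,A(i)
```

On the narrow-scope reading there is a fixed body of information that is excluded in advance; the wide-scope reading only bounds how much information can be consulted on any one occasion.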
Narrow-scope encapsulation is the one that is taken for granted in the philosophical literature on modularity, following
Fodor (1983).


10. The use of the term "module" in the AI literature is probably rather different, however (Joanna Bryson, Jack Copeland, John Horty, Aaron Sloman, personal communications). It may be closer to a combination of the everyday sense of "module," meaning functionally individuated processing component, together with a requirement of what I am here calling wide-scope encapsulation. If so, then the
argument for massive modularity from recent trends in AI, sketched in section 2, can still hold up, given the intended sense of
modularity.
Most of us naturally picture modularity in terms of there being some determinate body of information that can't
penetrate the module. And this way of looking at the matter is only reinforced if we explicate encapsulation in terms of
the distinction between the input to a system and its processing database, for here there is supposed to be some determinate body of information (the information in the processing database) that can affect the operations of the module, implying that all other information can't (except by being taken as input).
However, even without some determinate subdivision between the information that can affect the system and the
information that can't, a system can be set up in such a way that its operations can't be affected by most of the
information in the mind. For the system's algorithms can be so set up that only a limited amount of information is ever
consulted before the task is completed or aborted. Put it this way: a module can be a system that must only consider
a small subset of the information available. Whether it does this via encapsulation as traditionally understood (the
narrow-scope variety) or via frugal search heuristics and stopping rules (wide-scope encapsulation) is inessential. The
important thing is to be both information-frugal and processing-frugal.
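A minimal sketch of wide-scope encapsulation in the style of Take the Best (the cues and cue values below are invented for illustration): the system may in principle consult any cue in memory, but a stopping rule ensures that only a few are ever consulted on a given run.

```python
def take_the_best(cues, option_a, option_b):
    """Decide between two options by one-reason decision making.

    `cues` is a list of functions ordered by validity; each returns a
    positive number if it favors option_a, a negative number if it favors
    option_b, and 0 if it does not discriminate. Search stops at the first
    discriminating cue, so most of the available information is never
    consulted, even though any of it could in principle have been.
    """
    consulted = 0
    for cue in cues:
        consulted += 1
        verdict = cue(option_a, option_b)
        if verdict != 0:          # stopping rule: first discriminating cue decides
            return (option_a if verdict > 0 else option_b), consulted
    return None, consulted        # no cue discriminates: guess or defer

# Invented cues for judging which of two cities is larger.
cues = [
    lambda a, b: a["capital"] - b["capital"],   # is-a-capital cue (most valid)
    lambda a, b: a["team"] - b["team"],         # has-a-major-team cue
    lambda a, b: a["airport"] - b["airport"],   # has-an-airport cue
]
berlin = {"capital": 1, "team": 1, "airport": 1}
munich = {"capital": 0, "team": 1, "airport": 1}

choice, cues_consulted = take_the_best(cues, berlin, munich)
```

Here the first cue already discriminates, so only one of the three available cues is consulted; nothing in the system's architecture walls off the remaining cues, yet its processing remains information-frugal.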
In the end, then, the following argument is a failure: if cognitive processes are to be tractably realized, then the mind
must be constructed out of networks of processing systems that are encapsulated in the narrow-scope sense. So the
argument for massive modularity, as philosophers traditionally conceive of it, fails, too. But we still have the argument
that computational tractability requires wide-scope encapsulation: we still have the argument that if cognitive
processes are to be tractably realized, then the mind must be constructed out of systems whose operations are both
information-frugal and processing-frugal; and this means that those systems must only access a small subset of the
total available information while executing their tasks.
Does this mean that the thesis of massive mental modularity is insufficiently supported, and should be rejected? That,
of course, depends on what we continue to mean by a module. We still have in play the argument from biology that
we should expect cognition to be built out of separable task-specific systems (this is the everyday meaning of "module"). And we still have the argument from computational tractability that these systems need to be both
information-frugal and processing-frugal. This requires that those systems should be wide-scope encapsulated. (They
need to be systems that can't access more than a small subset of the total information available before completing
their tasks.) And it is open to us to say that this is how the thesis of massive modularity should properly be
understood.
Moreover, we still have in play the metainductive argument from recent trends in AI. Researchers charged with trying
to build intelligent systems have increasingly converged on architectures in which the processing within the total
system is divided up among a much wider set of task-specific processing systems, which can query one another, and
can provide input to each other, and can often access shared databases. But many of these systems will deploy
processing algorithms that aren't shared by the others. And most of them won't know or care about what is going on
within the others. The fact of such convergence is then good evidence that this is how the mind, too, will be
organized.


11. Indeed, the convergence is actually wider still, embracing computer science more generally. Although the language of modularity isn't so often used by computer scientists, the same concept arguably gets deployed under the heading of object-oriented programming. Many programming languages, like C++ and Java, now require a total processing system to treat some of its parts as objects that can be queried or informed, while the processing that takes place within those objects isn't accessible elsewhere. And the resulting architecture is regarded as well-nigh inevitable whenever a certain threshold in the overall degree of complexity of the system gets passed.
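The object-oriented idiom the footnote describes can be sketched briefly (the footnote mentions C++ and Java; the same pattern is shown here in Python, with a made-up class and lexicon for illustration):

```python
class SpellChecker:
    """An 'object' in the footnote's sense: other parts of the program can
    query it, but the processing inside it isn't accessible elsewhere."""

    def __init__(self, lexicon):
        self._lexicon = set(lexicon)   # internal database, private by convention

    def is_word(self, token):
        """Public query interface: callers get an answer, not the procedure."""
        return self._normalize(token) in self._lexicon

    def _normalize(self, token):
        # Internal processing step; no external caller depends on how it works.
        return token.strip().lower()

checker = SpellChecker(["cat", "dog", "module"])
result = checker.is_word("  Module ")
```

The rest of the program interacts with `SpellChecker` only through `is_word`; how the answer is computed could change without anything else needing to know.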
The term "module" has been used in many different ways within the cognitive science literature, of course, from Fodor
(1983) onward. One reaction to this mess of different usages would be to urge that the term should be dropped, and
that people should describe in other words what it is that they believe. But it is actually quite handy to have a single
term to express what one means. And provided that one is
explicit about how that term is being used, no confusion should result. I propose, then, that by "module" we should mean something along the lines of a distinct task-specific processing system whose operations are both information-frugal and processing-frugal (and which is hence wide-scope encapsulated).


12. For consideration of a wider set of arguments in support of massive modularity, and a resulting notion of module that isn't quite
the same as the one outlined here (incorporating the idea of inaccessibility), see Carruthers, 2005.
And the thesis of massive modularity then becomes the claim that cognition must be built up out of such systems.
Thus understood, the thesis of massive mental modularity is both well supported and fully consistent with the insights
of the simple heuristics research program.
What really matters in the end, of course, isn't what the systems in question get called but rather what we can claim
to know about the architecture of the human mind. There are a range of different arguments (not all of which could be surveyed in this chapter; for further examples see Carruthers, 2005), together with a set of progressive research programs in cognitive science and AI, all of which suggest that the mind is, indeed, composed of a multitude of distinct
processing systems. These systems will talk to one another and query one another but will, to a significant degree,
operate independently of one another. And their processing will be frugal (either using algorithms tailored specifically to
the task demands or using heuristics or shortcuts of one sort or another, or both). I am myself inclined to express this
result by saying that the mind is massively modular in its organization; but what matters is the result itself, not the
manner in which it is described.
13 Modularity and Design Reincarnation
H. Clark Barrett
Modularity has come under fire of late. In particular, notions of modularity associated with evolutionary perspectives,
sometimes called massive modularity (Carruthers, 2005; Samuels, 1998; Sperber, 1994), have been heavily criticized
from a developmental perspective (Buller, 2005; Buller & Hardcastle, 2000; Elman et al., 1996; Lickliter & Honeycutt,
2003; Smith & Thelen, 2003; Quartz & Sejnowski, 1997). In this essay, I address developmental critiques of
modularity. The organization of the essay is as follows.
In section 1, I suggest that critics of evolutionary modularity theory have largely attacked a straw-man folk idea of
module development that is akin to preformationism (Lickliter & Honeycutt, 2003; Smith & Thelen, 2003). I then
attempt to flesh out what the actual developmental commitments of evolutionary modularity theory might be. Sections
2 and 3 review why modularity is invoked in cognitive science, and propose that the folk view of innateness as "hard-wired" must be revised in the case of cognitive modules. Modules are not preformed but rather are constructed by
evolved developmental systems that use local, real-time information to do so, resulting in modules that can vary along
some dimensions yet retain certain features in common, as discussed in sections 4–6. Sections 7 and 8 explore the
entailments of the idea that reliable development produces diverse tokens of evolved module types. Sections 9 and 10
confront the problem of how content can be reliably generated by systems that do not know everything in advance
about the problem domains they are facing, resulting in different outcomes in different environments, including novel
ones. The essay ends with some suggestions about how standards of evidence for the existence of evolved modules
might need revising in light of a reliable development view, as opposed to a preformationist view, of modularity.
1 The Problem of Ontological Filling-In
The sciences of brain and mind are currently in a curious position regarding the formal concepts that are used as part
of everyday science. Whether one is a neuroscientist studying the brain's hardware or a behavioral scientist studying
the relationship between information inputs and behavioral outputs, inferences from data to actual cognitive
architecture are usually quite indirect. This means that many entities are postulated that are operationally defined but
whose ontological status is uncertain. Examples might include Chomsky's (1965) universal grammar (UG) and Morton
and Johnson's (1991) CONSPEC and CONLERN, among many others. In each case, these are abstract structures that
admit of an informational description, but can only be measured or inferred via concrete behavioral tokens. If these
are things, where do they reside? When we refer to them, what, physically, are we referring to?
Given such uncertainty, it is crucial that we not lose sight of the reasons why our formal concepts are invented and invoked.
Moreover, given that these terms are often intentionally of indeterminate ontological status, especially in the world of
information-processing language, it can be a major mistake to give in to the urgings of our intuitive ontology, which
seeks to fill in the unspecified details of ontologically ambiguous objects, using properties that may come from a rather
limited menu (Bloom, 2004; Boyer, 2001; Boyer & Barrett, 2005). Theories that postulate information structures, as in
the spirit of Marr (1982) or Chomsky (1965), may attempt to achieve rigor precisely by leaving all parameters other
than those of the theory itself unspecified, resulting in postulated entities that seem downright unnatural in intuitive
terms, like a person with no body. Some unspecified parameters may include ontologically significant ones, such as
how the structure in question is constructed during development or instantiated in neural tissue. Leaving these
unspecified can produce an itch that our intuitive ontology longs to scratch by filling them in.
However, this underspecification must be taken seriously if the theories are to have any value (if not, all theories in
psychology will sink under the weight of associated ontological commitments that are almost certain to be incorrect in
the details, in the long run). It is a mistake to criticize a theory about psychological entities by first filling in open
parameters explicitly or implicitly, and then to discard the whole structure as implausible on the basis of what one has
filled in. For example, while it might be reasonable to ask of the UG, "Where would such a thing be located?", this question is in many ways orthogonal to whether the UG hypothesis is correct, because the hypothesis is framed in
terms of information, not in terms of brain structure or genetic structure per se. As Marr (1982) pointed out, while
there is a relationship between hypotheses about information structure and hypotheses about brain structure, it is an
asymmetric one; there are many ways to instantiate a given information structure.
Hypotheses about modularity might be particularly susceptible to this process of ontological filling in, in which details
that are not mandated by the hypothesis itself are introduced from an intuitive or scientific ontology. The hypothesis
that a specialized cognitive system exists, for example, usually leaves open
many possibilities for how such a system might be instantiated and what its properties might be. For example,
evolutionary psychologists have stressed repeatedly that the core of their notion of modularity is functional
specialization (Barrett, 2005b; Pinker, 1997; Tooby & Cosmides, 1992). However, others have read modularity claims
as implying much more: for example, automaticity (DeSteno et al., 2002), encapsulation (Fodor, 2000), localization
(Uttal, 2001), lack of plasticity (Buller & Hardcastle, 2000; Elman et al., 1996; Karmiloff-Smith, 1992), and innateness
verging on preformationism (for explicit comparisons to preformationism see Smith & Thelen, 2003; Lickliter &
Honeycutt, 2003).
I suspect that many spontaneously generated intuitions about modularity, such as those mistakenly attributed to
evolutionary modularity theorists, are cobbled together from preexisting intuitive ontologies. These might include an
intuitive ontology of artifacts (e.g., tools or devices that process information, containers that contain it, pictures
that represent things in the world), intuitive ontological commitments about innateness that gloss invariances as "hard-wired," and the intuitive ontology of agency (devices are automatic whereas the intentional aspects of human
psychology are not; see Fodor, 2005, for an explicit argument along these neo-Cartesian lines). These ontologies and
metaphors produce spurious entailments that may not hold for cognitive systems instantiated in neural tissue, which
are not actual artifacts or Turing machines (Barrett, 2005a; Pinker, 1997; Sperber, 1994, 2002).
What I would like to do here is consider how modules really do develop. What I propose to defend is that the notions
of modularity and innateness are, to a large extent, orthogonal. Specifically, hypotheses about modularity are not
hypotheses about innateness in the folk sense, or in the sense implied in recent critiques of modularity (Elman et al.,
1996; Lickliter & Honeycutt, 2003; Quartz & Sejnowski, 1997; Smith & Thelen, 2003). This is not to say that
considerations about innateness (if innateness survives as a formal scientific concept) are necessarily irrelevant to
modularity, but if they are relevant, it is only to the degree that they are relevant for anything else in psychology or
biology. This is because, when one hypothesizes the existence of some phenotypic structure X (whether adaptation or
by-product), the question of how X is constructed is a different one from whether it exists and what its properties are.
However, to say that a particular module or type of module is species-typical, or even just that it is an adaptation that
sometimes appears in the phenotype, does entail certain commitments about development, which are more subtle than
claims about, for example, hard-wiring. These are what I will examine here.
2 What Use Is Modularity as a Scientific Concept?
Much of the ontological filling-in regarding modules comes not just from intuitive ontology but from Fodor's (1983)
criteria (which in turn arise, to some degree, from considerations about human-made computing devices). Fodor's work
has largely set the terms for the modularity debate, and many in cognitive science view modularity theory as
synonymous with Fodor's conception. Since Fodor himself maintains that a module sans phrase is "an informationally encapsulated cognitive mechanism, and is presumed innate barring explicit notice to the contrary" (Fodor, 2000, p. 58), it is not surprising
that many think that modularity theorists ignore development (Elman et al., 1996; Karmiloff-Smith, 1992).
Let us step back from Fodor's conception, and ask: what use is the concept of modularity in cognitive science? Why do
we invoke it, and what role does it, or should it, play in our theories of cognition? We can ask the same of biological
theories that divide the phenotypes of organisms into parts: traits, or components. Given that no parts or aspects of
an organism are truly independent, why decompose it into parts?
Several reasons come to mind. The first is purely empirical: biologists have observed that organisms exhibit a modular
structure all the way up and all the way down, from the operating machinery of cells to the organization of cells into
tissues, tissues into organs, and so on (Raff, 1996; Riedl, 1978; West-Eberhard, 2003). Organisms exhibit nested or
hierarchical modularity: phenotypes exhibit different kinds of chunking at different levels, and there are chunkings
within chunkings down to the molecular level. If cognitive systems are different, then they are biologically unique.
A second reason is rooted in arguments about specialization and the benefits that come from division of labor. An
organ that had to perform the functions of both the liver and the heart would do neither well. Often discussed in this
context, in addition to the jack-of-all-trades problem, are other design considerations such as efficiency, avoidance of
combinatorial explosion, and the frame problem (Carruthers, 2005; Samuels, 2005; Tooby & Cosmides, 1992).
A third and closely related reason is that specialized traits (those that are acted on independently by selection) can only evolve to the degree that they are developmentally decoupled from each other, leading to the expectation that
many traits of organisms will be developmentally modular, as I will discuss later (Raff & Raff, 2000; Wagner &
Altenberg, 1996; Wimsatt & Schank, 2004).
Finally, from the perspective of everyday science, it is explanatorily useful to try to decompose wholes into smaller
parts, even if these always end up being approximate to some extent (Bechtel, 2003).
For these reasons, many evolutionarily oriented cognitive scientists want to retain the modularity concept. They aim to
carve the mind at its joints, that is, into functionally specialized units of processing. Clearly, the mind is not literally
like a stereo system or a Lego toy with completely independent components that can be snapped in and out. Even
neuropsychologists who endorse modularity have long recognized this (Shallice, 1988). Rather, the goal is to find out
the functional components that underlie what appears to be the seamless whole of cognition, even if these components
are richly interdependent and interactive both developmentally and architecturally (as a biologist would expect them to
be). What these components might be like (what properties they might have, from automaticity to innateness) is an
open question. Given the work that we want modularity to do in our theories about the mind, as laid out here, we
should consider accommodating our modularity concept to the empirical data about how specialized structures are
actually built, and even allow it to adjust from case to case, as it does elsewhere in biology.
3 Innateness, Development, and Invariance
If the mind can be decomposed into components, systems, or procedures, at least to some degree, what might the
developmental properties of these components be? How might they be constructed during the development of the
organism? And where does innateness come in?
It is frequently overlooked that formal definitions of the prerequisites for natural selection contain no reference to
innateness or even to genes; they are formulated entirely in terms of the heritability of phenotypes (Endler, 1992).
This requires that there be a means of inheritance or transmission of structure from one generation to the next. In this
sense, innateness, broadly construed as a synonym for inheritedness (that is, being reconstructed anew each generation via the developmental process), is clearly relevant to aspects of the phenotype with an evolved function. This process of
reconstructing evolved features of the phenotype anew, each generation, through an interplay between genes and
environment that has been shaped by selection, is what Tooby, Cosmides, and Barrett (2003) call "design reincarnation."
What is not mandated by the theory of natural selection is that phenotypes be constructed entirely by genes (whatever that means), or that they be constructed in the absence of environmental influence or even of learning or
other information-acquisition processes. In other words, the whole adaptation, as it manifests in the phenotype and
promotes fitness, need not be innate.
It is worth pausing for a moment here to consider what we mean by "innate." Recently, Samuels (2002) has offered a
useful formal analysis of the concept of innateness in cognitive science. He defends the view that when we speak of a
structure as innate, we should mean that it was not acquired via a psychological process. This raises the question of
how psychological processes should be defined. However, assuming that at least some environmental influences are
nonpsychological, Samuels's notion clearly does not rule out the possibility of environmental influences on the
development of innate structures. In other words, on his view, "innate" does not mean "not influenced by the environment during development" but rather "not influenced by psychological processes during development."
This may well turn out to be a useful formalization of innateness. Samuels shows that it satisfies a list of desiderata
regarding the status we would like an innateness concept in cognitive science to have. Even so, as Samuels (2002, pp. 251–2) points out, the question of whether a particular phenotypic structure is a specialized one generated by natural
selection, or is even generated by any evolutionary process at all, is not the same as the question of whether or not it
is innate. Indeed, psychological processes (however one might wish to construe them) could be a regular, normal part
of the construction of a structure X, and yet X could still be modular, and an adaptation.
To use a popular metaphor personifying natural selection, natural selection only "sees" phenotypic outcomes. This means that natural selection doesn't care how a structure is built (except to the extent that how it is built impacts fitness, an important caveat); it only cares what impact it has on the organism's fitness.


1. This anthropomorphic, intentional language can be cashed out in the formal language of evolutionary theory, which invokes only
nonintentional causation. It is used here only for ease of exposition. See Dawkins (1976).
Learning may be costly in many cases, but so may genetic specification (Boyd & Richerson, 1985). What matters is
whether the finished structure promotes survival and reproduction, and the costs and benefits of different ways of
building it. For example, many organisms have predator recognition systems that could properly be regarded both as
modular and as adaptations; yet learningunquestionably a psychological processseems to be a normal part of their
construction,


2. Some would argue that learning only influences information in the database to which the predator recognition module has access,
and does not influence construction of the module itself. While this is clearly possible, it strikes me as an empirical question that would
be very difficult to answer using current experimental methodologies. However the module/database distinction turns out (and it may not prove to be a real distinction), the entire structure may properly be regarded as an adaptation.
precisely because learning is a better solution to certain problems related to predator recognition than is genetic
specification of a fully fleshed-out predator template (Barrett, 2005a). In this case, at least some aspects of the
module in question are not innate, even on Samuels's (2002) account, because a psychological process is involved in
its development.
Samuels argues against invariance accounts of innateness (see Samuels, 2002, pp. 240–5). As applied to innateness,
these arguments are substantial. However, it is precisely an invariance account that I will offer here with respect to
modularity, at least for modularity as adaptation. The account I will offer is largely consistent with developmental
systems theory, which is concerned with the ways features of organisms are reconstructed anew in each new
generation of a species (Tooby et al., 2003, call this "design reincarnation," and Oyama et al., 2001, describe it as
"cycles of contingency"). It is widely recognized that this always involves an interplay between the genetic system and
features of the environment that can be considered part of the organism's inheritance (Griffiths & Gray, 2001; Jablonka
& Lamb, 1995; Tooby et al., 2003; West-Eberhard, 2003). In conceptualizing the development of modular adaptations,
several concepts are crucially important, including the concepts of "normal environment" and "reliable development," the
latter of which is rooted in considerations about invariance.


3. The notions of normal environment and reliable development are related to other concepts in the developmental literature, such as
canalization (see chapters 6 and 7 here). Because of the technical details surrounding such terms, I use the broader concept of reliable
development here, which encompasses any reliably recurring features of the phenotype.
4 A Computational View of Module Growth
If we are interested in how modules develop, and we are interested in modules as functionally specialized adaptations,
we might look to biology for insights. Natural selection shapes developmental processes so that fitness-promoting
phenotypes are
systematically produced. In the case of adaptations, specific functional properties and types of organization should be
reliable outcomes of development. The notion of design reincarnation nicely captures this idea (Tooby et al., 2003).
This does not imply that everything is set in the genes, nor even that one could predict outcomes just by looking at
genes, because genes coevolve with environments. One can't understand what genes do without knowing about the
environment in which they do it. One way of conceptualizing this is that the environment contains information the
genes can exploit if they are evolved to do so: both regularities (commonalities across environments) and local
contingencies (differences across environments). For example, the processes responsible for development of a predator
recognition system can expect that there will be predators in the local environment, and can even use certain invariant
cue structures to detect them, such as how predators move (Barrett, 2005a). However, the system can also assume
that each environment contains its own idiosyncratic set of predators, and expect to learn about them, using a
psychological process to construct functional parts of modules that may vary, as tokens, from individual to individual,
such as individual predator concepts or categories (Barrett, 2005a; Boyer, 2001; Cosmides & Tooby, 2000).
It might be useful to think about the generation, or growth, of modules in computational terms. Modules are grown:
limbs are grown, brains are grown, functional components of brains are grown. As many have stressed, growth is not
merely a process of instantiating a description of the phenotype contained in a genetic blueprint (Marcus, 2004).
Instead, it may be more fruitful to think of developmental processes themselves in computational terms: they are
designed to take inputs, which include the state of the organism and its internal and external environments as a
dynamically changing set of parameters, and generate outputs, which are the phenotype, the end-product of
development. One can think of this end-product, the phenotype, as the developmental target.


4. It is important to note, as well, that every stage in the process is also a phenotype, and a possible developmental target. The only
true end-state of development is a dead body.
Natural selection shapes the developmental process on the basis of the effects the process has on the developmental
target. In this way, natural selection can select for various types of developmental outcome without, in a strict sense,
determining the outcome in every detail, because only the developmental system that has that outcome as a target
is actually shaped.


5. The developmental process is shaped by many local, real-time events, including principled use of local information, and other
causal events as well. Therefore, phenotypic variation between individuals is to be expected from multiple causal sources, some of
which are by-products or noise. I focus on aspects of development that are targets of natural selection here.
Given this framing, our task in understanding the modular organization of the mind is to ask: What are the
computational properties of the developmental systems that give rise to modular outcomes, that is, architectural
modules? What inputs do these take, and what are the outputs (phenotypes) they produce?
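The framing above can be put in minimal computational terms. The following toy simulation (my own illustration, not anything from the chapter) shows how selection can act on developmental *processes* via their outcomes: each "genotype" is a rule mapping environmental input to a phenotype, and rules are retained or discarded according to the fitness of the phenotypes they produce. No outcome is ever specified directly; only processes are selected.

```python
import random

random.seed(0)

def selection_on_processes(rules, environments, fitness, generations=20):
    """Evolve a population of developmental rules (input -> phenotype).

    Selection sees only the phenotypic outcome of each rule in a sampled
    environment; the fitter half of the rules 'reproduces' each generation.
    """
    pop = list(rules)
    for _ in range(generations):
        scored = []
        for rule in pop:
            env = random.choice(environments)
            scored.append((fitness(rule(env)), rule))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [rule for _, rule in scored[: max(1, len(scored) // 2)]]
        pop = survivors * 2  # crude reproduction of the fitter half
    return pop

# Phenotype = how strongly the organism responds to animate motion in its
# environment; 'g' plays the role of a heritable developmental parameter.
environments = [0.8, 0.9, 1.0]            # local intensity of predation cues
rules = [lambda env, g=g: g * env for g in (0.0, 0.5, 1.0)]
fitness = lambda phenotype: phenotype      # responding more is better here

final = selection_on_processes(rules, environments, fitness)
# Only the rule whose *outcomes* were fittest survives; the outcome itself
# was never written down anywhere -- it emerges from rule plus environment.
print(sorted({round(rule(1.0), 2) for rule in final}))  # [1.0]
```

The point of the sketch is the division of labor: the surviving rule is the analogue of a reliably developing adaptation, while the phenotype it yields still depends on the environment it is run in.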
5 Conceptions of Modularity
Before turning to design considerations of developmental systems, it is important to note that there are at least three
different things we might mean when we talk about modularity. Each of these revolves around notions of discreteness,
separability, and specialization, but in slightly different ways.
The first is a conception of modularity that is present in the literature on evolutionary developmental biology, or evo-
devo, which I will call evolvable modularity (Raff, 1996; Raff & Raff, 2000; Schlosser & Wagner, 2004; Wagner, 1996;
West-Eberhard, 2003). In this literature, an aspect of the phenotype of an organism is said to be modular to the
degree that natural selection and other processes are able to act on that aspect of the phenotype independently of
other aspects of the phenotype, at least to some degree. In this sense, modularity is said to be a prerequisite for
evolution: an aspect of the phenotype is not a character that evolution can independently shape unless it is modular
(Raff & Raff, 2000; Wagner & Altenberg, 1996).


6. One way of capturing this is the notion of generative entrenchment, which refers to the degree and scope of influence of a particular
developmental event on later events (Wimsatt & Schank, 2004). Another is the idea of dissociability (Raff & Raff, 2000), or the degree
to which developmental processes can be independently operated on by natural selection. Entrenchment reduces the dissociability of
later developmental events. Modularity, in the sense of developmental pathways that can be independently shaped by selection, is
therefore a criterion for evolvability.
This does not necessarily mean that developmental processes are controlled entirely or even mostly by different
genes. There can be, for example, switch-like regulatory processes that can decouple them, despite near or complete
overlap in the genes involved in the developmental processes in question (West-Eberhard, 2003). Many, indeed most,
developmental processes will share large numbers of genes.
A second notion of modularity is what I will refer to as architectural modularity (Sperber, 2002). While this can apply
to any aspect of the phenotype, in the psychology literature it surfaces most often in the form of cognitive modularity
(Barrett, 2005b; Carruthers, 2005; Fodor, 1983, 2000; Samuels, 1998; Sperber, 1994). Whereas evolvable modularity
refers to developmental pathways that can be semi-independently controlled or shaped, architectural modularity refers
to the end-points of development. Aspects of the phenotype are architecturally modular to the extent that they are
phenotypically discrete, regardless of the developmental processes that gave rise to them.
Finally, there is what I will call developmental modularity (Sperber, 2002, also uses this term but collapses what I call
evolvable and developmental modularity). This refers to situations in which a single developmental process gives rise to
multiple architectural modules, though not necessarily identical ones, as the end-product of development. For example,
an individual hair and its supporting cell can be regarded as an architectural module, though a shared developmental
process gives rise to every hair on your head. It is possible, indeed common, to have many
and sometimes vast numbers of architectural modules without each one having a separate developmentally modular
process responsible for it.
In cognitive systems, there may be many analogies to this. Consider, for example, systems for object recognition,
specifically, the case of face recognition. Presumably, the information necessary to identify a particular individual by his
or her face is fairly complex, and needs to be chunked in some way, so that all of the relevant information is
associated with the identity of the person in question. The face recognition "file" or "template" might therefore exhibit
a degree of architectural modularity. For example, one might imagine losing or gaining the ability to recognize a
particular individual (say, a new acquaintance) without thereby losing or gaining the ability to recognize some other
individual (although it is also possible that these files might interact, e.g., learning a new face might increase my
confusion with some other, similar face; this does not imply that they are not modular, at least to some degree). One
might imagine a developmental system that builds a new architectural module, or template, for each new face I learn.
Presumably, the developmental process that gives rise to each new face file is the same for all of the faces: a single
developmental process gives rise to multiple architectural modules. Similar processes might underlie the acquisition of
behavioral skills that become modularized, or overlearned, such as driving, or chess (Karmiloff-Smith, 1992). A
relatively generalized module-generating process could give rise to specific token modules, where "generalized" means
capable of generating a broader class of tokens than any particular one that is the outcome of the process.
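The idea of a single, generalized module-generating process yielding many discrete token modules can be sketched as follows (the names and the feature representation are my own illustrative assumptions, not the chapter's): one shared routine builds a separate "face file" for each face encountered.

```python
# One learning routine; each call produces a new, discrete token module.
def make_face_file(name, features):
    """Shared developmental process: builds a per-individual face file."""
    return {
        "identity": name,
        "template": dict(features),      # per-individual information store
        # Token-level matcher: recognizes only this individual's features.
        "match": lambda probe, t=dict(features): all(
            probe.get(k) == v for k, v in t.items()),
    }

faces_seen = {
    "Ana": {"nose": "narrow", "eyes": "brown"},
    "Ben": {"nose": "broad", "eyes": "green"},
}

# The same process, iterated over experience, generates one architectural
# module per face: many modules, one developmental process.
modules = [make_face_file(n, f) for n, f in faces_seen.items()]

assert len(modules) == len(faces_seen)
assert modules[0]["match"]({"nose": "narrow", "eyes": "brown"})
assert not modules[1]["match"]({"nose": "narrow", "eyes": "brown"})
```

Losing one file would impair recognition of one individual without touching the others, which is the (partial) architectural discreteness the text describes.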
6 Module Types, Tokens, and Reliable Development
Of a particular developmental system we might ask: What phenotypic outcomes was this system designed by natural
selection to produce? In other words, what is the nature of the developmental target that the system is designed to
hit? On analogy to Sperber's (1994) notion of proper domains, we might think of this as the system's proper
developmental target.
Here it is useful to make a distinction between types and tokens when we think about developmental outcomes.
Natural selection shapes developmental systems so as to favor particular types of outcome. The actual token outcomes,
however, will always have a level of detail that is not in any sense specified either by selection or by anything that
can be found in the genes. This is an important point that is often overlooked when details of token outcomes are
used as evidence against evolutionary hypotheses.
As an example, consider the face-recognition system in humans (the details of face recognition in what
follows are for the purpose of explication only, and are not intended as actual empirical claims). By hypothesis, the
evolved function of the entire system is face recognition, and the function of the developmental processes involved is
to build a phenotypic system (an information-processing phenotype) that is capable of accurately recognizing the faces
of conspecifics and associating them with individual identities, that is, interfacing with the conceptual structure that
manages knowledge about persons in the local
environment. Abstractly, this outcome is the proper developmental target of the system. It is a type of developmental
outcome. When there is a consistent match between environment and the developmental system, such that organisms
hit their proper developmental target again and again over time and space throughout the population, we say that this
target is a reliably developing aspect of the phenotype. Reliable development is, in this sense, an alternative to the
notion of innateness; reliably developing aspects of the phenotype will appear innate.
The face-recognition system, so described, might well be considered a single architectural module. One might also
expect the processes involved in its construction to be developmentally modular, in that they have been acted on by
natural selection independently of selection on other systems. Within the overall system, however, there are likely to
be many modular components. The system exhibits nested, or hierarchical, modularity. For example, as already
described, the system might contain individual identity files, or templates, each of which contains the information
necessary to recognize a particular individual, and each of which is, at least to some degree, architecturally modular
(one might expect many shared elements, however; for example, there might be information about noses shared
across all such templates).

7

7. Again, many cognitive scientists would prefer to call the face-recognition system, as a whole, a module, and to refer to individual
face identity files as items in a database that are not themselves modules. I suppose this is a question that will depend both on data
and one's choice of modularity criteria, and I would prefer not to prejudge it. For example, it seems possible to me that the entities
involved in face recognition might be enzyme-like in the sense of Barrett (2005b), quasi-independent devices monitoring a common
information pool for a match. In any case, face-recognition templates serve as an example of more general points about module-
generating processes that have novel tokens, albeit with recurring features, as their end-points.
One can describe the developmental processes that build these identity files as having a proper function, and a proper
developmental target. The proper function of these processes is the construction of face identity files, templates that
can be used to identify particular individuals. Presumably, such files have all kinds of design features and format
parameters that they share that enable them to interact correctly with the rest of the system, and with outside
systems (e.g., the conceptual system, the lexical system, etc.). The developmental target of this system, therefore, is
a type of architectural module: an identity file, with a particular format. The system uses inputs in principled, quasi-
algorithmic ways to construct these files (e.g., information from the visual system when looking at the person's face,
semantic information about the person's identity, and so on).
However, the outcome of the process that builds individual identity files is always a token of the general type of target
that the developmental system was designed to produce. "General," here, does not mean that the proper target is
anything other than a highly specified, and even domain-specific, information structure. Rather, it implies that the
proper target is a class of information structures of a particular type. The individual tokens will always be the fully
realized identity files, each of which will have a level of detail far beyond what natural selection, or anything in the
system, actually specifies. For example, I have a face-recognition file for George W. Bush. This is a token of the
identity file type. All of the information
in the file that is specific to Bush (that disambiguates his face from other faces) is clearly not specified in any sense
innately, and is, indeed, entirely evolutionarily novel (what is not novel is everything that does not disambiguate his
face from other human faces). It is in this sense that the token is more detailed than the type, and indeed, every
instantiated token contains evolutionarily novel, learned information. Yet the system is functioning exactly as it was
designed to by natural selection, and the token is entirely within the range of proper developmental targets for the
system.
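The type/token point has a familiar programming analogue (my analogy, offered only as an illustration): a class fixes the *format* of an identity file, the slots and their interfaces, while each instance is filled in with environmentally supplied detail the class itself never contains.

```python
class IdentityFileType:
    """The 'proper developmental target': a format specification.

    Like the type in the text, it is never itself instantiated in the world
    as a concrete face file; only tokens built from it are.
    """
    slots = ("name", "distinguishing_features")

    @classmethod
    def build_token(cls, name, distinguishing_features):
        # The token carries detail (which person, which features) that the
        # type underspecifies; only the slot structure comes from the type.
        return {"name": name,
                "distinguishing_features": tuple(distinguishing_features)}

# A token with entirely novel, environment-derived content; the features
# listed here are placeholders, not empirical claims about face recognition.
bush_file = IdentityFileType.build_token(
    "George W. Bush", ["eyebrow shape", "jaw line"])

assert set(bush_file) == set(IdentityFileType.slots)
assert bush_file["name"] == "George W. Bush"
```

The class contributes nothing token-specific; run in a different "environment" (a different person), the same type yields a token with entirely different content but the same format.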
There are several things to note about this example. First, the type never exists as an actual object in the world, and
it is futile to search for it. The actual developmental outcomes are always tokens and, as such, have elements that are
environmentally contingent, vary from individual to individual, and so on (sometimes as a matter of design, as in the
face-recognition case, but also sometimes merely as an accident or by-product of how the system works, or as the by-
product of the operation of some other system). Second, the contrast between learning and modularity is obviously
useless here, because we have a developmental system designed to use learning in the construction of its modular
target, and this may be the norm rather than the exception in modular cognitive systems. Finally, this is a case where
a single developmental process gives rise to many architecturally discrete modules (one for each face you can
recognize). Any idea of one-to-one mapping falls apart.
7 Proper versus Actual Developmental Outcomes
Sperber (1994) distinguished between the proper domain of an information-processing device (the set of inputs the
device was designed by selection to process) and its actual domain, the set of inputs the device actually can, or does,
process. The proper domain is defined by a history of selection. The actual domain, on the other hand, is defined by an
interaction between the information-processing properties of the device (whatever criteria admit information into the
device) and properties of the world (whatever information in the world satisfies these criteria).
The face-recognition case described earlier shows that in many cases, actual tokens of evolved phenotypes will always
include evolutionarily novel elements (the structures involved in recognizing George Bush in you and me contain
elements that did not and could not have existed before George Bush existed). How do we reconcile this with the
notion of proper and actual domains? The distinction between types and tokens helps, at least in part, to solve this.
What is proper is the type. Individual tokens are always actual outcomes. Of course, some of these tokens may fall
within the proper class, and some not. For example, suppose it turned out that the system designed to construct face
identity files also constructs identity files for some other kind of object, such as automobiles (there is, in fact, evidence
that face recognition is quite tightly focused on cues specific to faces, and excludes automobiles and even classes of
object designed to trigger face-recognition-like processes, such as "greebles"; Duchaine, Dingle, et al., 2004; Gauthier
et al., 2004; Moscovitch et al., 1997; but one can imagine this as a possibility). Should a token file that is about a
Chevrolet be considered just as good a token as one that is about George Bush?
If face recognition is the proper target of the developmental system, and if the system builds a template for
recognizing automobiles, this outcome is outside the
proper set of targets (the proper type) for the system. Importantly, we might choose to say this because cars were not
part of the history of selection acting on the system, even though the constructed Chevrolet file could have the
appropriate formal properties of face-recognition files and, therefore, would be within the proper range of architectural
outcomes, in this sense. However, there may be some cases that are not as clear-cut, because we do not know
enough about what the history of selection actually was (even though there would, in principle, be a way to decide it).


8. This is not to say that uncertainties about some aspects of human ancestral environments invalidate the usefulness of the concept of
Environments of Evolutionary Adaptedness, or EEA, as some have claimed. Instead, the usefulness of the EEA concept depends on the
details of the case. For example, it is unambiguously true that there were no cars in human ancestral environments, but there were
human faces.
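The proper/actual contrast can be made concrete with a toy filter (illustrative only; the "criterion" here is a caricature): the device's input criterion is purely formal, so its *actual* domain includes whatever satisfies the criterion, a car front as much as a face, while its *proper* domain is fixed by the history of selection.

```python
def admits(stimulus):
    """Formal input criterion: two eye-like spots above a mouth-like bar."""
    return bool(stimulus.get("paired_spots") and stimulus.get("horizontal_bar"))

# 'is_face' stands proxy for membership in the selection-historical class;
# the device itself has no access to this property.
human_face = {"paired_spots": True, "horizontal_bar": True, "is_face": True}
car_front  = {"paired_spots": True, "horizontal_bar": True, "is_face": False}
teapot     = {"paired_spots": False, "horizontal_bar": False, "is_face": False}

# Actual domain: everything the input criterion admits.
actual_domain = [s for s in (human_face, car_front, teapot) if admits(s)]
# Proper domain: the subset selection history actually shaped the device for.
proper_domain = [s for s in actual_domain if s["is_face"]]

assert car_front in actual_domain       # actually processed
assert car_front not in proper_domain   # not what selection shaped it for
```

A Chevrolet-shaped token can thus be well-formed by the device's own standards while still falling outside the proper set of targets.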
8 Underspecification of Tokens by Types
In many or most cases, every actual, instantiated token of a module type may contain information that is unique and
evolutionarily novel, as the George Bush example shows. This would be true in all cases in which the finished phenotype
is not completely innate. In these cases, the type always underspecifies the token. In other words, in the absence of
the proper environment, the type does not contain enough information to build actual tokens, that is, to generate actual
developmental outcomes (though the phrase "contain information" is misleading, as the type is not a physical object
and so cannot contain anything). This is relevant to the debate over the proper function of the putative face-
recognition system (anatomically, the fusiform gyrus and related structures; Duchaine, Yovel, et al., 2004; Kanwisher,
2000; Gauthier et al., 1999). Some have argued that the system is not specialized to handle faces at all but rather is
specialized for something like "fine-grained, intracategorical distinctions between grossly similar visual representations
of middle-sized objects" (Boyer & Barrett, 2005, p. 98; this possibility is what Duchaine, Yovel, et al., 2004, call the
"individuation hypothesis," though there are other varieties of hypotheses based, e.g., on expertise). Often invoked
here is the fact that, unlike distinguishing between chairs and buildings, which are grossly different, distinguishing
between the faces of George H. W. Bush and George W. Bush requires processing of very subtle differences between
objects, which cannot be discriminated purely on the basis of individual local cues, and therefore require holistic or
configural processing. On this view, faces just happen to be objects in the environment that often satisfy the criteria
of configural processing, though other objects might sometimes satisfy them as well, especially in people who are forced or
trained to distinguish between members of these grossly similar, middle-sized object categories (but see Duchaine,
Dingle, et al., 2004, for counterevidence).
As noted, proper domains and outcomes are defined by a history of selection. Actual domains and outcomes, on the
other hand, are defined in terms of an interaction between the input criteria and operations of the system, and the
state of the current environment. On the face of it, it appears that the statement about
"fine-grained, intracategorical distinctions between grossly similar objects" is a statement about actual outcomes,
whereas the statement that the system evolved to discriminate between faces is a statement about proper domain.
The observation that proper types always underspecify tokens helps to reconcile these two statements.
The type is never instantiated as a physical entity. Empirically, the only way to access the type is by observing
regularities in token developmental outcomes. However, to the extent that it might be possible to describe the
regularities in the type, independent of the information present in tokens that comes from environments in which the
tokens were designed to be generated, that description might be something like "grossly similar visual representations
of middle-sized objects" (though, in reality, it is likely that the type for the face-recognition system is more detailed
than this). In this example, this would be the level of abstraction at which the type is specified, no more and no less.
If one were to place the developmental system in all possible environments and let it develop, one would see all
possible developmental outcomes (all possible tokens of the instantiated system) that satisfied this criterion, given the
objects available in the environment. In other words, in all environments where there were evolutionarily novel face-
like objects that were not faces (call them pseudofaces), processing of pseudofaces by the fusiform gyrus would be a
reliable developmental outcome. This could be true even if the actual set of objects that led to the evolution of the
system was composed only of faces (although, in fact, the pseudofaces would not even have to be evolutionarily novel
for them to be outside of the proper developmental target of the system; identifying them would only have to have
been selectively neutral, so that they did not contribute to the evolution of the system, even while being handled by it
as a by-product).
9 The Relationship between Environment Structure and Content
A theoretical perspective that is very useful for thinking about the relationship between environment structure and
actual developmental outcomes is ecological rationality (Gigerenzer et al., 1999). On this view, systems can evolve that
lead to apparently meaningful outcomes but that exploit only abstract statistical or structural regularities in the
environment in order to do so. A system is ecologically rational if it produces its proper outcome, as an actual
outcome, when matched to environments of the proper structure. The kinds of ecologically rational systems that have
been studied by Gigerenzer and colleagues (for the most part, heuristic decision rules) are extremely sparse: on the
surface, the inputs and procedures of these systems appear content-free, and yet they generate behavior that appears
meaningful.
There is no paradox here. Instead, this may be the key to understanding how contentful modular structures are
generated, as actual token outcomes, by procedures that specify only a rather abstract type. Interaction between the
actual details of the environment and the generative system are what produces content. Such systems can be designed
by natural selection to exploit regularities in environment structure, even when these structural regularities appear to
be formal rather than
contentful properties of the environment. An example might be aspects of biological motion that can be specified in
terms of abstract cue parameters, but that nevertheless reliably discriminate the motion of living things from other
kinds of motion (Barrett et al., 2005; Johansson, 1973). This points to how highly domain-specific structures can
develop, as a matter of design, using procedures that have structure that is only abstractly or heuristically related to
the proper domain.


9. Although this is recognized by critics of modularity and domain specificity such as Elman et al. (1996), they fail to recognize that the
reason that content reliably develops from interaction with the environment is that natural selection engineers this match. The
contentful phenotypic end-points of development, therefore, can properly be regarded as adaptations.
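An ecologically rational rule in this sense can be sketched with the biological-motion example (thresholds and cue names here are my own illustrative assumptions): the procedure mentions no animals at all, only an abstract structural cue, yet in environments where that regularity holds it reliably picks out living things.

```python
def self_propelled(velocities, threshold=0.5):
    """Content-free structural cue: did speed change abruptly, as if the
    thing redirected itself, rather than varying smoothly under physics?"""
    changes = [abs(v2 - v1) for v1, v2 in zip(velocities, velocities[1:])]
    return any(c > threshold for c in changes)

# Trajectories as speed series over time. Inanimate motion (ballistic,
# gravity-driven) changes smoothly; animate motion starts, stops, redirects.
falling_rock = [0.0, 0.1, 0.2, 0.3, 0.4]
stalking_cat = [0.0, 0.0, 1.2, 0.1, 1.5]

assert not self_propelled(falling_rock)
assert self_propelled(stalking_cat)
```

Nothing in the rule is "about" animals; the contentful outcome (animate vs. inanimate) arises from the match between the formal cue and the structure of environments in which animals really do move this way.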
To say that a developmental system is domain specific is to say that the procedures evolved to reliably produce a
specific type of outcome are matched to relevant aspects of environment structure. There is a coupling, shaped by
selection, between the operations of the developmental system and the structure of its proper environment, what is
sometimes called its Environment of Evolutionary Adaptedness (EEA) (Tooby & Cosmides, 1992). It is not to say that
all of the content relevant to the domain is built in or prewired.


10. Buller (2005), for example, misunderstands this point when he claims that a putative face template in children, such as that
proposed by Morton and Johnson (1991), is domain general because it does not have "full-blown innate knowledge of faces" (Buller,
2005, p. 154). The criterion of full-blown innate knowledge would probably preclude the possibility of anything in the mind being
domain specific. What matters is that faces are the proper targets of the face template, even if the criteria for identifying them are
heuristic.
Consider the ability to recognize and discriminate between various animal taxa in the local environment. At least in
part, this skill was probably shaped by selection due to the benefits of recognizing dangerous animals and recognizing
prey (Barrett, 2005a). Here the developmental target might consist of several kinds of capacity. These might include
the capacity to discriminate between animates and inanimates, the capacity to determine whether or not an animal is
dangerous, and perhaps, the capacity to recognize and distinguish between types of animal, for example, different
classes of dangerous animal, such as snakes, spiders, carnivorous mammals, and so on. Such discrimination abilities
are known to be present in a variety of animal species that face predation. For example, vervet monkeys are able not
only to detect dangerous animals but also to discriminate between at least three categories of dangerous animals:
snakes, raptors, and terrestrial mammalian predators (Cheney & Seyfarth, 1990). This is a skill that reliably develops
by adulthood in normally developing vervets, but requires learning and calibration during development, as is evidenced
by the decrease in false alarms to each category of predator with age (Cheney & Seyfarth, 1990; see also Mineka et
al., 1984, on learning of snake fear in rhesus macaques, as an example of an evolved behavioral skill tuned by
experience, and social experience in particular).
In the predator recognition systems of humans and other animals, there may be hierarchical modularity in the
architectural outcome of the developmental process. For example, the perceptual processes used to detect the
presence of animate living things are likely to exhibit a degree of modularity: dedicated architecture,
proprietary inputs or triggering conditions that are distinct from those for other kinds of object recognition, specific
animate motion detectors, and so on (Barrett, 2005a). Even the recognition of individual taxa may involve specific
brain regions, and may to some degree be dissociable from capacities to recognize other categories of object, such as
artifacts (for pro and con views of this proposal, see Forde & Humphreys, 2002). Beyond perception, it is not
implausible to suggest that vervets, like humans, have distinct concepts of the animal taxa they can recognize. At a
conceptual level, vervets understand that snakes are a different kind of animal than raptors (here, the difference
between the possession of a concept and a behavioral skill is difficult to tease apart). And there are likely to be
modular aspects to the connections between the knowledge systems triggered by encounters with a particular class of
predator, and decision-making and motor systems that instantiate different escape strategies for each kind of predator.
In this example, it is clear that there will be many aspects of content, as well as of the nature of the architecturally
modular end-points of development, that are not innate. In humans, one might imagine that all Inuit children
growing up in the Arctic reliably acquire the concept POLAR BEAR, and are able to reliably identify a polar bear when
encountered. Shuar children growing up in the Amazon basin, on the other hand, might all acquire the concept
JAGUAR, and be able to recognize one when they see it, but few Shuar children acquire the concept of POLAR BEAR.
Clearly, the concepts POLAR BEAR and JAGUAR are neither innate nor universal in humans, though both are token
developmental outcomes that are well within the proper type parameters of a predator recognition system. In Inuit and
Shuar adults, these are concepts that are as fully formed and contentful as any concept can be, and might be acquired
via a highly domain-specific system that is an evolved adaptation, yet much of their content is shaped by interaction
with the environment and even by psychological processes in Samuels's (2002) sense. Note also that, if one considers
the architecture dedicated to recognition and knowledge of each taxon as discrete or modular, at least to some degree,
different individuals could have different numbers and types of module as the outcome of development.
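The Inuit/Shuar example amounts to one acquisition procedure run in two environments (a toy model under my own assumptions, not an empirical claim): the system is domain specific in that it only builds concepts for dangerous animals, but POLAR BEAR and JAGUAR themselves are supplied by the local environment, not by the system.

```python
def acquire_predator_concepts(local_animals):
    """Domain-specific but content-open: one concept per local predator."""
    return {a["name"].upper() for a in local_animals if a["dangerous"]}

arctic = [{"name": "polar bear", "dangerous": True},
          {"name": "ptarmigan", "dangerous": False}]
amazon = [{"name": "jaguar", "dangerous": True},
          {"name": "agouti", "dangerous": False}]

inuit_concepts = acquire_predator_concepts(arctic)
shuar_concepts = acquire_predator_concepts(amazon)

# Same reliably developing procedure; different token concepts, and even
# different numbers of acquired modules, depending on the environment.
assert inuit_concepts == {"POLAR BEAR"}
assert shuar_concepts == {"JAGUAR"}
```

What is shared across all normally developing individuals is the procedure, not the concepts; the token concepts are neither innate nor universal, yet each falls squarely within the system's proper type parameters.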
A key question in this case, and in the study of developmental systems in general, is: What kind of coupling or
structural matching between developmental system and environment would be necessary in order for such highly
organized and contentful outcomes to reliably develop? There has been much research on how living kind concepts
emerge during development, and there is not space to describe it all here (see Barrett, 2005a; Inagaki & Hatano,
2002; Rakison & Oakes, 2003). I have hinted earlier at some aspects of environment structure that allow living kinds
to be discriminated, and that afford reliable inference. For example, certain aspects of biological motion differ in a
reliable way from other kinds of motion, such as motion caused by gravity, collisions, wind, and so on (Barrett et al.,
2005). One can imagine a developmental system that is tuned to exploit these regularities without having preexisting,
explicit knowledge about animals, but that nevertheless develops knowledge about animals as a reliable outcome.
The list of aspects of environment structure exploited by such a system would undoubtedly be fairly long, and would
include things such as the contingent reactivity of animate agents, the temporal and spatial structure of contingent
interaction, morphological features of
agents, facial cues, and more (Boyer & Barrett, 2005). Moreover, it is not enough to simply point out these features of
environment structure. To have a complete description of the system, its computational parameters (how it processes these inputs to generate developmental outcomes) must be specified. This requires substantial empirical work, though
adaptationist reasoning about design can help to constrain hypotheses.
10 The Appearance of Innateness and the Grain Problem
In the example given earlier, Inuit children acquire POLAR BEAR and Shuar children acquire JAGUAR because each kind
of animal (or information about it) is present in the respective developmental environment, and provides the proper
developmental inputs to the relevant system. The statement that POLAR BEAR and JAGUAR are not innate concepts
should be uncontroversial, even in the folk sense (they are also not innate on Samuels's [2002] account, because they
are acquired via a psychological process). Imagine, however, a developmental system that interacts with some aspect
of the environment that does happen to be universal, and so produces a token developmental outcome that is
universal. Examples might be the concept of ANT, and the concept of BOWL. Ants are nearly ubiquitous, and so it is
not implausible that every normally developing adult in the world has this concept. The same goes for bowls, or at least for whatever the locally available concave receptacle for food and beverage is. Though bowls were not always present
everywhere, they might now be (and the concept could be reliably acquired, everywhere, via the proper operation of
an artifact recognition system; but it is not innate).
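The type/token relation in these examples can be caricatured in a few lines of code. The sketch below is entirely my own illustration (the function name and the toy "environments" are invented, not drawn from the text): a single, fixed acquisition system stands in for the type, and the concepts it yields from different local inputs stand in for the tokens.

```python
# Toy illustration of one developmental "type" yielding different
# "tokens" from different local environments. All names are invented;
# this is not a model anyone has actually proposed.

def predator_concepts(local_fauna):
    """The fixed acquisition system (the type): given the animals the
    local environment supplies, return the predator concepts acquired
    (the tokens)."""
    return {animal.upper() for animal, role in local_fauna.items()
            if role == "predator"}

inuit_env = {"polar bear": "predator", "seal": "prey"}
shuar_env = {"jaguar": "predator", "peccary": "prey"}

print(predator_concepts(inuit_env))  # {'POLAR BEAR'}
print(predator_concepts(shuar_env))  # {'JAGUAR'}
```

The same system, run on different inputs, reliably yields different, fully contentful outcomes; neither output concept is "innate," even though the system that produces them is fixed.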
Using the criterion of universality, one might well determine that many aspects of the phenotype of a species are
innate. Reliable development produces outcomes that have the appearance of innateness. Yet it would be a mistake to
conclude from universality alone that they are hard-wired. An example might be the concept GRAVITY, or, rather
than the concept itself, the various inference systems used to generate intuitions about how objects will behave in the
presence of gravity. These are things that many people, currently, think of as hard-wired. Even infants are surprised
at apparently gravity-defying arrangements of matter (Baillargeon, 2002), and the presence of this intuition in infancy
satisfies many that innateness is involved. But would these systems look the same in someone who had spent his or
her entire life in the absence of gravity, and if not, would this falsify the hypothesis that an evolved system was
involved?


11. Other surprising examples of the importance of normal environments for development might await us. For example, studies of
individuals with infantile cataracts provide strong support for the importance of face input to the right hemisphere in the first months of
life. If the lenses in these children are repaired within their first six months, they have normal vision after that. Despite this, they never
develop normal face perception (Le Grand et al., 2001; Le Grand et al., 2003).
Such examples point to why innateness in the folk sense of the term is a dangerous concept to allow into the
discussion of evolution and development. Once innateness in this sense is admitted as a criterion for adaptation,
demonstrations of
noninnateness appear to count as demonstrations against evolutionary hypotheses. Suppose, for example, a future
generation of children raised in space shows significant differences from earth-born children in the design of their
intuitive physics system. One might imagine, for example, that their intuitions about inertia, collisions, and so on
could be significantly different, and measurably so. Some would say that these data argue against the hypothesis that
the intuitive physics system is evolved; they would say, instead, that the evidence demonstrates that children simply
learn how objects behave through experience with the environment. On the basis of the framework developed here,
this is not an alternative hypothesis, or at least not a well-formed one. The question is not "Is learning (or interaction with the environment) important in the development of the observed outcome?" but rather "Is there a developmental system that produces this outcome as a specific token of a more general type, and if so, what is the nature of that type?" Presumably, a zero-gravity intuitive physics is outside the range of proper developmental targets of the intuitive
physics system (based on the certainty that humans did not find themselves in zero-gravity environments in the past),
but may well be an actual developmental outcome of the evolved system, when placed in zero gravity.


12. Here I have assumed that people would argue that learning, a patently psychological process, would be responsible for differences
between children raised in zero- and one-g environments, but there are other possibilities. Whether or not the differences would be due
to innate factors in Samuels's (2002) sense, then, would depend on empirical details and the criteria for counting a process as
psychological. These are questions that are largely orthogonal to the present discussion.
Such considerations create problems for conventional interpretations of variance and invariance in outcomes across
environments. Conceptual tools that have conventionally been used to test evolutionary hypotheses, such as innateness
and universality, need to be rethought. Universality of outcomes is an important diagnostic, but this universality needs
to be construed in terms of developmental systems interacting with actual environments that will produce diverse token
outcomes. It is higher level and sometimes abstract invariants that must be looked for in the search for type
outcomes. Another widely invoked criterion that needs to be rethought is the ability of systems to be flexible or to deal with novelty, which is typically taken as evidence against domain-specific adaptation. This is problematic. For example, the presence of the concept TYRANNOSAURUS REX in the minds of many children is clearly an evolutionarily novel outcome (there was no selection to acquire this concept, nor any selection by Tyrannosaurus rex on humans at all), and yet it might be considered a token outcome that is well within the proper type of a predator-recognition
system.
Another problem is what has sometimes been referred to as the grain problem (Atkinson & Wheeler, 2004; Sterelny
& Griffiths, 1999). Sterelny and Griffiths (1999) argue that there are various ways of describing adaptive problems,
which vary in how coarse- or fine-grained they are, and there is no reason to privilege one over the other (e.g., face
recognition vs. discriminating between complex middle-size objects that require holistic processing). While I disagree
that it is impossible to prefer one level of grain over another in describing a given adaptive problem, I do
think that there are aspects of the grain problem that are important. One aspect of the problem can be framed in
terms of types and tokens: it is a mistake to confuse tokens for types. If we discovered that all humans have the
concept ANT or the concept BOWL, it would be a mistake to assume from this that there has been selection specifically
to acquire the concept ANT or BOWL, as opposed to selection for acquisition of a more general class of concepts of
which these are tokens. It is more likely that there was selection to reliably develop concepts of animal taxa in the
local environment, or artifact categories in the local environment (or, some would claim, selection only for the
acquisition of object concepts in general). The same goes for arguments about the presence of chess and driving skills
in humans. These are not, as has been claimed (e.g., Sterelny & Griffiths, 1999), necessarily problems for
adaptationist accounts of cognitive architecture, if one allows for the type-versus-token distinction. The potential
mistake is one of grain: one is concluding that observed tokens represent a type. However, this is not an
insurmountable problem (Atkinson & Wheeler, 2004). Adaptationist reasoning helps to constrain hypotheses, because
one of the criteria for the evolution of a novel, dedicated system is that there are problem parameters that are unique.
This helps to reduce the a priori likelihood of, for example, the bowl-specific adaptation hypothesis being correct.
Moreover, careful attention to the type/token distinction will help to resolve cases like these.
11 Conclusion
Although our intuitions may tell us that modularity and development are incompatible, because modules are hard-wired
devices that are simply unpacked from a box and plugged in, a biologically sophisticated understanding of how
phenotypes are constructed suggests that they are not incompatible at all. Our task is not one of choosing between
modular and developmental accounts. Evolutionary developmental biologists have shown that there are good reasons to
suspect that most aspects of organisms are modular, and that these modular phenotypes are constructed anew each
generation through a complicated but orchestrated interplay between genes and both internal and external
environments.
The modularity debate, as it currently stands, is frustrating because both sides should agree that innate structures, as
our intuitive ontology construes them, do not exist. Gene-environment interactions always occur, by design, in the
construction of phenotypes, and for good reason. The environment is an important source of information that one
would expect evolved developmental systems to exploit, not ignore. In the debate, there are baby/bathwater problems
of several kinds. First, while the argument against blueprint nativism (in the sense of iconic representations of the
phenotype in the genome) is correct, it is a misconstrual to direct this criticism against evolutionary psychology, which
is not based on a commitment to genetic blueprints of this kind. A second error lies in the unwarranted leap from the
dismissal of blueprint nativism to learning, flexibility, and socialization as alternatives to evolutionary hypotheses. Such
a leap is simply a non sequitur. It misses the fact that reliably developing functional design always needs to be
explained as the outcome of selection, at some level: perhaps not of the specific token of
functional design that one is currently observing, but at least of some type of which that token is an instantiation. The
nature of the type is open to investigation (and, as argued here, tokens are almost always more detailed than types), but tokens of functional design do not simply appear at random. An evolved system generates them.
What we should attend to, according to this analysis, is discovering the type of outcome that selection has favored in
any given case. Usually, this will be an empirical matter that cannot be resolved purely with a priori adaptationist
reasoning, or with appeals to parsimony, which are often simply based on intuition or preferred theoretical framework.
However, the question of what type of outcome selection has favored cannot be ignored; it always, in principle, has an
answer. Equating domain generality with parsimony, which is the currently popular strategy for dodging this question,
will not make it go away.
14 Cognitive Load and Human Decision, or, Three Ways of Rolling the Rock Uphill
Kim Sterelny
In this chapter I argue that much human decision-making has a high cognitive load; that is, agents make satisficing
decisions only by accessing and effectively using information that is hard to get, hard to interpret, or both. When the
type of information needed for good decision-making is predictable over evolutionarily significant time frames, there is
likely to be a modular explanation of its intelligent use. When the environment is stable in the right way, natural
selection can pre-equip agents to register the relevant information and to use it efficiently. But human environments
are heterogeneous in space and time, and as a consequence there are many high-cognitive-load problems that we face
whose informational requirements are not stable over evolutionary time. I argue that our capacity to respond
successfully to these novel problems depends on two other evolved strategies. One is informational niche construction.
Humans physically engineer their own environments and those of the next generation: tools, shelters, fire, clothes, and
weapons have transformed the selective forces that act on our lineage. But humans also engineer their informational
world: marking a trail transforms a difficult navigational problem into an easy perceptual problem. Most especially, we
engineer the informational environment of the next generation. Teaching organizes the informational world of the
young, and plays a crucial role in allowing information to be assembled and transmitted accurately. Informational
engineering is an ancient feature of human lifeways, and I argue that human minds are adapted to this social
transmission of information. But, as with evolved modules, the success of intergenerational transmission of information
is linked to the rate at which environments change. If the world changes very rapidly, the information of the previous
generation may well be past its shelf life. A third strategy for dealing with high-load
problems is less sensitive to the pace of change. Humans use epistemic technology to expand their cognitive powers.
Most obviously, we store information in the environment. This, too, is an ancient feature of human lifeways. For
example, many craft traditions produce artifacts that can act as exemplars or templates. A fish-spear is a rich source
of information about how to make fish-spears. In summary, then, a successful human life depends on good decision-
making, and an agent can only make the right decision if he or she notices the crucial features of his or her current situation and evaluates that information appropriately. Sometimes that competence is based on a special-purpose
evolved module. But our minds are adapted not just to relatively invariant features of human environments but also to
changeable ones. Adaptive action in the face of novel challenges depends on some combination of informational niche
construction and epistemic technology.
1 Foragers' Dilemmas and Bargaining Games
Human life is one long decision tree. Fortunately, some of these decisions are not especially challenging. Identifying
local mores about dress is often very important, for individual fitness often depends on conformity to local norms. Once
others are in business suits, it is harder to be treated seriously while dressed in a T-shirt and jeans. But the task does
not seem intrinsically difficult. It is reasonable to suppose that most dress codes could be learned by inductive
generalization from primary social experience (plus or minus a bit). Appearances might mislead, for we lack well-
developed theories of the power of learning. But with respect to most clothing norms, there is no plausible version of a
poverty of the stimulus argument.


1. There is the usual implausible version from the lack of explicit negative instruction: when growing up in Australia, I was never
explicitly told not to go to school with a dead turkey stuffed over my head. Even so, as Fiona Cowie notes, this would be a fragile basis
for positing an innate schema specifying the class of possible clothing norms (Cowie, 1998).
Some important decision problems have a low cognitive load: there is no particular problem in explaining how
intelligent agents could acquire and/or use the relevant information.
However, discovering the local dress code is not typical of human decision-making. Human action often depends on
information that is hard to acquire, hard to use, or both. This view has become somewhat controversial with the
articulation of a program for explaining human decision-making on the basis of fast and frugal heuristics. The
defenders of this program think that we can normally make good, though not perfect, decisions by following simple
rules and exploiting small amounts of easily available information. Thus, instead of weighting all the factors necessary
for making an optimal decision in choosing a car, we normally get a good result by using a "take the best" heuristic, allowing one criterion to dominate our choice (Gigerenzer et al., 1999; Gigerenzer & Selten, 2001). But this approach
is not plausible as a general picture of human cognition. For stock examples abstract away from a key feature of
human life, namely epistemic pollution.
Other decision-makers degrade our epistemic environment by active and passive deception, and such tactics can be
countered only by sensitivity to a wider range of information.
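For readers unfamiliar with the heuristics program, the "take the best" rule can be sketched in a few lines. This is only a schematic rendering of Gigerenzer's idea; the cue names and values are invented for illustration. Cues are checked in order of validity, and the first cue that discriminates between the options decides by itself.

```python
# Schematic "take the best": check cues in order of validity; the
# first cue on which the options differ settles the choice alone.
# Cue names and numbers are invented for illustration.

def take_the_best(option_a, option_b, cues_by_validity):
    for cue in cues_by_validity:
        if option_a[cue] != option_b[cue]:
            return "A" if option_a[cue] > option_b[cue] else "B"
    return "A"  # no cue discriminates: fall back to a default guess

car_a = {"fuel_economy": 7, "price_score": 5, "safety": 9}
car_b = {"fuel_economy": 7, "price_score": 8, "safety": 4}

# Fuel economy ties, so the next cue (price) decides on its own;
# safety is never consulted, even though A wins on it decisively.
print(take_the_best(car_a, car_b,
                    ["fuel_economy", "price_score", "safety"]))  # B
```

The single-deciding-cue structure is also what makes such heuristics vulnerable to the epistemic pollution just mentioned: a deceiver who can manipulate the one cue that happens to discriminate controls the outcome.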
There is something right about this program, for heuristic decision-making is doubtless central to human life. We often
act under time pressure and with incomplete information. So we need decision-making strategies that will satisfice
under such circumstances, but those heuristics will often be quite informationally demanding. Consider, for example,
the problem of gathering resources in a forager's world. This problem is crucial to fitness. Foragers do not accumulate
a surplus and often live close to the edge: they must typically make good decisions. Yet consider the intellectual
challenge faced by a forager on a hunting expedition who sees an armadillo disappearing down its burrow. Should he
try to dig it out, or try his luck further down the path? The optimal choice depends on subtle ecological, informational,
and risk-assessment issues. The forager must consider the probability of catching the animal. Is the burrow likely to
end under a large rock or other immovable obstacle? He must estimate the costs of catching the animal, including the
risks, for some menu items are decidedly dangerous. Costs include opportunity costs. If it will take the rest of the day
to dig the armadillo out, the forager has forgone the potential reward of a day's hunting. Finally, of course, he must
factor in the benefits of catching the animal. As it turns out, armadillos vary in their value across the seasons. They
are much fatter in certain seasons than others (Shennan, 2002, p. 147). Moreover, there are social complications in
the assessment of return, for in many cultures, large catches are shared but small catches are individual property. So
forager decision-making has a high information load. The right armadillo choice requires detailed knowledge of local
natural history and local geography. It requires a clear-sighted assessment by the agent of his or her own technical
skills and social location. To understand forager decision-making, we need to understand how this information is
acquired and used.
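The structure of the armadillo choice, though not its real informational difficulty, can be put in expected-value terms. Every number below is invented purely for illustration:

```python
# Crude expected-value framing of the dig-or-move-on choice. The
# numbers are invented; the point is only the structure: probability
# of success, value of the prey, direct costs, and the forgone
# return of hunting elsewhere all enter the decision.

def expected_net_return(p_catch, prey_value, dig_cost, opportunity_cost):
    return p_catch * prey_value - dig_cost - opportunity_cost

# Fat season: the armadillo is worth digging for.
print(expected_net_return(p_catch=0.75, prey_value=40,
                          dig_cost=10, opportunity_cost=15))  # 5.0

# Lean season: same burrow, same effort, but the thinner animal is
# worth less, so the forager should move on.
print(expected_net_return(p_catch=0.75, prey_value=24,
                          dig_cost=10, opportunity_cost=15))  # -7.0
```

Even this caricature makes the point: each parameter must be estimated from detailed local knowledge of geography, natural history, season, and the agent's own skills, and that is precisely what makes the decision a high-load one.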
Social decision-making, too, has a high information load. Trade is an ancient feature of human life (Ofek, 2001). Hence
so is bargaining. Yet it has both a high information load and a low tolerance of error. If you try to drive too hard a
bargain, you will end up with no deal at all. If you are too soft, you will never make a good deal. Yet deals are not
easy to evaluate. You need to evaluate your personal circumstances, and to integrate that evaluation with information
about the local availability of goods. What do you want, and what are you willing to give up? Will you trade a lower
price against slower delivery, or a reduction in insurance cover? If you regularly trade, you will also need to factor in
future effects. These include effects on your reputation and on future negotiations with this agent. Finally, and
importantly, the micromanagement of negotiation is important. It is important how you phrase and present your offer.


2. I am indebted here to my student Christo Fogelberg, who alerted me to the value of this example as a whole, and especially to the
importance of expertly managing the initial offer and counteroffer.
Consider this dialogue (assuming the cart is worth roughly $75 to both A and B):
A: I would like to buy your cart. I'll give you fifty for it.
B: No way, a hundred is my absolute minimum!
A: All right, then, why don't we split the difference and settle at seventy-five?
A has blundered, and will probably now either have to settle for more than $75 or break off negotiations. That is true
despite the fact that his offer is realistic. But having made it with his first counteroffer, A will now find it difficult to
maintain that position. It is now probable that negotiations will either finish around the $85 to $90 mark or break down
when A refuses to move.
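One crude way to see the blunder is a toy model on which negotiations tend to settle near the midpoint of the two sides' standing offers. This is a deliberate simplification of my own, not a result from bargaining theory:

```python
# Toy "split the standing offers" model of why A's premature
# counteroffer backfires. A simplification for illustration only.

def expected_settlement(offer, counteroffer):
    return (offer + counteroffer) / 2

# Had A held at 50 against B's 100, the midpoint is A's target price:
print(expected_settlement(50, 100))  # 75.0

# Having jumped straight to 75, A now splits 75 against B's 100:
print(expected_settlement(75, 100))  # 87.5
```

On this caricature, A's "realistic" counteroffer has moved the likely settlement from $75 to about $87.50, which is just the outcome described above.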
These examples are typical rather than exceptional. Human decision-making often has a high information load, for we
depend on knowledge-intensive methods of extracting resources from our worlds. Our ecological style contrasts with
that of the closest living relatives of our species, the chimpanzees. For while they engage in some knowledge-intensive
foraging, most of their diet is based on fruit and other ready-to-use resources. In contrast, even the simplest foraging
lifeways depend on technology and on detailed local knowledge (Henrich & McElreath, 2003; Hill & Kaplan, 1999;
Kaplan et al., 2000). Moreover, human social worlds are complex, demanding, and only partly cooperative. They are
complexly structured: divided by gender, status, occupation, generation. They are operationally complex: much human
action requires coordination with others. And they are complex in their resource demands: successful human life
requires access to a large range of goods, not just a few. For this reason, human culture adds to the problem of
explaining adaptive human action. Human cultures generate a large measure of the informational load on human
decision.
2 Three Evolutionary Responses to High Cognitive Loads
High-load problems are typical of human life. They are also ancient. The distinctive features of human cultural life
originate hundreds of thousands of years ago; some may be much older (Wrangham et al., 1999). These features
include diverse and regionally differentiating technologies; trade; ecological expansion; and even public representation
(McBrearty & Brooks, 2000). There has been time for evolutionary responses to these informational burdens:
responses that vary according to the stability of the informational demands on adaptive action. Some human problems
are informationally demanding, but the information need for good decisions is stable, constant over evolutionarily
significant time-frames. In other cases, the information needed for adaptive choice is stable over generations but not
hundreds of generations. In yet others, the relevant features of the environment change still faster.
There is a standard conception of the interplay between learning and the rate of evolutionary change. Slow
environmental change (or no change) selects for innately encoding the information agents need. For information-
hungry skills are then protected against the vagaries of individual learning environments. If the environment changes
over generational time-frames, there is selection for social learning. Agents that learn from others that bears are
dangerous and that salmon are nutritious avoid the costs of trial-and-error learning, and those costs can be very
high. If environments change within the life of a single generation, then there is selection for individual learning, for
the beliefs of others are likely to be out of date (Boyd & Richerson, 1996; Laland, 2001; Richerson et al., 2001). I
think this picture of the evolution of learning in social animals is broadly right and applicable to our descent, for all
three time scales are important in human life. However, the extent of informational demands on human action
introduces novel elements to our evolution.
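The standard picture just summarized can be caricatured as a simple lookup keyed to the rate of environmental change. The numerical thresholds below are invented; the real claims are comparative, not quantitative:

```python
# Caricature of the standard picture: the viable information source
# depends on how long the environment stays the same. Thresholds
# are invented; only the ordering matters.

def favored_strategy(generations_per_change):
    if generations_per_change >= 1000:  # effectively unchanging
        return "innate encoding"
    if generations_per_change >= 1:     # stable across a generation
        return "social learning"
    return "individual learning"        # changes within one lifetime

print(favored_strategy(10_000))  # innate encoding
print(favored_strategy(5))       # social learning
print(favored_strategy(0.2))     # individual learning
```

As the chapter goes on to argue, human learning rarely falls cleanly into one branch of this lookup: it is typically hybrid, socially scaffolded trial-and-error learning.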
Evolutionary psychology has emphasized the first of these responses, in defending modular conceptions of human
cognitive organization. Modularity, I shall argue, goes with predictability and environmental stability. Hence modules (innate, domain-specific cognitive specializations) play a real but limited role in human response to high cognitive loads.
Social learning, likewise, is important, but not for the reason standardly given, namely, to avoid the costs of individual
learning (Boyd & Richerson, 1996). For human learning is very often hybrid learning: it is socially structured,
environmentally scaffolded, trial-and-error learning. No one learns foraging skills just by watching and listening to the
experts, and precious few learn them without these social inputs. In acquiring, for example, the skills involved in using
tools, imitation, instruction, and correction are combined with practice and exploration. This is no accident, for hybrid
learning, I shall argue, is more powerful, more faithful, and more reliable than either pure social learning or
unscaffolded trial-and-error learning (see also Sterelny, 2006). Finally, human individual learning is distinctive not just
in often relying on social scaffolding; it is also dependent on epistemic technology. Humans make tools for learning and
thinking, and these tools vastly extend our cognitive powers. The role of epistemic technology in human thought is the
central theme of the recent work of Dan Dennett and Andy Clark. They are onto something very important. But in
contrast to Clark (in particular) I shall argue that the use of epistemic technology is itself a high-load problem.
Epistemic technology makes us smarter than we would otherwise be. But we had to become much smarter to use this
technology. So my picture of human response to cognitive load borrows from evolutionary psychology, narrowly
defined; from the theory of cultural evolution developed by Richerson, Boyd, and their coworkers; and from the extended-mind conceptions of Dennett and Clark. But it is importantly different from all of those views.
In the rest of this section, I will briefly sketch the three responses: the three strategies for responding to high load
problems. In section 3, I discuss the modular strategy in a little more detail, and in section 4, social learning. I spend
most time on epistemic technology, in section 5. For in my previous work I have underplayed the significance of this
response to high-cognitive-load problems in fast-changing environments.
Human response to high-load problems does sometimes depend on an innately structured module. Language is
genuinely typical of one class of problems humans face. Linguistic competence is critical for fitness. The acquisition
(and perhaps the use) of language is intrinsically difficult. But the organizational features of language may well be
stable. Though language is a complex and subtle system of representation and communication, the information a
language learner needs to master is restricted in kind and is stable. An innate module is a candidate solution to
problems
of this class, perhaps evolving via some Baldwin-like process. Some early protolanguage was invented, and it spread
through general learning capacities of some kind. But its invention changed the selective landscape as these
communicative abilities became increasingly central to fitness. Thus the acquisition process became increasingly
buffered from vagaries in environmental input as the system itself became increasingly powerful.


3. For a plausible though, of course, speculative picture of the stages through which a crude protolanguage may have been elaborated,
see Jackendoff, 1999.
Capacities that are phenomenologically akin to innate modules can be the result of socially mediated learning. For we
can learn to develop and to automatize quite cognitively demanding skills. A good chess player can make a good,
though not perfect, move on the spot. An expert bridge player can count the cards without conscious effort or
intervention. These skills take a lot of learning, but once learned, they are enduring and effective. And they reveal one
mechanism by which we respond to features of our environment that change at intermediate rates. We reliably develop
automatized skills as a result of prolonged immersion in highly structured developmental environments. The forager's
dilemma is solved by such skills (see, e.g., Diamond & Bishop, 1999). The local ecology of a foraging people is fairly
stable. But it does change. People move, and that changes the ecology, geography, and natural history of their
immediate surroundings. Moreover, many aspects of local habitat change over time, both through the impact of
humans themselves, and through extrinsic causes, especially those to do with climate. So the resource profile of a local
area mostly changes at intermediate rates. Yet if agents are to make good decisions, that profile must be tracked
accurately and used appropriately. In their overview of theories of cultural evolution, Joseph Henrich and Richard
McElreath illustrate this point with a very vivid example. The Burke and Wills expedition was an attempt to explore
some of the arid areas of inland Australia that ended in failure and death. Local aboriginal people survived without
undue difficulty in the area that killed the expedition, because survival depended on accumulated local knowledge. The
locals had learned how to detoxify locally available seeds from which bread could be made, and they had learned how to
catch the local fish. Fatally, the members of the expedition had no such information (Henrich & McElreath, 2003).
Intermediate rates of change do indeed select for social learning, yet the cognitive burden of adaptive foraging
decisions is carried by social learning of a very special kind: automatized skills acquired in learning environments that
are adapted to induce the reliable acquisition of those very skills. As Diamond notes in his account of the natural
history skills of Papuan foragers, the acquisition of skill is neither a process of pure instruction nor of unstructured
exploration (Diamond & Bishop, 1999). Social learning of this kind is a special case of an important evolutionary
phenomenon: niche construction. Many animals alter their environment as well as adapting to it, for example by
building burrows, nests, and other shelters. They partially construct their own niches (Odling-Smee, 1994; Laland &
Odling-Smee, 2000; Laland et al., 2000; Odling-Smee et al., 2003). Humans are extreme examples of niche
constructors. Furthermore, their niche construction takes two
very special forms. It is downstream and cumulative: members of generation N engineer not just their own
environment but also that of the N + 1 generation. Moreover, generation N + 1 inherits the effects of generation N,
and further changes the environment, and those further changes become the world into which generation N + 2 is born
and grows. Second, this niche construction is often epistemic. Humans engineer their own informational world and that
of their descendants, transforming the informational character of the problems they must solve. For example, the
invention of psychological vocabulary makes the fact that others think differently from you much more salient. The
nonlinguistic behavior of other agents can show that their beliefs and preferences are unlike your own. But once others
learn to talk about what they think, they attempt to cajole and persuade; and differences in perspective become
inescapable. The reliable acquisition of skill is often the result of this transformation of downstream developmental
environments.
Evolutionary psychologists are rightly struck by the fact that humans all over the world reliably acquire difficult
competences, despite the differences in their personal circumstances. This acquisition process must be entrenched. It is
buffered against the vagaries of individual learning histories. Sometimes, though, this buffering is by environmental
engineering. One crucial chunk of the foraging tool kit is a natural history taxonomy, and it turns out that in forager
cultures such taxonomies are extensive and, in some respects, remarkably accurate. In particular, the species category
turns out to be a universal and central element of forager natural history taxonomies, and this is the basis of Atran's
argument that we have innate natural history modules (Atran, 1990, 1998). My own view is that forager taxonomy is a
consequence of the intersection of (1) inherited perceptual tuning; (2) objective features of the biological world (for species are objective units in nature); and (3) engineered developmental environments (see Sterelny, 2003). The
acquisition of folk biology is scaffolded by apprentice learning. As children accompany adults, adult behavior directs
them to salient differences and identifying characteristics of the taxa they encounter. It depends on cultural
representations. Pictures and other enduring representations are obviously very important for contemporary Western
cultures. But preliterate cultures have and pass on the system of nomenclature they have assembled over time, and
this labels differences, making them more salient. Moreover, the process is perceptually scaffolded. Our perceptual
input systems are specially adapted to features of the world important for folk biology. Thus these learning
mechanisms form a complex hybrid. We are perceptually preadapted to notice the relevant features of the natural
environment. Forager children are richly and interactively exposed to that environment. But they are exposed to it in
ways structured by their communities' activities, nomenclature, and lore (and perhaps by active teaching as well). This
combination of perceptual tuning, individual exploration and social scaffolding makes learning much more reliable than
it would otherwise be: notice the failure of the highly experienced, bush-hardened members of the Burke and Wills expedition to reach survival threshold by individual learning.
In Thought in a Hostile World (Sterelny, 2003), I focused on the idea that scaffolding developmental environments
offers an alternative to modular solutions of the problem of information load. Downstream epistemic engineering can
scaffold
end p.224
the development of automatized, highly tuned, quasi-modular cognitive skills. I argued that we often needed an
alternative to such solutions, for the computational advantages of modularity depend on environments being
informationally stable over evolutionarily significant periods. Innate structuring obviates or reduces the learning
problem. Encapsulation eases the computational burden on decision by reducing the size of the database to be scanned
and by allowing a module to be optimized for processing particular kinds of data. Both innate structuring and
encapsulation bet on stability. Yet many domains are not stable. Richard Potts has argued that humans evolved in
times of increasing environmental instability (Potts, 1996; see also Calvin, 2002). Even if we had stayed put, our world
would have changed around us. But of course, we have not stayed put, and the effects of migration have to be added
to those of climate change. Moreover, we induce changes in our own environment through niche construction. We
rebuild our own worlds economically, biologically, technologically, and socially.
In developing this argument, I underplayed the problem of fast change and hence underplayed an important aspect of
human niche construction. Humans make cognitive tools: we technologically enhance the capacities of our naked
brains. Dan Dennett and Andy Clark have recently been pressing this point.


4. For Dennett's work on these themes, see especially Dennett, 1993, 1995, 1996, 2000. For Andy Clark's, see Clark & Chalmers,
1998; Clark, 1999, 2001, 2002a, 2002b, 2003.
To take the simplest of examples, the practice of marking a trail while walking in the bush converts a difficult memory
problem into a simple perceptual problem. Along similar lines, Dan Dennett points to the epistemic utility of linguistic
labels: if you see that two apparently identical birds are given different names by those around you (say, buff-rumped
thornbill and striated thornbill) you are thereby cued to the existence of a difference you would otherwise almost
certainly miss. Dennett and Clark are onto something important. But my take on cognitive technology is different from
that of Clark (especially). He thinks it explains how it can be that we are much more intelligent than the chimps
without our brains being dramatically reconfigured. In contrast, in section 5, I shall argue that the use of such
technology depends on a very substantial neural upgrade (see also Sterelny, 2004).
3 Cognition on the Baldwin Plan
Evolved, innate modules play a role in the explanation of human response to high-cognitive-load problems. Language
is very likely subserved by such a module, and our naive physics skills are likely to be, too. The standard case for the
modularity of language turns on the difficulty of seeing how language could be learned. This case for innateness is
certainly plausible (though see Cowie, 1998). There does seem to be a large gap between primary linguistic experience
and the principles a competent speaker has mastered. Moreover, in acquiring other cognitive competences, but not language itself, language is available as a learning tool. Notice, though, the connection between innateness and environmental
stability. Innately encoding the general features of language by building in some form of universal grammar stabilizes
this
end p.225
feature of the human environment. Once Baldwin-like evolutionary processes developmentally entrench features of
language, deviant forms will be penalized. There is no temptation to defect from the phonology, syntax, and
morphology of your local community.


5. Except perhaps for the minor ways that serve to badge subcultures within a culture (Dunbar, 1999).
Mutants with a variant form of the grammar (even if that variant would be superior if it were common) will be
punished because, presumably, they will find it harder to acquire the language of their local community. But, as
importantly, language use also makes a modular hypothesis attractive. The information we need to decode speech is a
small and predictable subset of the total informational resources of an agent.
Suppose Two Aardvarks hears Old Bear say:
Hairy Max gave Spotted Hyena the spear.
To understand the utterance of Old Bear, Two Aardvarks must identify the organizational features of the utterance: its
segmentation into words and phrases, and the overall organization of those constituents. Sentences must be identified
and parsed. A computational mechanism using restricted but especially relevant information could accurately and
efficiently parse sentences. For the relevant information is predictable. The general organizational features of language
may well be a stable target onto which an evolved, innately structured mechanism can lock. However, to make it worth
his while to listen to Old Bear, Two Aardvarks must do more than recover the structural skeleton of Old Bear's
utterances. He has to lock onto the semantics of those utterances. He has to understand that Old Bear is conveying
news about a spear, Hairy Max, and Spotted Hyena. There is great controversy about the nature of the cognitive
demands these tasks impose on Two Aardvarks. But however that controversy is resolved, it is likely that much of the
relevant information is predictable. If Old Bear intends to convey news about spears, he will standardly execute that
intention using the term spear, and a special-purpose database can be set up incorporating that regularity. The
specific term for a spear is an accidental feature of this linguistic community. But the existence of terms for artifacts
and the practice of communicating about artifacts by using those terms is not. Whatever the nature of symbolic
reference, the existence of lexical items of this class is a stable feature of human environments.
In short, a module exploiting a restricted, special-purpose database whose contents (in general or in detail) can remain constant over the generations could probably solve the parsing problem. But this depends on two special facts
about language. First, the organizational aspects of language are not tightly tied to other aspects of cognition. It is
quite likely that there has been a spectacular flowering in our causal and technical reasoning about our physical
environment in the last 100,000 years. Such a flowering, presumably, has led to a considerable coinage of new vocabulary, but not to new kinds of vocabulary. Moreover, that coinage leaves the organizational features of language
intact. Those features are content-neutral. In virtue of this neutrality, cognitive change in our lineage can be cordoned
off
end p.226
from the organizational features of language, and that allows these features to be stable.
Second, in one crucial respect, there is no evolutionary conflict of interest between speaker and listener. Whatever the
long-term aims of speaker and audience, it is in the interests of the speaker to have his utterance parsed properly,
and to have his sentences understood; understood in the minimal sense that Two Aardvarks understands that Old Bear
is talking about spears and about Spotted Hyena. In identifying structure and topic, there is no arms race between
deceptive signaling and vigilant unmasking: unmasking that might require all the informational resources of the
audience. Where there is no temptation to deceive, coevolutionary interactions will tend to make the environment
more transparent and the detection task less informationally demanding. The same is not true of Old Bear's overall
plans, and hence it is not true of the pragmatics of language. His desire to persuade Two Aardvarks to go on a wild
elephant hunt might well be subverted by Two Aardvarks's recognition of that further intention. A modular solution to
the informational load imposed by language is plausible because important features of language are stable, and that
stability is no accident. It is a coevolutionary achievement, depending on specific features of language and of
communication.
4 Niche Construction: Engineering Developmental Environments
In a recent article on the coevolution of our mental architecture and our interpretative capacities, Peter Godfrey-Smith
(2002) sketches out one scenario that he thinks naturally leads to the expectation of an innate folk psychology. He
pictures interpretation as beginning in a hominid population that has evolved enough behavioral complexity for the
prediction of behavior to be difficult. Some individuals, though, are able to develop a simple framework to predict the
action of other agents. This achievement gradually changes the social environment. Interpretative capacities that were
initially advantageous but patchily distributed through the population come to be mandatory for effective social life. So
there is selection on that population for more reliable and accurate development of this predictive framework.
Development is accelerated and canalized, increasingly decoupled from signals from the environment (Godfrey-Smith,
2002).
A quite different possibility emerges once we build into our evolutionary scenario the full human propensity for
engineering our own environments. Selection for interpretative skills could lead to selection for actions that scaffold the
development of the interpretative capacities, rebuilding the epistemic environment of the developing agent. Moreover,
folk psychology does not have to be built from scratch. We are likely to have perceptual systems tuned to facial
expression; signs of affect in voice, posture, and movement; the behavioral signatures that distinguish between
intentional and accidental action, and the like. These systems make the right aspects of behavior, voice, posture, and
facial expression salient to us. Moreover, these perceptual adaptations come to operate in a developmental
environment that is the product of cumulative epistemic engineering, engineering that scaffolds the acquisition of
interpretative skills.
end p.227
Even so, mental states are unobservable causes of behavior. So the task of learning folk psychology might seem
especially difficult, depending as it does on an inference from effects to their hidden causes (Scholl & Leslie, 1999). In
adults, the connection between psychological state and action can be very complex and indirect, and that may
reinforce the suspicion that folk psychology must be largely innate. But the step from effect to hidden cause may itself
be scaffolded. When children interact with their peers, the connections between desire, emotion, and action will often
be very direct. Moreover, introspection might play a role in suggesting the hypothesis that others have mental states
analogous to one's own.


6. Recently it has been the received view of developmental psychology that knowledge of first-person and knowledge of third-person
mental states develop in parallel, so first-person knowledge could not scaffold third-person knowledge. Nichols and Stich (2003),
however, have recently pointed out that the case for complete parallelism is far from clear.
As children mature, they learn to inhibit impulses, and their actions become much more sensitive to spatiotemporally
displaced information and motivation. But when interacting with their peers, the inference from effect to cause will
often be much less challenging. Children are less good at concealing overt signs of their emotions than adults, and less
good at resisting the urge to act on those emotions. With four-year-olds (as I can testify), the behavioral regularity
that links overt desire for an object in the immediate vicinity with an attempt to take possession of that object is close
to exceptionless. As three- and four-year-olds are making crucial developmental transitions, the lack of inhibition of
their peers simplifies their epistemic environment.
As I see it, then, the acquisition of folk psychology, like that of folk biology, is a hybrid learning process. It depends on
perceptual preadaptation, individual exploration, and a socially structured learning environment. In particular, the
reliable development of interpretive capacities is supported by the following factors.
1. Perceptual mechanisms make crucial clues of agents' intentions salient to us. Folk psychology is scaffolded by perceptual tuning.
2. Children live in an environment soaked not just in behaviorally complex agents but in agents interpreting one another. Children are exposed both to third-party interpretation and to others interpreting them. Much of this interpretation is linguistic, but there are also contingent interactions in which the child is treated as an agent: imitation games, joint attention, joint play (see, e.g., Tomasello, 1999b).
3. Learning is scaffolded by particular cultural inventions: for example, narrative stories are full of simplified and explicit interpretative examples.
4. There are folk psychological analogues of Motherese. Parents who interact with small children often rehearse interpretations of both their own and their infant's actions.
5. Language scaffolds the acquisition of interpretative capacities by supplying a premade set of interpretative tools. Thus linguistic labels help make differences salient.
6. Interpretation is scaffolded by interacting with agents (your developing peers) who have not yet gained the abilities to mask their emotions, inhibit their desires, and suppress their beliefs. Such agents simplify the problem of inferring from action to its psychological root.
end p.228
Thus a cognitive task that might once have been very difficult, or achievable only at low levels of precision, can be ratcheted both to greater levels of precision and to earlier and more uniform mastery, by incremental environmental engineering. Like Alison Gopnik, but for very different reasons, I think something science-like is going on as children acquire folk psychology. Science genuinely does trade in theories, and these really do pose a poverty of the stimulus problem. The gap between experience and scientific theory can be crossed only if individual environments are very extensively epistemically engineered: only by the social organization and working traditions of science. In acquiring folk psychology, and in contrast to many scientific domains, we are psychologically tuned to the right features of the world. So acquiring it is a less intimidating problem. Children do not have to be scientists (wired into those very special environments) to solve this discovery problem. They need only the help of rather more modest epistemic engineering. Something somewhat science-like is going on in the development of our interpretative capacities. But it is not the operation of especially powerful autonomous learning mechanisms within individual agents; rather, our environments have been epistemically engineered in ways that circumvent the cognitive limits of individuals on their own.
To sum up the argument: we do not have to appeal to innate and canalized development to explain the early and uniform development of fast, unreflective, powerful, and accurate cognitive mechanisms. We have a second model: automatized skills. It is easy to overlook their cognitive power. By the time they were 12, the Polgar sisters were
of international master class, and improving. Their chess competence was acquired early. It was fast, powerful, domain
specific, often unreflective. However, the sisters did not acquire their chess competence by unstructured trial-and-error
learning. Rather, those skills were acquired in a highly structured, chess-soaked developmental environment. A
behavioral competence that might seem to be the signature of an innate module can be produced by a highly
structured developmental environment. Of course, chess is not a perfect model of folk psychology, for chess is not a
field of hidden causes. Even so, it is a model of how a fast, automatic, and sophisticated cognitive specialization can
develop in an appropriately scaffolded environment without depending on specific innate structure.


7. "Specific" matters here. I think it very likely that the notion of a cause itself is innate. And if naive physics is indeed an innate module,
it may provide conceptual templates for the idea of a hidden cause (like that of a force or an inner essence) that can be exported to
other domains.
Niche construction provides an alternative explanation of folk psychology. We are all Polgars with respect to the chess
game of social interaction.
The argument, so far, has not placed any weight on environmental change. Even if there is a universal and stable
human nature that folk psychology tracks, folk psychology could be built through downstream niche construction rather
than
end p.229
via its Baldwinization. The converse is not true, and I doubt that there is a universal and stable human nature.
Automated skills vary from culture to culture and individual to individual, and these skills profoundly change an
individual's cognitive profile. Consider the differences in quantitative reasoning competence between an agent who has
mastered the number system with positional notation and one who has not. Likewise, patterns of emotion and the
propensity to act on emotion vary importantly (see, e.g., Nisbett & Cohen, 1996). There is certainly some evidence
that as folk psychological skills develop from the skeleton of belief and preference, cultural differences in folk
psychological vocabulary become apparent (Nichols & Stich, 2003, pp. 205-9). In short, changes to human
environments have profound developmental consequences. To the extent that we think successful interpretation
depends on tracking contingent and variable aspects of the way others think, we should doubt that interpretive
capacities depend on innate folk psychological principles.
5 Epistemic Technology
Let me now turn to the phenomena I have previously somewhat neglected: the role of epistemic technology in
mitigating the problem of information load. I shall begin by sketching some of the forms of epistemic technology. Most
obviously, we alter our environment to ease memory burdens. We store information in the environment; we recode it,
and we exploit our social organization through a division of intellectual labor. Our contemporary environment is full of
purpose-built tools for easing burdens on memory. These include diaries, notebooks, and other organizers. Filofaxes
are new tools, but purpose-built aids to memory are certainly ancient. Pictorial representation is over 30,000 years
old. Furthermore, and deeper still in the past, ecological tools have informational side effects. A fish-trap can be used
as a template for making more fish-traps (Mithen, 2000). Moreover, we recode information in public language to make
it easier to recall. In songs, stories, and rhyme, the organization of the information enables some elements to prime
others. Such recoding enables us to partially substitute recognition for recall. The division of intellectual labor also
reduces the memory burden on individuals; no one has to master all the information a group as a whole needs.
We transform difficult cognitive problems into easier perceptual problems. We do this when we re-present quantitative
information as a pictorial pattern, in pie charts, graphs, maps. Likewise, we transform difficult perceptual problems into
easier ones. For example, in shaping wood with a chisel and hammer, it is useful to mark the spot you intend to strike,
making it easier to focus attention on the exact working surface.
We transform difficult learning problems into easier ones. For we alter the informational environment of the next
generation. We do not just provide information verbally: learning is scaffolded in many other ways. Skills are
demonstrated in a form suited for learning. Completed and partially completed artifacts are used as teaching props.
Practice is supervised and corrected. The decomposition of a skill into its components is made obvious; subtle elements
will often be exaggerated, slowed down, or repeated. Moreover, skills are often taught in an optimal sequence,
end p.230
so that one forms a platform for the next. Engineered learning environments play their most obvious role in
intergenerational information flow, but these techniques also mediate horizontal flows of information.
We engineer workspaces so that frequent tasks can be completed more rapidly and reliably. For example, skilled
bartenders use the distinctive shapes of glasses and their sequence to cue recall for customers' orders and to code the
order in which they will be served. Their ability to respond accurately to multiple simultaneous orders plummets if they
are forced to use identically shaped glasses (Clark, 2002b). Cognitive tools, too, are simplified and standardized to
enhance performance on repeated tasks. Improvements in notation systems (the switch from imperial to decimal currency and measurement, for example) make many routine calculations easier, faster, and less error-prone.
Finally, as Dennett in particular argues, cognitive technology also has profound developmental effects. For example,
Dennett (2000) distinguishes between the capacity to have beliefs about beliefs and the capacity to think about
thinking. On his view, even if nonhuman primates have beliefs about beliefs, they cannot think about thinking. Agents
in a culture with enduring public symbols inherit an ability to make those symbols themselves objects of perception
and to manipulate them voluntarily. Imagine a group of friends making a sketch map in the sand to coordinate a hike.
Those representations are voluntary and planned. Dennett suggests that we first learn to think about thoughts by
thinking about these public representations. In drafting and altering a sketch map, we are using cognitive skills that
are already available. They are just being switched to a new target. Moreover, manipulating such a public
representation makes fewer demands on memory; no one has to remember where on the map the campsite is
represented. Rich metarepresentational capacities are developmentally scaffolded by an initial stage in which public
representations are objects of thought and action. While obviously very speculative, this idea seems very plausible to
me.
In summary, epistemic technologybuilding tools for thinking, and altering the informational character of your
environmentmakes possible much that would otherwise be impossible. Moreover, for the most part, the effectiveness
of epistemic technology is not linked to the pace of environmental change. Optimizing your workspace; turning
memory tasks into perceptual ones; using templates, public representational media, and good notation systems all
enhance your capacity to learn about your environment. And they do so independently of the pace at which that
environment changes. But though epistemic technology plays a crucial role in explaining human intelligence, the use of
epistemic technology is itself informationally demanding. I think Clark, in particular, tends to overlook this point. For he
focuses too much on epistemic tools that are specifically tied to a single agent (Clark, 2001, 2002a, b). For example,
in The Extended Mind, Andy Clark and David Chalmers develop a thought experiment about an Alzheimer's sufferer
(Otto) who manages his problem by writing down crucial information in a notebook. They argue that the information in
the book plays the same functional role for Otto that ordinary belief plays for other agents (Clark & Chalmers, 1998).
That is not quite right. Otto's external memory is less reliable after dark; when he forgets his glasses; when his pen
leaks or his pencil breaks; when it rains and his book gets wet. And we
end p.231
have not yet considered epistemic sabotage by other agents. To the extent that others have access to his notebook,
Otto is at risk of thought insertion and deletion. These problems do not arise for such of Otto's information as he still
codes internally.
Clark's favored examples of the use of tools to extend our cognitive abilities tend to be of solitary activities: an
academic writing an article by revising drafts, cutting, pasting, and annotating his way from one version to the next.
Problem solving is not typically such a solitary vice. Think instead of conversations, discussions, brainstorming.
Likewise, scientific labs are shared spaces, and the tools are often shared tools; notebooks, experiments, programs,
and articles are more often than not the result of many hands and minds. The same is true of decision and action in
many commercial and administrative organizations. Files, for example, are often joint products. In short, epistemic
technology is often used in a public and sometimes contested space, and this has important implications for the
cognitive demands it imposes.
1. Jointly used epistemic artifacts are often less than optimal for any of their users: they need to be individualized at each use. Moreover, though human interactions are often cooperative, they are not exclusively so. The possibility of deception and the hidden agendas of others cannot be ignored. Files are sometimes doctored, and their users have to be alert to this possibility. Agents using common tools cannot afford to be dumb.
2. Public representations have to be interpreted. Thus maps of an underground system typically represent the order of the stations and the connections between the various lines, but they do not map the distance between stops. Moreover, these features of maps and similar representations are variable and contingent, so they cannot simply be implicit in the automatic routines for the use of a representation.
3. Models and templates also require interpretation. A fish-trap carries information about how and where to make other fish-traps. But the template cannot be blindly copied, even by an agent who could commit every detail to memory. A fish-trap has to be modified for its individual location: for the specific tidal inlet it will block at low tide. That is often true of artifacts. When another agent makes an artifact for her own purposes, it is rarely ideal for me. The other agent may be larger or shorter; weaker or stronger; a left-hander. I shall need to modify as well as copy her production.
4. Symbol systems are now among our most important epistemic artifacts. Without positional notation and without algorithms that decompose large arithmetic operations into elementary ones, accurate quantitative reasoning would be impossible. Yet the appropriate use of these symbol structures is cognitively demanding. The innumerate are not rare in Western societies, even though those societies make serious attempts to make numeracy skills universal. The arbitrary symbol systems of language impose greater demands still. Counterdeception is a problem whose informational load is both heavy and unpredictable: there is no telling in advance what you will need to know in order to expose another as a liar. This vetting problem is particularly pressing for linguistically coded information. The arbitrariness and stimulus-independence of linguistic symbols make language a powerful system. But they also make it a system subject to deception.
5. The use of epistemic tools in a public space involves quite complex problems of coordination. A recipe is a fairly standard example of an epistemic artifact. So consider a group of friends jointly producing a meal by following a recipe. Each agent must (1) monitor what others are doing; (2) negotiate a division of tasks; (3) negotiate a division of shared space and shared work surfaces; (4) negotiate a division of shared tools (who gets to use which chopper when). Successful coordination depends on what the agents know of one another, their materials, and their tools. We often solve such problems effortlessly, but that shows we are smart, not that the problems are easy.
end p.232
Time to sum up this stage of the argument. In discussing epistemic technology, I have had four aims. The first was to highlight the variety and the potential power of epistemic technology. The second was to show the developmental consequences of epistemic engineering. The purely internal mechanisms of the mind become more powerful as a result of using epistemic tools (Dennett, 1993, 1996, 2000). In these respects, there is no difference between my views and those of Dennett and Clark. In addition, though, I have pointed out, third, that these techniques make few assumptions about the pace at which environments change. Even in a fast-changing world, they enhance the power of individual learning, and they enable solutions to be spread and improved horizontally. Finally, and very importantly, the use of epistemic technologies has evolutionary consequences. For tool use is itself a high-burden activity. The use of such technology is itself an aspect of the selective landscape that has transformed human cognitive capacities. Epistemic technology (storing information in the world, and improving the local epistemic environment) is not a way of making a dumb naked brain smart by adding the right peripherals; it is not a way of making dumb brains a part of smart systems. As with the other strategies, epistemic technology is not a complete solution in itself to the problem of cognitive load. The use of epistemic technology itself must be supported by some mix of quasi-modules and modules.
Let me end with a quick review of the argument. Contra the fast-and-frugal heuristics program, much human decision-making has a high information load. Good decisions require access to, and use of, generous amounts of information. I have sketched three evolutionary responses to this problem. All are important, for response depends on the rate of environmental change, and different aspects of human environments change at very different rates. Even so, I have emphasized nonmodular evolutionary responses to high information loads, in part because they have been less discussed, and in part because I doubt that many aspects of human environments are stable on evolutionary time-frames.
Part III Morality, Norms, and Religion
15 How Good Is the Linguistic Analogy?
Susan Dwyer
A striking fact about humans is that they demonstrate quite sophisticated sociomoral normative sensitivity from as early as the second year of life. Over two decades of study in experimental and naturalistic settings, some carried out cross-culturally, show that very young children not only have the capacity to recognize sociomoral rules but also have the capacity to distinguish between different subtypes of such rules (in particular, between moral and conventional rules), as evidenced in their differential responses to and reasoning about associated transgressions. Three- to four-year-olds understand that moral rules differ from conventional rules in terms of two main criteria: the former have force that is independent of any particular authority (e.g., God, parents, social custom) and are closely tied up with considerations of harm and injury (see Nucci, 2001; Turiel, 1983, 1998).
More recently, it has been shown that children of the same young age grasp the import of deontic conditionals, or permission rules (e.g., "If Sally plays outside, she must wear her hat"); they easily and accurately identify violations of such rules, and they distinguish between intentional and accidental violations thereof (Cummins, 1996; Harris & Núñez, 1996; Núñez & Harris, 1998). Together with the vast amount of data from studies documenting infants' empathy and one-year-olds' helping and comforting behavior (e.g., Dunn et al., 1995; Hoffman, 1983; Zahn-Waxler & Hastings, 1999), this work strongly suggests that some basic moral capacities are in place quite early in development. A pressing empirical question is how these capacities are acquired.
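A permission rule of the form "If P, then must Q" has simple violation conditions, which can be made concrete in a toy sketch. The encoding below is entirely my own illustration, not a claim about how children actually represent such rules:

```python
def violates(rule, situation):
    """A deontic conditional "If P, must Q" is violated exactly when
    the condition P holds but the requirement Q is not met."""
    return situation[rule["condition"]] and not situation[rule["requirement"]]

# "If Sally plays outside, she must wear her hat."
hat_rule = {"condition": "plays_outside", "requirement": "wears_hat"}

# Violation: she plays outside without her hat.
outside_no_hat = {"plays_outside": True, "wears_hat": False}
# No violation: the rule is silent when she stays inside.
inside_no_hat = {"plays_outside": False, "wears_hat": False}
```

Note that the interesting developmental finding is not captured by the truth table itself: children also track whether a violation was intentional or accidental, which the sketch deliberately leaves out.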
A further striking fact about our species is that all (normal) humans develop into moral agents, that is, into creatures
with (at least) the following moral capacities: the ability to make judgments about the moral permissibility, moral
impermissibility, and moral obligatoriness of actions in actual and hypothetical, novel and familiar cases;
the ability to register morality's special authority (i.e., the fact that moral imperatives are nonhypothetically binding
and sometimes contrary to self-interest); the ability to make attributions of moral responsibility for actions (as distinct
from attributions of mere causal responsibility); and the ability to recognize the force of excuses.
While moral capacities are present early in life and are virtually universal across the species, there appears not to be
universal agreement about which actions are morally permissible or obligatory, or about which creatures are owed
moral concern. So, in addition to the acquisition question, we are confronted with the task of explaining the diversity
within unity of human moral life.
My own view is that a nativist moral psychology provides the best framework for explaining these facts. In particular,
as I have argued elsewhere, there are interesting parallels between the nature and development of human moral
competence and the nature and development of human linguistic competence (Dwyer, 1999). In my view, this
strongly suggests that the appropriation of some concepts and methodology from theoretical linguistics will be useful
for working out the nativist details in the moral domain. This approach is sometimes characterized as pursuing the
linguistic analogy (LA).
While the LA is not the only moral nativist game in town (cf. Nichols, 2005), I will show that it provides a superior framework for seeing what is at stake in the claim that we are innately moral creatures and for making real progress in discovering what that claim amounts to in detail.

1. That we are innately moral creatures does not entail that we are innately morally good creatures. Moral nativists are not naive, nor need they be especially sanguine about human behavior.
Moreover, I shall here develop the LA in a way that demonstrates that it does not entail normative relativism.
After a brief recapitulation of a poverty-of-the-moral-stimulus argument, I turn to the issue of moral differences, and
sketch a view according to which something akin to a universal moral grammar provides a set of parameterizable
principles whose specific values are set by the child's environment, resulting in the acquisition of a moral idiolect, or I-morality. This moral parameters model has not been subject to empirical investigation, and it may not in the end
prove to be the correct way to pursue the LA. Nonetheless, together with the poverty-of-the-moral-stimulus argument,
it throws into sharp relief some central challenges for anyone wanting to work out a human moral psychology.
In the background of all this is a big-picture reason for looking to linguistics for help in thinking about morality: namely, that human moral capacities reflect the operation of a genuine competence. The idea is not merely that there are poverty-of-the-moral-stimulus arguments and that morality is a universal but heterogeneous human institution. My suggestion is that morality, like language, is underpinned by a human normative competence, the possession of which both allows and compels us to see the world in moral terms, while also making possible the acquisition of particular capacities that allow us to negotiate a world so conceived, in ways that are sensitive to local conditions.
But let's return to the children.
1 Poverty of the Moral Stimulus
At the outset, I cited some facts about the moral capacities that all children apparently acquire very early in life in the
normal course of events. The capacity to distinguish between different sociomoral normative domains and the
heightened sensitivity to permission rule violation appear to be central aspects of adult human moral competence.
These capacities do not represent a sort of protomorality limited to childhood. Rather, it would appear that, over a remarkably short period of time, human children acquire moral capacities that are shared with adult members of their communities.

It is also worth emphasizing that the capacities in question concern a certain sort of cognition, or way in which the human mind-brain negotiates the world. The claim is not that children make the same particular moral judgments that adults make (say, that it is permissible to eat nonhuman animals), though it should not be the least surprising that young children parrot their parents' pronouncements. The capacities in question are more fundamental. Arguably, the capacity
to distinguish between a moral rule violation and a conventional rule violation needs to be in place before any
judgments about the moral permissibility of a particular action or practice can be made. And any plausible acquisition
story must explain how all (normal) children come to have this quite abstract capacity in the normal course of
development.
Traditional social-learning theory (e.g., Bandura, 1986) and other empiricist accounts claim that children are able to
learn all they know about morality on the basis of observation, (perhaps) coupled with an innate general-purpose
learning mechanism. Such approaches must assume that there is sufficient evidence of the right type available to all
children in all environments to explain the fact that three-year-olds grasp the difference between moral and
conventional transgressions. For example, it might be argued that moral rules are manifest in behavioral regularities in
the child's environment (children are able to recover specifically moral rules from their environment); that children are
explicitly encouraged to be good little boys and girls (children get lots of positive evidence concerning what is morally
required of them); and that children often meet with emotionally charged reactions from their caretakers when they act
in less than morally admirable ways (children get lots of negative evidence concerning what is morally required of
them). But this won't do. First, empiricist accounts radically underestimate the complexity of the task that faces the
young child with respect to rule recovery. Second, the positive and negative evidence adverted to is either irrelevant
to or inadequate to explain the child's acquisition of the capacity to distinguish moral and conventional rules.
To be sure, the general acceptance and following of rules among adults in a community is liable to result in behavioral
regularities that a child can observe. But there are regularities and regularities. Consider, for example, the matter of
telling
the difference between rule-governed behavior and merely accidentally regular behavior. Suppose that in the Smith-Jones household there is a rule, unbeknownst to two-year-old Lisa, that glass containers go in the right-hand side of the recycling bin and plastic containers go in the left-hand side of the bin. Imagine further that left-handed Jones
typically lays the breakfast table, which results in the Wheaties box being placed on the table in the same orientation
each day. Young Lisa will observe two very regular sequences of events or dispositions of objects. But how, absent
explicit instruction, will she learn to discriminate between the rule-governed behavior concerning recyclables and the
merely accidental but regular placement of the cereal box? Since elements of the world rarely come with labels, it is
highly implausible to claim that Lisa will manage to learn, just by observation, to make the discrimination.
Of course, caretakers do engage in some explicit instruction: "Lisa, remember the plastic bottles go in here." But there is simply not sufficient time to explicitly characterize every regularity to a child. And Lisa's parents probably do not themselves notice the accidentally regular placement of the cereal box.
The problem for the empiricist is worse. Presume, for the sake of argument, that the child does manage to make the
discrimination between rule-governed regularities and merely accidental regularities just on the basis of data available
in her environment. How does she, then, just by observation, learn that some rule-governed regularities are merely conventional (forks go on the left for right-handed diners) while others are moral (promises ought to be kept)?
One could suggest that caretakers' differential reactions to infractions of these types of rules might provide the child
with some guidance. It might be argued, in particular, that caretakers have particularly strong or emotionally
distinctive responses to children's moral transgressions as opposed to their conventional transgressions. So far as I
know, there is no evidence to support this hypothesis. Some parents get just as hot under the collar about
conventional transgressions as they do about moral transgressions. (In some middle-class households, etiquette is
taken very seriously.) Moreover, it is likely that conventional transgressions outnumber moral transgressions, offering
little opportunity for the child to observe the peculiar type of emotional reaction allegedly associated with a moral
transgression. And there is evidence that caretakers more often correct or admonish conventional transgressions than
they do moral transgressions (see Nucci, 2001; Smetana, 1989). Finally, even adults have difficulty distinguishing
between strong emotional reactions: is my interlocutor angry, disgusted, irritated, or disappointed with my action? It's
hardly likely that very young children are any better at making fine-grained discriminations between the emotionally
laden responses of their caretakers.
Again, it must be conceded that caretakers do provide explicit moral instruction. The nativist need not deny this. But
she will question whether this instruction provides every child with sufficient data to acquire the capacity we are
investigating.
First, it's worth noting that "You ought to keep your promises" has precisely the same form as "You ought to put the fork on the left." "I've told you before, don't do that!" is as appropriate after a hair-pulling as after an episode of food-throwing. In other words, there appears to be little in the positive evidence concerning rule violations generally that would cue the child to whether a moral or a conventional
rule has been transgressed. Second, there may well be a paucity of negative evidence concerning the distinction
between the two types of rules. Very roughly, negative evidence is evidence that the child can use to correct a false
assumption she has made or that she can use (in this case) to eliminate a candidate criterion for making the
discrimination.
At best, it seems that children can become aware that the adults around them exhibit some regularities, that sometimes their caretakers codify those regularities by uttering "ought" statements, and that their caretakers seem to care about whether those "ought" statements are obeyed.
The nativist claim is not that there is no information in the child's environment relevant to her acquisition of the
capacity to distinguish between moral and conventional rules. The nativist's concern is whether that information is
sufficient to explain the capacity the child possesses and whether it is available to all children. At present, I don't think
we can be sure that it is. Moreover, I have just discussed the acquisition of a single capacity. Nothing has been said
about how very young children come to grasp the difference between deontic and indicative conditionals. One might
speculate that that capacity is even more abstract than the one just outlined, and thus that an empiricist account of its
acquisition will be even less plausible.
Poverty of the stimulus arguments get traction when we are confronted with the early acquisition of some distinctive
capacities that appear to be universal across the species and cannot be explained on the basis of the positive and
negative evidence available to children everywhere. The conclusion is that the child (or, more precisely, the child's mind-brain) must contribute something to the process of acquisition.
Such arguments play a central role in linguistics (Crain & Pietroski, 2001; Laurence & Margolis, 2001). The conclusion of such arguments in linguistics (which, it must be noted, operate in a domain where we have a much richer and more specific characterization of the relevant capacities, i.e., explananda) is that the child's mind-brain contains (at some level of abstraction) a language acquisition device (or language faculty) that makes possible the acquisition of all and only humanly possible languages. The language faculty is characterized in terms of a set of rules, principles, and
constraints (universal grammar) that determines what aspects of her environment a child needs to pay attention to and determines, together with what the child hears around her, her mature linguistic competence, also called her I-language, or idiolect. This account can be illustrated as in figure 15.1.
FIGURE 15.1 An account of the language faculty.
FIGURE 15.2 An account of the moral faculty.
A similar proposal is very tempting as the conclusion of the poverty-of-the-moral-stimulus argument: the child's mind-brain contains (at some level of abstraction) a morality acquisition device (or moral faculty) that makes possible the acquisition of all and only humanly possible moralities. The moral faculty is characterized in terms of a set of rules, principles, and constraints (universal moral grammar) that determines what aspects of her environment a child needs to pay attention to and, together with what she hears and sees around her, determines her mature moral competence, which we can call her I-morality, or moral idiolect.

3. Since the use of the expression "universal moral grammar" is apt to lead to misunderstandings, two important caveats must be entered here. First, while the content of universal grammar must be adequate to the task of explaining the productivity of language, moral nativism inspired by the LA need not involve this constraint. That is, when the moral nativist speaks of a moral grammar, she is not speaking of a set of principles that will generate all and only (say) true moral judgments. Second, neither the linguistic nativist nor the moral nativist need make any particular claims about how their respective grammars are represented. Obviously, if there are innate human capacities, they must be encoded in some way that permits genetic transmission. But this leaves it wide open how grammars are manifested in actual mind-brains (Jackendoff, 2002).
This account can be illustrated as in figure 15.2.
2 Moral Parameters
So far I have discussed how one appropriation from linguistics (the poverty of the stimulus argument) might help us address the empirical questions concerning how children acquire the moral capacities they do. But I mentioned another
fact to which moral psychology must pay attention: while all (normal) human beings become moral agents, there is
diversity among the particular moral judgments that such agents are wont to make. The situation seems to be this:
quite abstract moral capacities that are universal (e.g., marking the distinction between the moral and conventional,
making judgments of permissibility, and attributing moral responsibility) are exercised in ways that appear to be
subject to local variations.
The former point is addressed by positing the existence of an innately given moral faculty. Explanation of the latter
point might benefit from thinking about how linguists explain differences among the world's languages. The general
issue can be put more precisely: the content of the language faculty must be general enough that any child in any
linguistic environment can acquire a (humanly possible) language, and yet it must make possible the acquisition of
different languages.
The principles and parameters approach (Baker, 2001; Chomsky, 1981; Lightfoot, 1991) is one very powerful and
influential account in linguistics of the presence of variation against the backdrop of deep similarities. But before
describing that account, and how it might help us think about moral difference, it will be useful to be a bit clearer
about some of the concepts that are (explicitly and implicitly) already in play.
Earlier, I referred to a speaker's I-language as the manifestation of his mature linguistic competence, and, pressing the LA, we can refer to a moral agent's I-morality as the expression of his mature moral competence. A speaker's competence is something he acquires on the
basis of two things: how his mind-brain is built and the linguistic environment in which he grows up. The powerful
Chomskian idea is that the human mind-brain is built in a way that radically constrains its interaction with the world. A
human child cannot acquire birdsong competence; and the range of languages she can acquire is itself severely
constrained. This is the sense in which universal grammar (understood as part of the innately specified, abstract functional architecture of the human mind-brain) circumscribes a space of (linguistic) possibilities. Furthermore, a speaker's competence, once acquired, radically constrains her perception of and linguistic action in her linguistic environment. Her I-language represents one way (and not a host of other logically possible ways) of so perceiving and acting. The absolutely central point is this: in essence, a competence is a normative structure; that is, something that effects a highly constrained mapping from one type of thing to another. In the case of language, the mapping is from signals (sounds) to meanings (fig. 15.3).
The structure and content of a speaker's competence is what explains why she attributes certain meanings and not others to the signals to which she is exposed. And we discover the content and structure of a speaker's linguistic competence by collecting her so-called acceptability judgments. If a speaker judges that some string is "okay" in her language, then we know that the normative structure of her linguistic competence permits the relevant construction. Here is a simple example (from Jackendoff, 2002, p. 16); there are literally thousands of others.
English speakers will judge that (1) is "okay" while (2) is "not-okay" (as indicated by the asterisk). This suggests that the grammar of English contains a rule according to which an anaphor in object position must be coreferential with the subject of the clause in which it appears. In (1), "himself" must refer to Fred and not to Joe. English speakers judge (2) to be unacceptable because that rule is violated: "you" and "himself" cannot be coreferential. (Linguists will say that [2] is ungrammatical, because it violates a rule of grammar.)
Acceptability judgments are also crucial to the task of understanding the ways in which human languages differ. Here
is another very simple example. An English speaker will judge (3), but not (4), to be okay; whereas an Italian
speaker will find both (5) and (6) acceptable.
(1) Joe thinks that Fred adores himself.
(2) *Joe thinks that you adore himself.
(3) I am going to the cinema.
(4) *Am going to the cinema.
(5) Io vado al cinema.
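The rule suggested by (1) and (2) can be given a toy formulation in code. The feature encoding below is purely illustrative (real binding theory is stated over syntactic structure, not flat feature lists):

```python
def clause_acceptable(clause):
    """Toy version of the rule above: an anaphor in object position
    must be coreferential with the subject of its own clause."""
    if clause["object_is_anaphor"]:
        return clause["object_referent"] == clause["subject_referent"]
    return True  # the rule imposes no constraint on non-anaphoric objects

# (1) "...Fred adores himself": 'himself' picks out Fred, the clause subject.
ex1 = {"object_is_anaphor": True, "subject_referent": "Fred",
       "object_referent": "Fred"}
# (2) "*...you adore himself": 'himself' (Fred) is not the clause subject ('you').
ex2 = {"object_is_anaphor": True, "subject_referent": "you",
       "object_referent": "Fred"}
```

The point of the sketch is only that a competence, so conceived, is a checkable constraint: given a parsed clause, it delivers a verdict.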
Since speakers' acceptability judgments provide evidence for the content of linguistic competence, this pattern of
judgments provides some evidence for the ways English differs from Italian. The English speaker's competence imposes
a
end p.243
constraint concerning the pronunciation of the subject of a sentence. The Italian speaker's does not. In English one
must always pronounce the subject of a sentence, while Italian permits sentences with no overt subject in the main
clause.
Linguists refer to such features that distinguish groups of languages from one another as parameters. (The parameter in question above is called the null subject parameter.) The idea is quite simple. It is hypothesized that some principles of universal grammar contain variables that are initially unspecified; specific values for these variables are determined by the linguistic input to which the child is exposed. A useful metaphor is that of a switch: a parameter, in principle able to be "on" or "off," is switched either "on" or "off."
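Continuing the switch metaphor, here is a minimal sketch in Python. The dictionary encoding of sentences and the one-trigger setting rule are illustrative assumptions of mine, not part of the linguistic theory:

```python
def set_null_subject(sentences):
    """A parameter begins unspecified (None); exposure to a single
    subjectless declarative switches it 'on' (True)."""
    value = None
    for s in sentences:
        if not s["has_overt_subject"]:
            return True   # trigger: a subjectless sentence was heard
        value = False     # only overt subjects observed so far
    return value

# English-like input: subjects are always pronounced.
english = [{"has_overt_subject": True}, {"has_overt_subject": True}]
# Italian-like input: "Vado al cinema" drops the subject.
italian = [{"has_overt_subject": True}, {"has_overt_subject": False}]
```

On this caricature, the same acquisition device yields different settled grammars solely as a function of the input corpus, which is the shape of explanation the moral parameters model borrows.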
This is not the place to provide a thorough account of parameters. However, it is worth emphasizing three important points. First, the effects of setting a parameter to "on" or "off," as it were, are noticeable throughout a language. For example, whether or not a language is a null subject language determines the acceptable form of questions formed from declarative sentences. Since Italian is a null subject language, both (7) and (8) would be judged as acceptable by native Italian speakers:
Hence (9) (Baker, 2001, p. 42) is a perfectly acceptable-sounding question to Italian speakers, but not to English
speakers:
Sentence (9) is okay in Italian but not in English, because questioning the subject position in an embedded clause
requires moving a question word to the front of the sentence, and this leaves behind a tensed clause with no overt
subject. English doesn't tolerate this. This is a relatively small difference. Some languages appear to differ quite
profoundly. Still, it turns out that what appear to be massive differences between languages are explicable in terms of
the variable setting of a single parameter (see Baker, 2001, on the polysynthesis parameter).
Second, there is good reason to believe that the setting of parameters makes the task of language acquisition much
easier for the child. Consider, for example, the head directionality parameter: either "Heads follow phrases in forming larger phrases" or "Heads precede phrases in forming larger phrases" (Baker, 2001, p. 68). English is a head-first language. Hence, in (10)–(12), the complement prepositional phrase "at Charles" comes after the head, irrespective of whether the head is a verb, a noun, or an adjective.
FIGURE 15.3 The acquisition of linguistic competence: representation of the principles and parameters model.
(6) Vado al cinema.
(9) Chi credi che _______ verrà?
(Whom you-think that will-come?)
*Whom did you say that _______ will come?
(10) Mallory swore at Charles.
(11) Mallory's amazement at Charles.
(12) Mallory is mad at Charles.
A child growing up in an English-speaking environment will be able to set this parameter on the basis of exposure to a wide range of triggering data; any sentences of the forms (10)–(12) will do. And, supposing that the parameter is set on the basis of sentences like (12), the child
will not have to learn (independently) that verbs precede their objects, or that prepositions precede their complements,
for these are necessary concomitants of the head position parameter being set a particular way.
Finally, parameter-setting is not a conscious process. It happens as the result of a mind-brain structured in accordance
with universal grammar existing in a linguistic environment that contains signals that embody the constraints imposed
by parameters.
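As a toy sketch of setting a parameter from triggering data (the tuple encoding and the fix-on-first-datum rule are my own illustrative assumptions):

```python
def set_head_direction(triggers):
    """Fix the head directionality parameter from the first datum:
    each trigger is a toy (head_position, complement_position) pair."""
    for head_pos, comp_pos in triggers:
        return "head-first" if head_pos < comp_pos else "head-last"
    return None  # parameter still unspecified: no triggering data yet

# In "Mallory swore at Charles", the head "swore" (position 1)
# precedes its complement "at Charles" (position 2).
english_trigger = [(1, 2)]
```

The economy the text describes is visible even here: once the single parameter is fixed, many ordering facts (verbs before objects, prepositions before complements) follow without separate learning.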
With all this in place, we can replace the question marks in figure 15.3 with parameterized principles.
It is now possible to see how the notion of moral parameters can help us account for the variation we see in the local
expression of universal moral capacities. To begin, consider figure 15.4.
Right away, we are confronted with the challenge of replacing the question marks in the parentheses. As in the case of language, this task will involve some bootstrapping. Linguists don't begin their inquiry by positing a handful of principles and parameters from their armchairs. They collect lots of detailed data: from child speech (What mistakes do kids make? What mistakes don't they make? What evidence concerning language is available in the child's environment?); from particular languages (Which expressions do native speakers of Japanese judge to be okay? Which expressions do native speakers of Japanese judge not to be okay?); and from comparisons between languages (How does Mohawk differ from Italian?). Nothing like this sort of data (either with respect to quantity or with respect to detail) is (yet) available to the moral psychologist.
This is a serious problem for any account of human moral psychology that has explanatory ambitions. Explanations are,
quite generally, hard to come by. And, of course, they simply cannot get started without a clear idea of what is to be
explained. Since the main focus of twentieth-century moral philosophers was moral theory (and not moral psychology),
it is not surprising that we lack a thorough and detailed account of the capacities that are distinctively associated with
morality. Moreover, developmental moral psychology carried out by psychologists has either provided mere
redescriptions of aspects of moral life (e.g., social learning accounts) or has been hampered by unwarranted
assumptions about what mature moral reasoning must involve (e.g., Kohlberg, 1981b).
FIGURE 15.4 Moral parameters and universal moral capacity: moral analogue of the principles and
parameters model.
The explananda identification problem might appear to be especially pressing for nativists, because nativist claims are too easily dismissed if they do not say precisely what is innate. Think of it this way: the plausibility of nativist claims is greatly increased by the provision of quite fine-grained characterizations of the innate endowment, whether that is understood as a set of processes or a set of constraints. And the fineness of grain will be determined by the level of specificity of the target explananda. However, empiricist moral psychologists with explanatory ambitions should be just
as concerned with the explananda identification problem. Absent a detailed characterization of the phenomena to be
explained, it is difficult to adjudicate between accounts that posit rich, domain-specific innate endowments and those
that posit all-purpose learning mechanisms, constrained only in the most general terms. Hence both moral nativists
and moral empiricists have an interest in how complex mature moral competence turns out to be.
Nativist claims about language are hard to refute. Linguists are able to provide a rich characterization not only of what
mistakes children make (in acquiring a language) but also of what mistakes children do not make, and rich
characterizations of which strings of a given human language, L, native speakers will judge acceptable and which they
will not. All acquisition stories must be responsible to this data. They must ask: How is it possible that children exhibit
this linguistic behavior (as opposed to other linguistic behavior) on the basis of what is available to them? Poverty of
the stimulus arguments, by their very nature, are acutely attuned to this epistemic requirement.
Exhaustively articulating the proper explananda for moral psychology is not a task I can undertake here. But I can
examine a suggestion. Given the diagrammatic representation of the principles and parameters model earlier (fig. 15.3)
and its moral analogue (fig. 15.4), it is very tempting to think of the output of an agent's I-morality (or competence)
in terms of permissibility judgments. Like speakers' acceptability judgments, permissibility judgments are easy to elicit
and thus easy to collect and study. And there appears to be a significant degree of variability in the permissibility
judgments (normal) moral agents make. For example, some people judge that same-sex sex is morally permissible,
others judge that it is morally impermissible. Hence we might fill out figure 15.4 as figure 15.5.
According to this way of working out the LA, an agent's I-morality effects a highly constrained mapping from inputs (as
yet unspecified) to outputs, namely to an agent's permissibility judgments. We can bootstrap our way to articulating
the content of an I-morality by noting how that mapping is effected. But this will require knowing what the inputs are.
A plausible candidate is actionseither observed or thought about. We make moral judgments about actions that we
witness (What he did was impermissible) and about actions we contemplate, either in an ethics workshop or
preparatory to
FIGURE 15.5 I-morality's mapping from inputs to outputs.
end p.246
FIGURE 15.6 Moral judgments from actions.
performing them ourselves (Is it permissible for a hypothetical agent [or me] to do X in circumstances C?) Hence we
arrive at figure 15.6.
Once we have some data concerning input and output, we can ask what needs to be in I-morality to explain how an
agent gets from a particular action or action description to a judgment about whether the action is morally permissible
or morally impermissible.
As we have seen in the case of language, assuming that universal grammar is a highly abstract innate endowment
universal in the species, we say that the content of a speaker's competence is a set of parameterized principles. A
speaker's language faculty comes to be structured in one of a highly constrained set of ways. This structure imposes
limits on how he perceives the signals to which he is exposed. If the signal can be interpreted by his language faculty,
if it does not violate any of the parameterized principles that characterize his competence, then he will judge the signal
to be okay; if not, not.
The story that figure 15.6 then encourages is this. Assuming something like universal moral grammar, an agent's I-
morality comes to be structured in one of a highly constrained set of ways. This structure imposes limits on how she
perceives actions to which she is exposed. But how do we complete the thought? If the action can be interpreted by
her moral faculty, if it does not violate any of the parameterized principles that characterize her moral competence,
then she will judge that the action is morally permissible; if not, not. Figure 15.7 represents the picture we arrive at.
The moral parameters model appears to have the attractive feature of suggesting an account of moral diversity, in
much the same way that the principles and parameters theory in linguistics has actually provided an account of
linguistic diversity. Universal moral grammar provides the cognitive resources that make possible the acquisition of
moral capacities. Since the latter are acquired in particular moral environments, the developing moral agent will come
to exercise them in ways that reflect those environments, and so will come to be able to negotiate moral space in ways
that are sensitive to local conditions.
To the best of my knowledge, no one has actually looked for moral parameters, so a speculation will have to suffice for
now.
Recall that an agent's moral competence, her I-morality, effects a highly constrained mapping from inputs to outputs,
where, for the moment, we are
FIGURE 15.7 Moral parameters model.
working with an incredibly simple model, limiting the inputs to actions or action descriptions, and the output to
permissibility judgments.
Let us first think about those inputs, drawing again on linguistics. Speakers qua speakers do not hear noise; they
hear words, sentences, questions, and so on. This is because their linguistic competence imposes structure on the
incoming signal, where it can. This is not to say, of course, that you and I do not hear birdsong. Rather, the point is
that we do not interpret it as an utterance. Hence it will be useful to think about the fact that moral agents qua moral
agents see actions, not happenings. Again, the claim is not that you and I do not see leaves falling and waves
lapping. Rather, we do not interpret such things as actions.

4. We are shameless anthropomorphizers. But anthropomorphism is just that: the (misguided or motivated) projection of distinctly human properties onto the nonhuman world.
And to see something as an action as opposed to a happening just is to impose some structure on it. At the very least
it involves the marking of the agent(s) of the action, the patient(s) of the action, and the spatiotemporal boundaries of
the action (its identity conditions). In a very real sense, we parse parts of our environment into actions.
The identification of actions is something arguably made possible by the possession of universal moral grammar. We
might imagine, that is, that one thing the human moral faculty does is to get parts of our environment into the right
shape for evaluation. Things that cannot be gotten into the right shape (for example, a squirrel knocking an acorn
onto my head when it scampers up the roof) cannot be evaluated in terms of moral permissibility. (We can and do
curse nonhuman animals; but we don't really think they act impermissibly.) Moral evaluations, like permissibility
judgments and attributions of responsibility, simply cannot get started if we do not already see the world in terms of
agents, patients, and consequences. And since every (normal) human makes moral evaluations, it is not implausible to
claim that every human has the innately specified capacity to see actions. Indeed, considerable evidence has
accumulated that shows that very, very young humans detect agency in the world (see Gergely et al., 1995; Johnson, 2005).
The evaluative components of an agent's I-morality can get to work once a representation of an action is in place.
Particular evaluations will depend on a number of factors: the nature of the agent; the nature of the patient; the
effect(s) or outcomes of the action; and how the effects or outcomes are brought about (intentionally, accidentally,
directly, indirectly, alone or in concert?). All of these things will make a difference to how a moral agent's I-morality
will map an action into a permissibility judgment.
Parametric variation might be evident both at the input/I-morality interface and at the I-morality/output interface.
What kinds of creatures can be agents (only humans? only adults?); Which patients matter for the purpose of
evaluation of actions (only humans? only members of the evaluator's community? all sentient creatures?); What
outcomes are good or bad? All these are areas in which we can expect to see differences among human moral judges.
Furthermore, one way of describing observed moral differences among the world's moral agents is to say that
members of different cultures make different judgments concerning what is
morally salient for the purposes of evaluation: is the fact that the agent's father has recently died relevant to
assessing his action of having a haircut or eating chicken (see Shweder et al., 1987)?
Let me try to make this less abstract, again with a very simple example.
One thing seems to be true of all known human moral systems: moral considerations (obligations and prohibitions) do not apply to everything.

5. 'Moral system' does not mean 'particular normative theory.' It is shorthand for something like 'typical pattern of permissibility judgments made by a group of humans.'

For example, pieces of furniture are not
the sorts of things that have moral considerability: no one thinks that tables are owed special treatment in virtue of
their intrinsic properties, though someone might judge that it is morally impermissible to scratch a table because that
table belongs to a human being. Still, there is global variation in what things are taken to fall into the set of the
morally considerable. Some human moral systems cast the net widely, including all animals along with human beings;
others are more conservative, extending moral considerability only to human beings (and perhaps then only to a
subset of human beings, what moral philosophers like to call 'persons'). In addition, there is further variation among
the systems that admit humans and nonhuman animals into the special class of the morally considerable: some such
systems might assign different degrees of moral considerability to different types of members of the class, ranking,
say, human beings above nonhuman animals, men above women, or cows above frogs.
Let us then define a 'schweeb' as a creature with the highest moral status. A very basic principle of all possible I-moralities might be 'Schweebs are to be respected' or 'Given the choice of saving the life of a schweeb or saving the life of a nonschweeb, always save the life of the schweeb.' However, what counts as a schweeb might differ from one
community of moral agents to another community of moral agents. Schweebhood might be attributed only to women,
or only to rational creatures, or only to sentient creatures. And so moral agents raised in different moral communities
would come to have their schweebhood parameter set in one way rather than another. (In principle, there is no barrier
to some parameters allowing for more than two settings.)
How moral agents' schweebhood parameter is set is something, presumably, we could discover by eliciting
permissibility judgments, across a range of moral communities, about a range of hypothetical actions involving different
agents and patients. Having one's schweebhood parameter set in a particular way will be reflected in one's
permissibility judgments, in ways that mirror the cascading effects of linguistic parameter-setting illustrated by
sentences (7)-(9) earlier. And having one's schweebhood parameter set eliminates the need for considering the
question of moral status anew each time one makes a moral judgment.

6. The idea that the proponent of the LA might account for the universal tendency to make discriminations between subjects of moral considerability can be pursued in alternative ways. Here I have briefly examined a parameter for schweebhood. However, the same patterns of judgment might be accounted for in terms of the universal possession of some principle regarding the fitting treatment of others, which interacts with culturally specific beliefs that are external to the moral faculty itself.
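The cascading effect of a parameter setting can be pictured with a toy computational sketch. Everything below is invented for illustration (the function names, the drastically simplified representation of actions, and the two communities' settings are assumptions of mine, not anything proposed in this chapter); it merely shows how a single parameter can map the very same input onto opposite permissibility judgments.

```python
# Toy sketch of the schweebhood parameter (illustrative only; the
# representation of actions and all names here are invented).

def make_i_morality(is_schweeb):
    """Build a judge whose verdicts depend on one parameter:
    the predicate picking out schweebs (highest moral status)."""
    def judge(action):
        # An action is minimally parsed into agent, patient, and
        # whether its outcome harms the patient.
        agent, patient, harms_patient = action
        if harms_patient and is_schweeb(patient):
            return "impermissible"
        return "permissible"
    return judge

# Two moral communities with the parameter set differently.
judge_narrow = make_i_morality(lambda p: p["kind"] == "human")  # humans only
judge_wide = make_i_morality(lambda p: p["sentient"])           # all sentient creatures

frog = {"kind": "frog", "sentient": True}
harming_a_frog = ("hunter", frog, True)

print(judge_narrow(harming_a_frog))  # permissible
print(judge_wide(harming_a_frog))    # impermissible
```

The two judges disagree about the same action purely because of how one parameter is set; any real proposal would of course require far richer action representations and multiple interacting parameters.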
FIGURE 15.8 Modified, completed principles and parameters model.
The moral parameters model is an attractive way of beginning to cash out a nativist moral psychology, if only because
it makes vivid the sorts of things to which all moral nativist accounts must pay attention. Nonetheless, even at this
early stage of inquiry, some concerns are likely to be raised. In the remainder of the essay, I want to address a worry
that might seem immediately apparent: namely, that the moral parameters view (and perhaps any other way of
pursuing the LA) entails moral relativism. This will be a worry, of course, only for those who think moral relativism is
false. But it is related to a concern that even moral objectivists might have with the LA approach to moral psychology: namely, whether this type of approach can do justice to some apparently distinctive phenomenological aspects of
moral life. The treatment will be far from complete. My present aim is quite modest: to investigate these matters in a
way that makes clear what the moral parameters model (and other yet-to-be-proposed LA approaches) is not and
need not be committed to and in a way that renders them generally instructive for nativist moral psychologists.
3 Moral Relativism and Moral Disagreement
To see why friends of moral relativism might take comfort in the apparent potential of the moral parameters view to
support their theoretical position, consider again the following diagrams. (Fig. 15.8 is the completed picture of fig.
15.3, slightly modified for comparative purposes; fig. 15.7 is repeated.)
Modulo performance errors, if two speakers make different acceptability judgments about the same string, they are
thought to have different I-languages or idiolects. As we saw earlier: English and Italian speakers make different
acceptability judgments about expressions with no overtly pronounced subject; English and Japanese speakers make
different acceptability judgments about expressions in which the head of an expression is preceded by modifying
material. Insofar as the moral parameters model treats an agent's permissibility judgments as analogous to a speaker's
acceptability judgments, it would seem to entail that, modulo performance errors, if two agents make different
permissibility judgments about the same practice, then those agents have different I-moralities.

7. Some performance errors in the moral domain will have the same source as their linguistic cousins: distraction, drunkenness, and processing and other physical limitations. However, in addition, a particular agent's judgment about some morally charged matter might not accurately manifest his moral competence so much as be due to the effect of a comforting but irrational prejudice or ignorance of the facts.
The moral relativist will press the sensed advantage in the following way.
Supposing that two speakers, Mary and Kumiko, have different I-languages, it makes no sense for Mary to complain
to Kumiko that she (Kumiko) has it wrong
about where heads should go. Mary just has to and (of course) will recognize that Kumiko simply speaks a different
language. It would be foolish of Mary to ask Kumiko to provide reasons for her acceptability judgments, and Kumiko's
inability to provide justification will not be a source of concern to Mary. Similarly, then, if Mary and Kumiko have
different I-moralities, then it makes no sense for Mary to complain about Kumiko's views about the permissibility of
certain practices. Mary just has to accept that Kumiko has a different morality, and it would be foolish of her to ask
Kumiko to provide reasons for her permissibility judgments. If the moral parameters model is right in treating an
agent's permissibility judgments as analogous to a speaker's acceptability judgments, then normative relativism is true.
Agents make the permissibility judgments they do, and, controlling for performance errors, when those judgments
diverge with respect to a particular practice, there is nothing more to be said. No reasons can be provided for saying
that Mary is right and Kumiko is wrong, or vice versa.
At this point, someone without moral relativistic leanings might press a related but quite different complaint, namely,
that the moral parameters model is at odds with the lived experience of moral life, insofar as it seems to allow for
neither genuine disagreement (as opposed to mere diversity) nor the fact that we care about moral differences in ways we
don't seem to care about linguistic differences.
Mary and Kumiko do not really disagree about where the head of an expression should go. But genuine moral
disagreement is a fact of life. Members of the same families, exposed to virtually identical environments, disagree
about the permissibility of same-sex sex, abortion, and eating nonhuman animals. Moreover, most of us have
experienced intrapersonal moral disagreement: we engage in inner dialogue about whether we should eat pork; we
used to think that abortion is morally permissible; now we think not. And we care about these differences, often to the
point of severing relationships and experiencing considerable anxiety about our former selves.
An obvious line of response is to reject the idea that the requisite output of an agent's I-morality is a set of
permissibility judgments. However, that move is neither necessary nor sufficient to defend the moral parameters view
from the current line of criticism.
It is not necessary, for the hopeful moral relativist and the skeptical objectivist both rather overstate the disanalogies
between language and morality. It is true that, in the normal course of speaking and understanding one another, we
do not typically ask for justifications for why a speaker judges that a certain string is okay. But it is not true that
there is never cause to make normative recommendations regarding language. For example, given the notorious
ambiguity of the word 'sanction,' one would advise a student not to use that verb in writing an applied ethics essay,
say. And native speakers of English, who to all intents and purposes have the same I-language (like Americans and
Australians), can and do disagree about whether one 'takes' a bottle of wine to a dinner party or whether one 'brings' a
bottle.
In any case, it is not clear that dispensing with the thought that an agent's permissibility judgments are among the
outputs of her moral competence will be sufficient to assuage the critic. For the apparent problem is somewhat deeper:
any view that models human moral competence on human linguistic competence in the perfectly general way described
here (figs. 15.1 and 15.2) seems to allow no gap
between what an agent judges to be morally permissible and what she ought to judge to be morally permissible.
Mary's mind-brain is structured in a way that permits only a highly constrained mapping between inputs and outputs;
in some sense, she cannot be faulted for the judgments she makes. However, such a gap is precisely what we must
presuppose in order to make sense of genuine moral disagreement and our belief that it is appropriate to interrogate
agents about the reasons for their permissibility judgments.
We have reached familiar and unavoidable questions about the relation between descriptive psychology and normative
theory that arise throughout cognitive science. These questions are most familiar with respect to the empirical study of human reasoning (see Stein, 1996, for a useful review), and it is no surprise that they arise for the empirical study of moral capacities, too. But it is not as if the questions get no grip with respect to language. This bears emphasis, because it is
too easy to assume that they don't and, on that basis, to infer that there are special problems about the normativity
involved in rationality and morality that, at least with respect to the latter, render the LA implausible. To put the
critic's point more specifically: pursuing the LA erases an is-ought distinction that, while irrelevant in linguistics, is
essential to maintain in the study of morality.
The apparent irrelevance to linguistics of the questions concerning the relation between descriptive psychology and
normative theory is, I believe, an artifact of the way linguistic inquiry proceeds. Linguists do not begin with a theory of
right syntax and then assess the extent to which speakers conform to that theory. Rather, the principles that
characterize a speaker's linguistic competence are discovered by the systematic study of signal-to-meaning mappings,
as evidenced in speakers' acceptability judgments. In sharp contrast, both the moral relativist's embrace of the moral
parameters model and the concern that the model cannot accommodate the fact of genuine moral disagreement
presuppose the existence of theories of right action; the very idea that two agents can be equally justified in making
contradictory permissibility judgments and the very idea that an agent can make a (nonperformance error) mistake
about the permissibility of an action assume that there are accounts of what is permissible that purport to be correct.
But this comparison is misleading. For, while it implicitly recognizes two types of normative domain, it wrongly assigns
language exclusively to one and morality exclusively to the other. Instead, I believe we should recognize two levels of
normativity: there is the normativity that is the direct result of our mind-brains being built and developing in certain ways (call this 'brute normativity'), and there is the normativity that is reflected in the theories of right X-ing we construct (call this 'codified normativity'). Brute normativity, the innately enabled structures and processes that make judgment possible, is the proper target of linguistic and moral psychological inquiry. Codified normativity, the ways we think and talk about our practices of judgment, is real enough, but it cannot be the subject of science. The
construction of codified normative theories is motivated, no doubt, by the need to facilitate communicative and social
cooperation. But the factors that are relevant to those tasks in any human community are too multifarious to capture
in any universally valid and systematic account.
In the absence of theories of right syntax, linguists have no option but to proceed the way they do. But moral
psychologists do have such an option. So the approach I am encouraging here asks us, and allows us, to abandon our
attachment to theories of right action when we do moral psychology.
Theories of right action (like utilitarianism and versions of duty ethics) are the products of philosophical labor. The
moral psychologist should neither presuppose them in her empirical inquiry, nor should she expect her investigation
into the structure of brute moral normativity to vindicate a particular theory or principle of right action (cf. Greene,
2005).
It is logically possible, I suppose, that we will find that a common component of the world's I-moralities is some
particular normative theory (act utilitarianism, say) or some particular moral principle (the categorical imperative, say).
That would be both interesting and very, very surprising.
The acquisition of a competence that embodies either the greatest happiness principle or some version of the
categorical imperative is consistent with neither empiricist nor nativist accounts. Most of the world's children, not
having Western moral philosophers as caretakers or parents, will not be exposed to these principles of right action at
all. And we know that the permissibility judgments of (Western) moral agents are apt sometimes to be in accordance
with utilitarian considerations and sometimes to be in accordance with (roughly) Kantian considerations (Nagel, 1972).
Hence no single principle or theory of right action will do as the content of I-morality.

8. While I cannot do it justice here, it is crucial to note a further consequence of this discussion. Experiments aimed at uncovering the nature of human moral 'processing' (to choose a suitably neutral term) that are structured to elicit particular judgments of a roughly consequentialist or a roughly deontological nature beg central questions in moral psychology. If inquirers are looking for contrasts between judgments of these types, they will find them. Hence studies that seek to discover which parts of the brain light up when an agent makes a judgment warranted by consequentialism and which parts light up when she makes a judgment in accordance with some version of the categorical imperative are seriously misleading and are unlikely to help us uncover the content of human I-moralities (cf. Greene & Haidt, 2002; Greene et al., 2001).
With these remarks in place, I want to end by being as clear as I can about what the proponent of the LA is not
committed to.
It is very tempting to view the moral parameters model as a way of filling out the details of Rawls's early view. This is
a mistake. In A Theory of Justice (1971), Rawls writes:
[O]ne may regard a theory of justice as describing our sense of justice. This enterprise is very difficult. For by
such a description is not meant simply a list of the judgments on institutions and actions that we are prepared to
render, accompanied with supporting reasons when these are offered. Rather, what is required is a formulation
of a set of principles which, when conjoined with our beliefs and knowledge of the circumstances, would lead us
to make these judgments with their supporting reasons were we to apply these principles conscientiously and
intelligently. A conception of justice characterizes our moral sensibility when the everyday judgments we make
are in accordance with its principles. ... A useful comparison here is with the problem of describing the sense of
grammaticalness that we have for the sentences of our native language. (pp. 46-47)
Talk of parameterized principles might then be misinterpreted as implying that the content of an agent's I-morality is a
set of explicitly represented moral principles that are consciously accessible to him for deployment in the activity of
moral judgment and in the practice of providing justifications for those judgments (see Nichols, 2005). But, as in the
case of language, the proponent of the moral parameters view need make no particular claims about how the relevant
principles are represented. As in the case of language, we are to imagine that the relevant principles are simply a
theorist's way of characterizing a set of constraints or cognitive structures.
More important, pace Rawls, the moral psychologist need have no truck with the idea that the operation of an agent's
sense of justice (i.e., her moral competence) is a conscious affair. Rawls mentions that the principles that characterize
an agent's sense of justice can be applied conscientiously and intelligently. His picture characterizes the operation of
an agent's moral competence as a sort of syllogistic machine: confronted with a hypothetical or actual circumstance or
practice, the agent searches for and applies a relevant moral principle (or principles), and then out pops a judgment
about the justice of the circumstance or the permissibility of the practice.

9. There are other, more philosophical objections to this way of characterizing moral reasoning; see especially McDowell, 1979.
Furthermore, Rawls suggests, the reasons for her judgments that an agent might be able to supply on demand mirror
the operation of this machine. Put another way: moral epistemology (the justification of moral judgments)
recapitulates moral psychology (the cognitive processes that actually make those judgments possible).
But there is little reason to believe that the content of an agent's I-morality will be recognizable to us or to him as
anything like a set of moral principles. Again, as we know from the study of language, speakers do not recognize the
principles that characterize their linguistic competence, and even savvy linguists do not consciously deploy principles,
like the head position parameter, in speaking. And if it is right to posit an epistemic relation between a speaker and
the content of his linguistic competence, then that relation must be tacit. There is no reason to deny the same
possibility regarding an epistemic relation between a moral agent and the content of his moral competence.
Indeed, both anecdotal and experimental evidence suggests that moral agents are quite bad at providing reasons for
their permissibility judgments; at a certain point, justification stops. I suspect we all judge that it is morally
impermissible to torture human infants for fun, but it is notoriously difficult to say why it is morally impermissible
(Haidt, 2001).

10. The difficulty agents have in providing justifications for their permissibility judgments is thought, by some, to lend definitive support to a view according to which moral judgments are the output of some affective (i.e., noncognitive) system. Just as a speaker says of a *-ed sentence 'It doesn't sound right,' an agent might say of a particular *-ed action 'It just doesn't feel right.' However, it would be wholly unjustified to take the dumbfounding of linguistic informants as evidence for the claim that their linguistic competence is not something cognitive. I see no reason to make the related inference with respect to moral dumbfounding.
There is a direct parallel in linguistics. While (nonlinguist) speakers of English will immediately judge that (13) is not-okay, they will not be able to explain why it is not.

(13) *We congratulated themselves.

They'll just say 'It doesn't sound right.' However, it is no count against linguistic inquiry aimed at uncovering the content of speakers' I-languages that speakers are unable to articulate parameterized principles that justify their acceptability judgments. Linguists are expected to be able to articulate the relevant principles. Even so, their job is not to justify speakers' acceptability judgments, but rather to explain them. Luigi makes the judgments he does because his idiolect is characterized (in part) by the null subject parameter being set to 'on.' That is a psychological fact about Luigi. He does not apply the relevant parameterized principle either intelligently or conscientiously in his role as a native speaker of Italian; his mind-brain just happens to be structured in a certain way.

Similarly, the moral psychologist who pursues the LA is concerned with what I dubbed brute normativity, that is, the psychological structures and processes that underlie the exercise of moral capacities. Arguably, these structures and processes can be characterized, at some level of abstraction, in terms of explicit principles. But those principles are formulated by the scholars who study moral capacities, not by the folk studied. And, crucially, we should not expect that, once articulated, these principles will look anything like those products of philosophical labor: theories or principles of right action.

I just said that the job of the linguist is not to justify speakers' acceptability judgments but to explain them, and I pressed the same point with respect to the moral psychologist. However, one might wonder whether this is really kosher.

Linguists bootstrap their way to an articulation of the content of I-languages by starting with speakers' acceptability judgments. In this sense, it is correct to say that acceptability judgments provide data for linguistic theory, and a theory that is radically at odds with speakers' judgments would fail on that account. Things seem to be quite different with respect to morality, because we expect a gap between agents' permissibility judgments and moral principles. Quite so. We cannot read a theory of right action off the proffered judgments of agents, and certainly not off their actions. But this does not show that permissibility judgments cannot be treated as analogous to acceptability judgments. Rather, the point is simply illustrative of the distinction I described earlier between brute and codified normativity. The principles the linguist articulates are intended as an abstract characterization of the structure of speakers' competence; they are not intended to provide speakers with guidance in their communicative endeavors. Neither should we depend
on the principles the moral psychologist uncovers for moral guidance. The moral psychologist's job is to uncover the
structures and processes that make moral life possible. Pursuing the LA is, I have argued, the best way to go about
doing that.
If this is right, and I think it is, what are we to make of the enterprise of normative ethics quite generally? Nothing I
have yet said directly undermines any
particular normative theory; still less am I committed to the idea that a completed nativist moral psychology will
render normative ethics otiose. What I am sure of is that pursuing the LA provides an excellent opportunity to
rigorously examine precisely how our ways of codifying morality are related to the capacities that make us take an
interest in that enterprise at all.
16 Is Human Morality Innate?
Richard Joyce
The first objective of this chapter is to clarify what might be meant by the claim that human morality is innate. The
second is to argue that if human morality is indeed innate, an explanation may be provided that does not resort to an
appeal to group selection but invokes only individual selection and so-called reciprocal altruism in particular. This
second task is not motivated by any theoretical or methodological prejudice against group selection; I willingly concede
that group selection is a legitimate evolutionary process, and that it may well have had the dominant hand in the
evolution of human morality. There is a fact of the matter about which process, or which combination of processes,
produced any given adaptation, and it is to be hoped that, in time, enough evidence might be brought into the light to
settle such issues. At present, though, the evidence is insufficient regarding human morality. By preferring to focus on
reciprocity rather than group selection, I take myself simply to be outlining and advocating a coherent and
uncomplicated hypothesis, which may then take its place alongside other hypotheses to face the tribunal of our best
evidence.
1 Understanding the Hypothesis
Before we can assess the truth of a hypothesis, we need to understand its content. What might it mean to assert that
human morality is innate? First, there are issues concerning what is meant by 'innate.' Some have argued that the
notion is so confused that it should be eliminated from serious debate (see Bateson, 1991; Griffiths, 2002). I think
such pessimism is unwarranted, but I agree that anyone who uses 'innate' in critical discussion should state what he or
she has in mind. I suggest that what people generally mean when they debate the innateness of morality is
end p.257
whether morality (under some specification) can be given an adaptive explanation in genetic terms: whether the
present-day existence of the trait is to be explained by reference to a genotype having granted ancestors reproductive
advantage, rather than by reference to psychological processes of acquisition.

1. This stipulation is not intended as an analysis or a general explication of the concept "innateness." I have no objection to the term
being used in a different manner in other discourses.
If morality is innate in this manner, it would not follow that there is a gene for morality. Nor do this conception of
innateness and the references to human nature that routinely come along with it imply any dubious metaphysics
regarding a human essence. Asserting that bipedalism is innate and part of human nature doesn't imply that it is a
necessary condition for being human.
Nor does it follow that an innate trait will develop irrespective of the environment (for that isn't true of any phenotypic
trait) or even that it is highly canalized. The question of how easily environmental factors may affect or even prevent
the development of any genetically encoded trait is an empirical one that must be addressed on a case-by-case basis.
It is also conceivable that the tendency to make moral judgments is the output of an innate conditional strategy, in
which case even the existence of societies with nothing recognizable as a moral system would not be inconsistent with
morality's being part of human nature, for such societies may not satisfy the antecedent of the conditional. Indeed, if
our living conditions are sufficiently dissimilar from those of our ancestors, then, in principle, there might have been no
modern society with a moral system (not a single moral human in the whole wide modern world), and yet the claim
that morality is innate might remain defensible. These possibilities are highlighted just to emphasize the point that
something's being part of our nature by no means makes its manifestation inevitable. But, of course, we know that in
fact modern human societies do have moral systems; indeed, apparently all of them do (see Brown, 1991; Roberts,
1979; Rozin, Lowery, et al., 1999).
The hypothesis that morality is innate is not undermined by observation of the great variation in moral codes across
human communities, for the claim need not be interpreted as holding that morality with some particular content is
fixed in human nature. The analogous claim that humans have innate language-learning mechanisms does not imply
that Japanese, Italian, or Swahili is innate. We are prepared to learn some language or other, and the social
environment determines which one. Although there is no doubt that the content and the contours of any morality are
highly influenced by culture, it may be that the fact that a community has a morality at all is to be explained by
reference to dedicated psychological mechanisms forged by biological natural selection. Even if mechanisms of cultural
transmission play an exhaustive role in determining the content of an individual's moral convictions, this would be
consistent with there being an innate moral sense designed precisely to make this particular kind of cultural
transmission possible. That said, it is perfectly possible that natural selection has taken some interest in the content of
morality, perhaps favoring broad and general universals. (Later, I will mention some evidence
end p.258
indicating that there are a number of recurrent themes among all moral systems.) This fixed content would pertain
to actions and judgments that enhance fitness despite the variability of ancestral environments. Flexibility is good if
the environment varies; but if in some respect the environment is very stable (for example, it is hard to imagine an
ongoing situation where fitness will be enhanced by eating one's children), then moral attitudes with fixed content may
be more efficient. After all, speaking generally, phenotypic plasticity can be costly: learning introduces the dangers of
trial-and-error experimentation, and it takes a potentially costly amount of time. (Consider the nastiness of getting a
sunburn before your skin tans in response to an increase in sun exposure, or the dangers of suffering a disease before
your immune system kicks in to combat it.)
Apart from controversy surrounding innateness (which I don't for a second judge the foregoing clarifications to have
settled), the hypothesis that human morality is innate is also bedeviled by obscurity concerning what might be meant
by "morality." A step toward clarity is achieved if we make an important disambiguation. On the one hand, the claim
that humans are naturally moral animals might mean that we naturally act in ways that are morally laudable: that the
process of evolution has designed us to be social, friendly, benevolent, fair, and so on. No one who has paused to
glance around herself will ever claim that humans always manifest such virtuous behaviors, for it is obvious that we
can also be violent, selfish, lying, insensitive, and unspeakably nasty creatures. By saying that humans naturally act in
morally laudable ways, we might mean that these morally unpleasant aspects of human behavior are unnatural, or
that both aspects are innate but that the morally praiseworthy elements are predominant, or simply that there exist
some morally laudable aspects among what has been given by nature, irrespective of what darker elements may also
be present.
Alternatively, the hypothesis that humans are by nature moral animals may be understood in a different way: as
meaning that the process of evolution has designed us to think in moral terms, that biological natural selection has
conferred on us the tendency to employ moral concepts. According to the former reading, the term "moral animal"
means an animal that is morally praiseworthy; according to the second, it means an animal that morally judges. Like
the former interpretation, the latter admits of variation: saying that we naturally make moral judgments may mean
that we are designed to have particular moral attitudes toward particular kinds of things (for example, finding incest
and patricide morally offensive), or it may mean that we have a proclivity to find something-or-other morally offensive
(morally praiseworthy, etc.), where the content is determined by contingent environmental and cultural factors. These
possibilities represent ends of a continuum; thus, many intermediate positions are tenable.
These two hypotheses might be logically related: it has often been argued that only beings who are motivated by moral
thoughts properly deserve moral appraisal. If this relation is correct, then humans cannot be naturally morally laudable
unless we are also naturally able to employ moral judgments; thus establishing the truth of the first hypothesis would
suffice to establish the truth of the second. However, this strategy is not a promising one, because the connection
mentioned (roughly, that
end p.260
moral appraisal of an individual implies that the individual is morally motivated) is too contentious to rest arguments
on with any confidence. (In fact, as I will mention later, I doubt that it is true.)
It is the second hypothesis with which this chapter is concerned, and I will be investigating it directly, not by
establishing the first hypothesis. With it thus made explicit that our target hypothesis concerns whether the human
capacity to make moral judgments is innate, it ought to be clear that arguments and data concerning the innateness of
human prosociality do not necessarily entail any conclusions about an innate morality. Bees are marvelously prosocial,
but they hardly make moral judgments. An evolutionary explanation of prosocial emotions such as altruism, love, and
sympathy also falls well short of being an evolutionary explanation of moral judgments. We can easily imagine a
community of people all of whom have the same desires: they all want to live in peace and harmony, and violence is
unheard-of. They are friendly, loving people as far as you can see, oozing with prosocial emotions. However, there is
no reason to think that there is a moral judgment in sight. These imaginary beings have inhibitions against killing,
stealing, and so on; they wouldn't dream of doing such things because they just really don't want to. But we need not
credit them with a conception of a prohibition: the idea that one shouldn't kill or steal because it's wrong. And moral
judgments require, among other things, the capacity to understand prohibitions. To refrain from doing something
because you don't want to do it is very different from refraining because you judge that you ought not do it.
This point must not be confused with one famously endorsed by Immanuel Kant: that actions motivated by prosocial
emotions cannot be considered morally admirable (Kant, 1783/2002, pp. 199–200). I am more than happy to side with
common sense against Kant on this point. We often morally praise people whose actions are motivated by love,
sympathy, and altruism. In fact, I am willing to endorse the view that on occasions a person whose motivations derive
from explicit moral calculation rather than direct sympathy is manifesting a kind of moral vice. So it is not being denied
that the imaginary beings described earlier deserve our moral praise, or even that they are, in some sense of the
word, morally virtuous. My point is the far less controversial one that someone who acts solely from the motive of love
or altruism does not thereby make a moral judgment (assuming, as seems safe, that these emotions do not necessarily
involve such judgments).

2. Notice that my examples of prosocial emotions do not include guilt or shame, for the very reason that I accept that these emotions
do involve a normative (and often moral) judgment. Guilt, I submit, necessarily involves thoughts of having transgressed.
Now we face the question of what a moral judgment is, for we cannot profitably discuss the evolution of X unless we
have a firm grasp of what X is. Unfortunately, there is disagreement among metaethicists, even at the most
fundamental level, over this question. On this occasion I must confine myself to presenting dogmatically some plausible
distinctive features of a moral judgment, without pretending to argue the case.
end p.260
- Moral judgments (as public utterances) are often ways of expressing conative attitudes, such as approval, contempt, or, more generally, subscription to standards; moral judgments nevertheless also express beliefs (i.e., they are assertions).
- Moral judgments pertaining to action purport to be deliberative considerations that hold irrespective of the interests/ends of those to whom they are directed; thus they are not pieces of prudential advice.
- Moral judgments purport to be inescapable; there is no opting out of morality.
- Moral judgments purport to transcend human conventions.
- Moral judgments centrally govern interpersonal relations; they seem designed to combat rampant individualism in particular.
- Moral judgments imply notions of desert and justice (a system of punishments and rewards).
- For creatures like us, the emotion of guilt (or a moral conscience) is an important mechanism for regulating one's moral conduct.

Something to note about this list is that it includes two ways of thinking about morality: one in terms of a distinctive
subject matter (concerning interpersonal relations), the other in terms of what might be called the normative form of
morality (a particularly authoritative kind of evaluation). Both features deserve their place. A set of values governing
interpersonal relations (e.g., "Killing innocents is bad") but without practical authority, which would be retracted for any
person who claimed to be uninterested, and for which the idea of punishing or criticizing a transgressor never arose, simply
wouldn't be recognizable as a set of moral values. Nor would a set of binding categorical imperatives that (without any
further explanation) urged one, say, to kill anybody who was mildly annoying, or to do whatever one felt like doing.
(Philippa Foot once claimed that to regard a person as bad merely on the grounds that he runs around trees in a
certain direction, or watches hedgehogs by the light of the moon, is not to have evaluated him from a moral point of
view; it's just the wrong kind of thing. See Foot, 1958, p. 512.) Any hypothesis concerning the evolution of a moral faculty
is incomplete unless it can explain how natural selection would favor a kind of judgment with both these features.
I am not claiming that this list succeeds in capturing the necessary and sufficient conditions for moral judgments; it is
doubtful that our concept of a moral judgment is sufficiently determinate to allow of such an exposition. Some of these
items can be thought of merely as observations of features of human morality, whereas others very probably deserve
the status of conceptual truths about the very nature of a moral judgment. The sensibly cautious claim to make is that
so long as a kind of value system satisfies enough of the foregoing criteria, then it counts as a moral system. A
somewhat bolder claim would be that some of the items on the list (at least one but not all) are necessary features,
and enough of the remainder must be satisfied in order to have a moral judgment. In either case, how much is
enough? It would be pointless to stipulate. The fact of the matter is determined by how we, as a linguistic
population, would actually respond if faced with such a decision concerning an unfamiliar community: if they had a
distinctive value system
end p.261
satisfying, say, four of the listed items, and for this system there was a word in their language (say, "woogle" values),
would we translate "woogle" into "moral"? It's not my place to guess with any confidence how that counterfactual
decision would go. All I am claiming is that the foregoing items would all be important considerations in that decision.
What evidence is there that the human proclivity for making such judgments is innate? The reader could be forgiven
for assuming that an examination of such empirical evidence will be the focus of this chapter, but, in fact, this is
another matter concerning which I must content myself with a wave of the hand in a certain direction. On this
occasion, my objective is not to attempt to establish that human morality is innate, but rather to address the question
of how and why it could be: What makes moral judgment adaptive, and what evolutionary forces might have been
involved in its emergence? Having a good answer to these questions does in itself provide some support for the
hypothesis that morality is innate, for this hypothesis would be shaky if we lacked any conception of how natural
selection might have produced such a trait. Nevertheless, of course, having a coherent story to tell about how a trait
could have resulted from natural selection is never sufficient for establishing that it did so evolve. For that we need
hard evidence. In my opinion (here comes the hand-waving), the strongest evidence for an innate human faculty
comes from developmental psychology. The course of moral development in the human child exhibits an extremely
reliable sequence, it gets underway remarkably early, its developmental pathway is distinct from the emergence of
other skills, and its unfolding includes abrupt maturations. On this last point, Jonathan Haidt (2001, pp. 826–27)
describes the view of anthropologist Alan Fiske (1991) as follows:
children seem relatively insensitive to issues of fairness until around the age of 4, at which point concerns about
fairness burst forth and are overgeneralized to social situations in which they were never encouraged and in
which they are often inappropriate. This pattern of sudden similarly timed emergence with overgeneralization
suggests the maturation of an endogenous ability rather than the learning of a set of cultural norms.
Of particular note is the child's capacity to distinguish moral from conventional transgressions, which emerges as early
as the third year (Smetana, 1981; Smetana & Braeges, 1990), and this is an impressively cross-cultural phenomenon
(Hollos et al., 1986; Nucci et al., 1983; Song et al., 1987; Yau & Smetana, 2003). Whence do children derive this
distinction? It is exceedingly unlikely that across the wide variety of human social ecologies there is some stable
exogenous characteristic that may be plausibly appealed to as the explanans of this developmental phenomenon. For
example, one of the features taken to distinguish the moral from the conventional is the independence of moral
normativity from any rule-conferring authority figure (see Turiel, 1983, 1998; Turiel et al., 1987). Yet it is difficult to
see what there might be in a typical social environment that would allow a general intelligence mechanism to infer on
the basis of observation that one norm depends on authoritative decree (e.g., that boys should not wear dresses to
school) while another does not (e.g., that one shouldn't punch others). In order to infer a dependence relation, one
would have to observe a correlation between the relevant authority's
end p.262
changing its mind to permit the boy to wear a dress and that action's no longer counting as a transgression. And in
order to infer an independence relation one would have to either (1) observe the relevant authority change its opinion
about an act of harming while one noted that the act nevertheless continued to count as a transgression, or (2)
observe a previously condemned act of harming cease to count as a transgression (or vice versa) while one noted that
the relevant authority's opinion on the matter had not altered. But observations of types 1 and 2 are hard to come by,
even for adults, let alone three-year-olds. Regarding a serious moral offense, like violent crime, what we invariably
observe is both elements remaining stable: all relevant authorities denounce it, and it continues to be considered a
transgression. How, on the basis of such observations, a child is supposed to infer an independence relation is
baffling.

3. Likewise, what experience allows a child to infer that certain norms are local whereas others hold more generally (this being another
criterion for distinguishing conventional norms from moral ones)? When the locale of the norm is, for example, school versus home, we can
plausibly find the origin of the distinction in the child's experience. But many social conventions hold in both the school and the home,
and in fact for a wide range of social norms (e.g., eating with utensils rather than fingers), the child very often has neither direct nor
indirect experience of a setting in which it doesn't hold.
The solution to this puzzle is that morality is not something that children learn or infer from their exogenous
environment but is, rather, the result of the unfolding of an innate preparedness.
As I say, rather than develop this line of argument (or any of a number of complementary lines of argument), what I
intend in this chapter is to ask why natural selection might have been interested in producing such a trait. A group
selectionist account will be satisfactory as an explanation if it shows how having individuals making such authoritative
prosocial judgments would serve the interests of the group. An explanation in terms of individual selection must show
how wielding authoritative prosocial judgments would enhance the inclusive reproductive fitness of the individual. One
might be tempted to think that the group selectionist account is more feasible since it can more smoothly explain the
development of prosocial instincts; after all, it is virtually a tautology that prosocial tendencies will serve the interests
of the group. However, prosociality may also be smoothly explained in terms of individual selection via an appeal to
the processes of kin selection, mutualism, and reciprocal altruism (see Dugatkin, 1999). In what follows I will focus on
the last.
2 Reciprocity
It is a simple fact that one is often in a position to help another such that the value of the help received exceeds the
cost incurred by the helper. If a type of monkey is susceptible to infestation by some kind of external parasite, then it
is worth a great deal to have those parasites removed (it may even be a matter of life or death), whereas it is the
work of only half an hour for the groomer. Kin selection can be used to explain why a monkey might spend the
afternoon grooming family members; it runs into trouble when it tries to explain why monkeys in their natural setting
would bother grooming non-kin. In grooming non-kin, the benefit given by an individual
end p.263
monkey might greatly exceed the cost she incurs, but she still incurs some cost: that half-hour could profitably be used
foraging for food or arranging sexual intercourse. So what possible advantage to her could there be in sacrificing
anything for unrelated conspecifics? The obvious answer is that if those unrelated individuals would then groom her
when she has finished grooming them, or at some later date, then that would be an all-around useful arrangement. If
all the monkeys entered into this cooperative venture, in total more benefit than costs would be distributed among
them. The first person to see this process clearly was Robert Trivers (1971), who dubbed it "reciprocal altruism."
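The cost-benefit logic described above can be laid out as a minimal sketch. The payoff numbers here are hypothetical, chosen only so that the benefit each party receives (parasite removal) exceeds the cost it incurs (half an hour of grooming):

```python
# Illustrative sketch of reciprocal altruism's cost-benefit structure.
# The figures are invented for illustration, not drawn from the text.

def net_payoff(benefit_received: float, cost_incurred: float) -> float:
    """Net fitness change for one party in a single exchange."""
    return benefit_received - cost_incurred

# Suppose being groomed is worth 10 units of fitness, while grooming
# costs the groomer only 2 units of foregone foraging time.
a_net = net_payoff(benefit_received=10, cost_incurred=2)
b_net = net_payoff(benefit_received=10, cost_incurred=2)

# Both parties come out ahead, so the arrangement can be favored by
# individual selection even between non-kin.
assert a_net > 0 and b_net > 0
print(a_net + b_net)  # total surplus distributed across the pair: 16
```

The point of the sketch is simply that whenever help received is worth more than help given costs, an exchange of help leaves both interactants with a positive net payoff.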
It is often thought that cheating and cheat-detection traits are an inevitable or even defining feature of reciprocal
exchanges, but in fact a relationship whose cost-benefit structure is that of reciprocal altruism could in principle exist
between plants: organisms with no capacity to cheat, thus prompting no selective pressure in favor of a capacity to
detect cheats. Even with creatures who have the cognitive plasticity to cheat on occasions, reciprocal relations need
not be vulnerable to exploitation. If the cost of cheating is the forfeiture of a highly beneficial exchange relation, then
any pressure in favor of cheating is easily outweighed by a competing pressure against cheating, and if this is reliably
so for both partners in an ongoing program of exchange, then natural selection doesn't have to bother giving either
interactant the temptation to cheat, or a heuristic for responding to cheats. But since reciprocal exchanges will develop
only if the costs and benefits are balanced along several scales, and since values are rarely stable in the real world,
there is often the possibility that a reciprocal relation will collapse if environmental factors shift. If one partner, A,
indicates that he will help others no matter what, then it may no longer be to B's advantage to help A back. If the
value of cheating were to rise (say, if B could possibly eat A, and there's suddenly a serious food shortage), then it
may no longer be to B's advantage to help A back. If the cost of seeking out new partners who would offer help (albeit
only until they also are cheated) were negligible, then it may no longer be to B's advantage to help A back. For
natural selection to favor the development of an ongoing exchange relation, these values must remain stable and
symmetrical for both interactants.

4. By "symmetrical" I mean that it is true of each party that she is receiving more benefit than cost incurred. But it is in principle
possible that, all told, one of the interactants is getting vastly more benefit than the other. Suppose B gives A 4 units of help, and it
costs him 100 units to do so. Sounds like a rotten deal? Not if we also suppose that A in return gives B 150 units of help, and it costs
her only 3 units to do so. Despite the apparent unevenness of the exchange, since 4 > 3 and 150 > 100, both players are up on the
deal, and, ceteris paribus, they should continue with the arrangement. The common assumption (that what is vital to reciprocal
exchanges is that one can give a benefit for relatively little cost) need not be true of both interactants. With the values just given, it is
not true of B. But when it is not true of one of the interactants, then in order to compensate, it must be very true of the other: here
A gives 150 units for the cost of only 3.
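The arithmetic of this note can be checked with a short sketch (the variable names are mine; the units are the note's hypothetical figures):

```python
# Checking the worked numbers in note 4: each party's benefit received
# must exceed that party's own cost incurred ("symmetry" in the note's
# sense), even though the exchange looks wildly uneven in absolute terms.

# B gives A 4 units of help at a cost of 100 to himself;
# A gives B 150 units of help at a cost of only 3 to herself.
a_benefit, a_cost = 4, 3      # what A receives vs. what A pays
b_benefit, b_cost = 150, 100  # what B receives vs. what B pays

a_net = a_benefit - a_cost    # +1
b_net = b_benefit - b_cost    # +50

# Both players are up on the deal, so ceteris paribus the arrangement
# continues, despite B gaining fifty times what A does.
assert a_net > 0 and b_net > 0
print(a_net, b_net)  # 1 50
```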
What is interesting about many reciprocal arrangements is that there's a genuine possibility that one partner can cheat
on the deal (once she has received her benefit) and get away with it. Therefore, there will often be a selective pressure
in favor of developing a capacity for distinguishing between cheating that leads to long-term forfeiture and cheating
that promises to pay off. This, in turn, creates a new pressure for a sensitivity to cheats and a capacity
end p.264
to respond to them. An exchange between creatures bearing such capacities is a calculated reciprocal relationship; the
individual interactants have the capacity to tailor their responses to perceived shifts in the cost-benefit structure of the
exchange (see de Waal & Luttrell, 1988).
The cost-benefit structure of a reciprocal relation can be stabilized if the price of non-reciprocation is increased beyond
the loss of an ongoing exchange relationship. One possibility would be if individuals actively punished anyone they have
helped but who has not offered help in return. Another way would be to punish (or refuse to help) any individual in whom you have
observed a non-reciprocating trait, even if you haven't personally been exploited.

5. In some scenarios, there may not be much difference between refusing help and punishing, despite the one sounding more active than the
other. If a group of, say, baboons were to terminate all interactions with one of their troop, this would penalize the ostracized
individual as much as if they killed the individual outright. This is one reason why I am troubled by Chandra Sripada's efforts to place
reciprocity-based and punishment-based accounts of moral compliance in opposition to each other (2005). Punishment will often be a
natural concomitant of reciprocity, as even Trivers (1971) noted. It should also be noted that refusing to play can be as costly as
administering punishment. If lions were to refuse to share with a free-riding lioness, then they would have to drive her off when she
barged in to share their kill, perhaps risking injury to do so. (As a matter of fact, it turns out that lions are rather tolerant of free-
riders; their helping behaviors seem regulated by mutualism rather than reciprocation. See Heinsohn & Packer, 1995.)
One might go even further, punishing anyone who refuses to punish such non-helpers. The development of such
punishing traits may be hindered by the possibility of higher-order defection, since the individual who reciprocates
but doesn't take the trouble to punish non-reciprocators will apparently have a higher fitness than reciprocators who
also administer the punishments. Robert Boyd and Peter Richerson (1992) have shown that this is not a problem so
long as the group is small enough that the negative consequences of letting non-reciprocators go unpunished will be
sufficiently felt by all group members. They argue, however, that we must appeal to cultural group selection in order to
explain punishing traits in larger groups. I have two things to say in response to this last point. First, the reason that
increased group size has such an impact on the effectiveness of punishment strategies is that the multiplication of
interactants amplifies the costs of coercion. But if an increase in group size is accompanied by the evolution of a trait
that allows an individual to spread her punishments more widely at no extra cost, then this consideration is mitigated.
It has been argued (with much plausibility, in my opinion) that language is precisely such a mechanism (see Aiello &
Dunbar, 1993; Dunbar, 1993, 1996; Smith, 2003). Talk, as they say, is cheap, but it allows one to do great harm to
the reputation of a virtual stranger. Second, on the assumption that through the relevant period of genetic natural
selection our ancestors lived in relatively small bands (small enough, at least, that a person not pulling his or her
weight was a burden on the group), Boyd and Richerson's cogent argument doesn't undermine the hypothesis that an
innate human morality can be explained by reference only to individual selection. Perhaps they are correct that cultural
group selection must be invoked to explain the explosion of human ultra-sociality in the Holocene; and perhaps it is a
process that has contributed a great deal to the content of moral codes. But neither observation is at
end p.265
odds with my hypothesis, since it may be maintained that a biological human moral sense antedates the large-scale
ultra-sociality of modern humans. Indeed, Boyd and Richerson as much as admit this when they allow that moral
emotions like shame and a capacity to learn and internalize local practices existed as genetically coded traits prior to
any spectacular cultural evolution (Richerson et al., 2003, p. 371).
Another trait that might be expected to develop in creatures designed for reciprocation is a faculty dedicated to the
acquisition of relevant information about prospective exchange partners prior to committing to a relationship. Gathering
social information may cost something (in fitness terms), but the rewards of having advance warning about what kind
of strategy your partner is likely to deploy may be considerable. This lies at the heart of Richard Alexander's account
(1987) of the evolution of moral systems. In indirect reciprocal exchanges, an organism benefits from helping another
by being paid back a benefit of greater value than the cost of her initial helping, but not necessarily by the recipient of
the help. We can see that reputations involve indirect reciprocity by considering the following. Suppose A acts
generously toward several conspecifics, and this is observed or heard about by C. Meanwhile, C also learns of B acting
disreputably toward others. On the basis of these observations (on the basis, that is, of A's and B's reputations), C
chooses A over B as a partner in a mutually beneficial exchange relationship. A's costly helpfulness has thus been
rewarded with concrete benefits, but not by those individuals to whom he was helpful. Alexander lists three major
forms of indirect reciprocity:
(1) the beneficent individual may later be engaged in profitable reciprocal interactions by individuals who have
observed his behavior in directly reciprocal relations and judged him to be a potentially rewarding interactant
(his reputation or status is enhanced, to his ultimate benefit); (2) the beneficent individual may be rewarded
with direct compensation from all or part of the group (such as with money or a medal or social elevation as a
hero) which, in turn, increases his likelihood of (and that of his relatives) receiving additional perquisites; or (3)
the beneficent individual may be rewarded by simply having the success of the group within which he behaved
beneficently contribute to the success of his own descendants and collateral relatives. (p. 94)
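The reputational logic of the A/B/C example can be rendered as a toy model. The scoring scheme (one point per observed act of helping, minus one per act of cheating) is my own simplification, not Alexander's:

```python
# Toy model of partner choice via reputation (indirect reciprocity).
# An observer C tracks the reputations of potential exchange partners
# and chooses the best-reputed one, so that A's costly helpfulness is
# repaid, but not by those individuals to whom he was helpful.

from collections import defaultdict

reputation = defaultdict(int)

def observe(actor, helped):
    """C updates her ledger after observing (or hearing about) an act."""
    reputation[actor] += 1 if helped else -1

def choose_partner(candidates):
    """C picks the candidate with the best reputation."""
    return max(candidates, key=lambda who: reputation[who])

# A is seen acting generously toward several conspecifics;
# B is heard of acting disreputably toward others.
for _ in range(3):
    observe("A", helped=True)
observe("B", helped=False)

print(choose_partner(["A", "B"]))  # prints "A"
```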
One possible example of indirect reciprocity is the behavior of Arabian babblers, as studied by Amotz Zahavi over many
years (Zahavi & Zahavi, 1997). Babblers are social birds that act in helpful ways toward each other: feeding others,
acting as sentinels, and so on. What struck Zahavi was not this helpful behavior per se but the fact that certain
babblers seem positively eager to help: jostling to act as sentinel, thrusting food upon unwilling recipients. The
handicap principle that Zahavi developed says that such individuals are attempting to raise their own prestige within
the group: signaling, "Look at me; I'm so strong and confident that I can afford such extravagant sacrifices!" Such
displays of robust health are likely to attract the attention of potential mates while deterring rivals, and thus such
behavior is, appearances notwithstanding, squarely in the fitness-advancing camp.

6. The connection between indirect reciprocity and the handicap principle is commented on by Nowak and Sigmund (1998).
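Alexander's first form of indirect reciprocity (reputation-mediated partner benefits) has since been studied with formal "image-scoring" models such as Nowak and Sigmund's (1998), cited in note 6. The following is only a loose, illustrative toy version of such a model, far simpler than the published ones; every parameter value is invented:

```python
import random

# Toy reputation ("image score") model of indirect reciprocity, loosely in
# the spirit of Nowak & Sigmund (1998). All parameter values are invented.
random.seed(0)

N = 50                     # population size
ROUNDS = 2000              # random donor-recipient encounters
BENEFIT, COST = 1.0, 0.1   # helping confers a large benefit at a low cost

scores = [0] * N           # each agent's public reputation, bounded in [-5, 5]
# A donor helps only recipients whose score meets its threshold:
# -5 ~ unconditional helper, 0 ~ discriminator, 5 ~ near-unconditional defector.
thresholds = [random.choice([-5, 0, 5]) for _ in range(N)]
payoff = [0.0] * N

for _ in range(ROUNDS):
    donor, recipient = random.sample(range(N), 2)
    if scores[recipient] >= thresholds[donor]:
        payoff[donor] -= COST                        # helping is costly now...
        payoff[recipient] += BENEFIT
        scores[donor] = min(scores[donor] + 1, 5)    # ...but raises reputation
    else:
        scores[donor] = max(scores[donor] - 1, -5)   # refusal damages it

# Helpers are repaid not by those they helped but by third-party
# discriminators who see (via the score) that they are in good standing.
```

In runs of this toy, discriminators channel benefits toward agents with good standing, which is just the pressure Alexander describes; nothing here should be read as a faithful reproduction of the published model.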
Consider the enormous and cumbersome affair that is the peacock's tail. Its existence poses a prima facie threat to the
theory of natural selection, so much so that Charles Darwin once admitted that the sight of a feather from a peacock's
tail "makes me sick!" (F. Darwin, 1887, p. 296). Yet Darwin also largely solved the problem by realizing that the
primary selective force involved in the development of the peacock's tail is the peahen's choosiness in picking a mate.

7. I say "largely solved" since Darwin did not present an explanation of why it is the female who gets to be the choosy one. The
answer is that in many species, females must invest a lot of energy in their offspring, whereas males can hope to get away with
investing very little. This answer was, I believe, first appreciated by the early geneticist Ronald Fisher (1930/1999).
If peahens prefer mates with big fan-shaped tails, then eventually peacocks will have big fan-shaped tails; if peahens
prefer mates with triple-crested, spiraling, red, white, and blue tails, then (ceteris paribus) eventually peacocks will
sport just such tails. Sexual selection is a process whereby the choosiness of mates or the competition among rivals
can produce traits that would otherwise be detrimental to their bearer. I am not categorizing sexual selection in
general as reciprocity, only those examples that involve the favoring of traits of costly helpfulness. If a male is helpful
to a female (bringing her food, etc.) and, as a result, she confers on him the proportionally greater benefit of
reproduction, this is an example of direct reciprocity. If a male is helpful to his fellows in general and, as a result, an
observant female confers on him the proportionally greater benefit of reproduction (thus producing sons who are
generally helpful and daughters who have a preference for helpful males), this is an example of indirect reciprocity.
Just as sexual selection can produce extremely cumbersome physical traits, like the peacock's tail, so too can it
produce extremely costly helping behaviors. We can say the same of reputation in general if the benefits of a good
reputation are great enough. If a good reputation means sharing food indiscriminately with the group, then an
indiscriminate food-sharing trait will develop; if a good reputation means wearing a pumpkin on your head, then a
pumpkin-wearing trait will develop. The same, moreover, can be said of punishment, which is, after all, the flip side of
being rewarded for a good reputation. If a type of self-advancing behavior (or any type of behavior at all) is sufficiently
punished, it will no longer be self-advancing at all (see Boyd & Richerson, 1992).
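The logic of that last point can be made explicit with a line of arithmetic. The sketch below is my own illustration, in the spirit of the models Boyd and Richerson (1992) analyze, and every number in it is invented: a behavior yielding gain g stops being self-advancing once the expected punishment (probability of detection times the sanction) exceeds g.

```python
# Illustrative arithmetic only (all numbers invented): a self-advancing
# behavior with gain g stops paying once expected punishment d * s exceeds g.
def net_gain(g: float, d: float, s: float) -> float:
    """Gain g, minus detection probability d times sanction s."""
    return g - d * s

# Rarely enforced, mild sanction: cheating still pays.
assert net_gain(g=5.0, d=0.1, s=10.0) > 0   # 5 - 1 = 4
# Reliable, severe sanction: the same behavior is no longer self-advancing.
assert net_gain(g=5.0, d=0.8, s=10.0) < 0   # 5 - 8 = -3
```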
Once we see that indirect reciprocity encompasses systems involving reputation and punishment, and that these
pressures can lead to the development of just about any trait (extremely costly indiscriminate helpfulness included),
then we recognize what a potentially vital explanatory framework it is. It is important to note, however, that all that
has been provided in this section is an account of a process whereby prosocial behavior can evolve; the organisms
designed to participate in such relations might be insects; they need not have a moral thought in their heads.
3 Reciprocity and Altruism
The view I am interested in advocating is that in cognitively advanced creatures moral judgment may add something
to reciprocal exchanges: it may contribute to
their success in a fitness-enhancing manner, such that a creature for whom reciprocal relations are important may do
better with a sense of obligation and prohibition guiding her exchanges than she would if motivated solely by
unmoralized preferences and emotions. The advantages of reciprocity, then, may have provided the principal
selective pressure that produced the human moral sense.
Before proceeding, however, a couple of quick objections to the hypothesis should be nipped in the bud. First, it might
be protested that many present-day moral practices have little to do with reciprocation: our duties to children, to the
severely disabled, to future generations, to animals, and (if you like) to the environment, all are arguably maintained
without expectation of payback. Yet this objection really misses the mark, for these observations hardly undermine the
hypothesis that it was for regulating reciprocal exchanges that morality evolved in the first place; it is not being
claimed that reciprocity alone is what continues to sustain social relations. Reciprocity may give someone a sense of
duty toward his fellows that causes him to hurl himself on a grenade to save their lives. There is no actual act of
reciprocation there, not even an expectation of one, but nevertheless reciprocity may be the process that brought
about the psychological mechanisms that prompted the sacrificial behavior. Although these mechanisms may have
evolved in order to govern reciprocal exchanges (producing, we might expect, judgments that are highly dependent on
what kind of relation the individuals stand in), it should come as no surprise that social factors might develop that
urge, say, a more universal benevolent attitude, perhaps even encouraging one to initiate and continue relations
irrespective of one's partner's actions (e.g., to "turn the other cheek"). By comparison, one might hypothesize that
human color vision evolved in order to allow us to distinguish ripe from unripe fruit, but this would hardly imply that
this continues to be the only thing we can do with color vision.
Second, it might be objected that a person enters into a reciprocal relationship for self-gain, and thus is motivated
entirely by selfish ends (albeit perhaps enlightened self-interest), the very antithesis of moral thinking. This objection
is confused. Entering into reciprocal relations may well be fitness-advancing, but this implies nothing about the
motivations of individuals designed to participate in such relations. Even Darwin got this one wrong: in the passage
from The Descent of Man often cited as evidence of his appreciation of the importance of reciprocity in human
prehistory, he attributes its origins to "a low motive" (Darwin, 1871/2004, p. 156).

8. This perhaps should be put down to a sloppy choice of wording, for elsewhere in Descent Darwin argues staunchly against
psychological egoism.
George Williams (1966, p. 94) correctly responds: "I see no reason why a conscious motive need be involved. It is
necessary that help provided to others be occasionally reciprocated if it is to be favored by natural selection. It is not
necessary that either the giver or the receiver be aware of this." (I would add that I see no reason that an
unconscious motive need be involved either.) In vernacular English, whether an action is "selfish" or "altruistic"
depends largely (if not entirely) on the motives with which it is performed. (Suppose Amy acts in a way that benefits
Bert, but what prompts the action is her belief that she will benefit herself in the long run.
Then it is not an altruistic act but a selfish act. Suppose Amy's belief turns out to be false, so that she never receives
the pay-off, and the only person who gains from her action is Bert. This does not cause us to retract the judgment
that her action was selfish.) It follows that creatures whose cognitive lives are sufficiently crude that they lack such
deliberative motives cannot be selfish or altruistic in this everyday sense at all, and yet they may very well be involved
in reciprocal exchanges.
It is standard to distinguish altruism in this psychological sense from evolutionary altruism, which is an altogether
more complex and controversial affair, consisting of a creature lowering its inclusive reproductive fitness while
enhancing the fitness of another.

9. On the face of it, evolutionary altruism, as it is here defined, seems impossible. Sober and Wilson (1998) argue that it is possible
only by invoking group selection, and so long as we take care to avoid what they call the "averaging fallacy" (pp. 315). Even if their
argument is successful, however, it remains an open question how much of the prosocial behavior observable in nature (bees, ants,
humans, etc.), which is often casually referred to as "altruism," is an instance of evolutionary altruism.
Reciprocal altruism is not an example of evolutionary altruism (see Sober, 1988); in a reciprocal exchange, neither
party forfeits fitness for the sake of another. As Trivers defined it, "altruistic behavior" (by which he means helpful
behavior) is that which is "apparently detrimental to the organism performing the behavior" (1971, p. 35), but
obviously an apparent fitness-sacrifice is not an actual fitness-sacrifice, any more than an apparent Rolex is an actual
Rolex. Others have defined reciprocal altruism as fitness-sacrificing in the short term. But again, forgoing a short-
term value in the expectation of greater long-term gains is no more an instance of a genuine fitness-sacrifice than is,
say, a monkey's taking the effort to climb a tree in the hope of finding fruit at the top. So, despite claims that
reciprocal altruism and kin selection together solve the so-called paradox of evolutionary altruism, if (1) by "altruism"
we mean fitness-sacrificing (not apparent or short-term fitness-sacrificing), and (2) by "fitness" we mean inclusive
fitness, and (3) by "solving the paradox of evolutionary altruism" we mean showing how such altruism is possible, then
I see no reason at all for thinking that this frequently repeated claim is true.
But if reciprocal altruism is altruism in neither the vernacular nor the evolutionary sense, then in what sense is it
altruism at all? The answer is that it is not. I have called it "reciprocal altruism" in deference to a tradition of 30 years,
but in fact I don't like the term, and much prefer to call it "reciprocal exchanges" or just "reciprocity." What it is, is a
process by which cooperative and helpful behaviors evolve, not (necessarily) a process by which altruism evolves. I
add the parenthetical "necessarily" because it may be that in cognitively sophisticated creatures, altruism, in the
vernacular sense, may evolve as a proximate mechanism for regulating such relations; but it is certainly no necessary
part of the process, since it is also possible that for some intelligent creatures the most efficient way of running a
reciprocal exchange program is to be deliberatively Machiavellian, that is, selfish in the vernacular sense. My point is
that neither motivational structure can be inferred from the fact that a creature is designed to participate in reciprocal
exchanges. Reciprocal partners may enter into such exchanges for selfish motives, or
for altruistic motives, or their exchanges may be mere conditioned or hard-wired reflexes properly described as neither
selfish nor altruistic. Genes inhabiting selfishly motivated reciprocating organisms may be soundly out-competed by
genes inhabiting reciprocating organisms who are moved directly by the welfare of their chosen exchange partners.
And genes inhabiting reciprocating organisms motivated additionally by thoughts of moral duty, who will feel guilty if
they defect, may do better still.
4 Ancestral Reciprocity
The lives of our ancestors over the past few million years display many characteristics favorable to the development of
reciprocity. They lived in small bands, meaning that they would interact with the same individuals repeatedly. The
range of potential new interactants was very limited; thus the option of cheating one's partner in the expectation of
finding another with whom one could enter into exchanges (perhaps also to cheat) was curtailed. We can assume that
interactions were, on the whole, quite public, so opportunities for secret uncooperative behaviors were limited. They
lived relatively long lives (long enough, at least, that histories of interaction could develop), and they probably had
relatively good memories. Some of the important foods they were exploiting came unpredictably in large packages
(that is, big dead animals), meaning that one individual, or group of individuals, would have a great deal of food
available at a time when others did not, but in all likelihood at a later date the situation would be reversed. Large
predators were a problem, and shared vigilance and defense was a natural solution. Infants required a great deal of
care, and youngsters a lot of instruction. Though we don't need to appeal to reciprocity to explain food sharing,
predation defense, or child rearing, what these observations do imply is that several basic forms of currency were
available in which favors could be bestowed and repaid. This means that someone who was, say, unable to hunt could
nevertheless repay the services of the hunter in some other form. If we factor in the development of language, then
we can add another basic currency: the value of shared information. All these kinds of exchanges (the last in
particular) allow for the give-a-large-benefit-for-a-relatively-low-cost pattern that is needed for reciprocity to be
viable.
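That give-a-large-benefit-for-a-relatively-low-cost pattern can be stated as a back-of-the-envelope condition. The sketch below is not drawn from the chapter itself, and its numbers are invented: helping pays when the benefit conferred, weighted by the chance w that the favor is eventually repaid, exceeds the cost of bestowing it.

```python
# Back-of-the-envelope viability check for reciprocity (illustrative only):
# helping pays when w * benefit > cost, where w is the chance of repayment.
def reciprocity_pays(benefit: float, cost: float, w: float) -> bool:
    return w * benefit > cost

# Meat from a large kill: cheap for the possessor to share, valuable to the
# hungry, and band members are almost certain to meet again.
assert reciprocity_pays(benefit=10.0, cost=1.0, w=0.9)       # 9 > 1
# The same favor to a stranger unlikely to be met again does not pay.
assert not reciprocity_pays(benefit=10.0, cost=1.0, w=0.05)  # 0.5 < 1
```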
When we start to list such characteristics, what emerges is a picture of an animal ripe for the development of
reciprocity; indeed, it is hard to imagine any other animal for whom the conditions are so suitable. Bearing in mind the
enormous potential of reciprocity to enhance fitness, we might expect natural selection to have taken an interest, to
have endowed our ancestors (and thus us) with the psychological skills necessary to engage efficiently in such
relations. What kind of skills might these be? I have already mentioned some: a tendency to look for cheating
possibilities; a sensitivity to cheats, a capacity to remember them, and an antipathy toward them; an interest in
acquiring knowledge of others' reputations, and in broadcasting one's own good reputation. We can add to these a
sense of distributive fairness; the capacity to distinguish accidental from intentional defections and an inclination to
forgive injuries of the former kind; and if those
participating in a reciprocal exchange are trading concrete goods, then we would expect a heightened sense of
ownership to develop.
Here is not the place to review empirical evidence favoring the view that the human mind has evolved such
tendencies; such support comes from a number of fields: developmental psychology, neuroscience, cross-cultural
anthropology, experimental economics, evolutionary psychology, primatology. Let me, however, very briefly gesture
toward some evidence pertaining to the last item mentioned (a sense of ownership), on the grounds that the role of
this trait in the evolution of human reciprocity seems underappreciated in the literature, as, indeed, does the fact that
ownership (as opposed to mere possession) is a highly moralized relation. To the extent that trade implies a grasp of
ownership, we find the physical traces of ownership far back in the archaeological record, at least into the early Upper
Paleolithic (Mellars, 1995, pp. 398-400; Ofek, 2001), and perhaps far beyond (McBrearty & Brooks, 2000). There is
not a shred of evidence that trade (or reciprocity more generally) is a de novo artifact of modern civilization that
spread from one or more points of cultural invention. It is, rather, like language: ubiquitous and ancient.

10. Sometimes we hear tell of societies with no sense of private ownership, but upon examination it turns out that these societies just
own different things from those we (in the West) are familiar with. Certainly there are cultures where land isn't an owned item, and
cultures where there are very few possessions, but there is no human society where the very idea of an item being owned (be it only
articles of clothing, weapons, or a few ornaments) is unknown. Other cultures may also more readily employ the concept of collective
ownership, but, of course, goods belonging to the family or the tribe are just as much conceived of as property as those belonging to
an individual. As a matter of fact, however, the concept of individual ownership appears to be a human universal.
A sense of ownership, moreover, emerges more or less spontaneously in the course of childhood development, and
surprisingly early: the very first two-word linguistic strings that an infant manages to construct and comprehend often
denote ownership relations (e.g., "Mommy sock" for "Mommy's sock"; Markessini & Golinkoff, 1980; see also Brown,
1973). Numerous studies have shown that the vast majority of playroom conflicts among children concern possession
of items, beginning as early as the children are capable of generating any kind of interpersonal conflict at all (see
Bronson, 1975; Dawe, 1934; Smith & Green, 1975). The few grand social experiments that have attempted to
expunge the notion of ownership from the human psyche (such as in the Soviet Union or the kibbutzim of Israel) have
encountered an extremely stubborn opponent. Discussing this phenomenon in the 1950s, the anthropologist Melford
Spiro wrote:
The child is no tabula rasa, who, depending on his cultural environment, is equally amenable to private or
collective property arrangements. On the contrary, the data suggest that the child's early motivations are
strongly directed towards private ownership, an orientation from which he is only gradually weaned by effective
cultural techniques. (1958, pp. 3756)
In admitting that this amounts to no more than a gesture toward the kind of evidence we should be looking for, I don't
mean to suggest that there is a large and
overwhelming body of evidence that I'm skirting in the interests of brevity. Whether there really are parts of the
human mind dedicated to ownership or reciprocal exchanges in general, or whether such universal skills are instead
the product of our general all-purpose intelligence, remains to be established, and doing so will not be easy. What we
should not expect from anyone is a deductive argument from demonstrably true premises; rather, we should hope for
a picture of the human mind that fits well with the available evidence and promises to help us make sense of things.
But at least one thing is clear: there is enough evidence supporting this hypothesis that the tired sneer that it is
merely a "just-so story" is no longer warranted. It is a plausible, coherent, productive, and testable hypothesis, and
there is good reason for looking favorably upon it.
5 Morality and Motivation
But what's morality got to do with it? What is added to the stability of a reciprocal exchange if the interactants think
of cheating as morally odious (say), as opposed to simply having a strong unmoralized disinclination to cheat? Note
that this question is pressing not just for the advocate of the hypothesis presently under discussion but is a good
question for anyone, even someone who thinks that morality is a purely cultural construct. What practical benefit does
distinctively moral thinking bring? Someone seeking to explain morality as a biological phenomenon and invoking only
individual selection may find it useful to tease apart two questions: What benefit does an individual gain by judging
others in moral terms? What benefit does an individual gain by judging himself in moral terms? I will start out
addressing the latter question, though the need to tie this to a discussion of the former will quickly become apparent.
It is natural to suppose that an individual's sincerely judging some available action in a morally positive light increases
her probability of performing that action (likewise, mutatis mutandis, judging an action in a morally negative light). If
reproductive fitness will be served by the performance or the omission of a certain action, then it will be served by any
psychological mechanism that ensures or probabilifies this performance or omission (relative to mechanisms that do so
less effectively). Thus self-directed moral judgment may enhance reproductive fitness so long as it is attached to the
appropriate actions. We have already seen that the appropriate actions (that is, the fitness-enhancing actions) will,
in many circumstances, include helpful and cooperative behaviors. Therefore, it may serve an individual's fitness to
judge certain prosocial behaviors (her own prosocial behaviors) in moral terms.
The part of the foregoing case that needs development is the premise that moral judgment probabilifies the
performance or omission of actions. There is plenty of empirical evidence to this effect (see Bandura, 1999; Bandura et
al., 1996; Beer et al., 2003; Covert et al., 2003; Ferguson et al., 1999; Keltner, 2003; Keltner et al., 1995; Ketelaar &
Au, 2003; Tangney, 2001), but in what follows I will develop the argument along a particular avenue.
The benefits that may come from cooperation (enhanced reputation, for example) are typically long-term values, and
merely to be aware of and desire these
long-term advantages does not guarantee that the goal will be effectively pursued, any more than the firm desire to
live a long life guarantees that a person will give up fatty foods. (The human tendency to discount future gains is well
documented: see Ainslie, 1992; Elster, 1984; Schelling, 1980.) Self-directed moral judgment often does better than
long-term prudential deliberation in securing the correct motivations. If you are thinking of an outcome in terms of
something that you desire, you can always say to yourself "But maybe forgoing the satisfaction of that desire
wouldn't be that terrible." If, however, you're thinking of the outcome as something that is desirable (as having the
quality of demanding desire), then your scope for rationalizing a spur-of-the-moment devaluation narrows. When a
person believes that an act of cooperation is morally required (that it must be practiced whether he likes it or not),
then the possibilities for further internal negotiation on the matter diminish. If a person believes an action to be
required by an authority from which he cannot escape, and if he believes that in not performing it he will not merely
frustrate himself but will become reprehensible and deserving of disapprobation, then he is more likely to perform the
action. The distinctive value of imperatives imbued with such practical clout is that they silence further calculation,
which is a valuable thing when our prudential calculations can so easily be hijacked by interfering forces and
rationalizations. What is being suggested, then, is that self-directed moral judgments can act as a kind of personal
commitment, in that thinking of one's actions in moral terms eliminates certain practical possibilities from the space of
deliberative reasoning in a way that thinking "I just don't like X" does not.

11. Note that the argument doesn't depend on comparing someone who is motivated by non-moralized sympathy with someone who is
utterly unsympathetic but has a robust rational sense of moral duty (a thought experiment familiar to students of Kant). First, we are
granting the moralized person all the sympathies and inclinations of the non-moralized person; the argument is just that moral
judgment adds something to that motivational profile, that it gives her an edge. Nor is the claim that moral thinking always does better
than prudential thinking, for a lot of the time prudential thinking is completely resolute (the knowledge that crossing the highway will
result in your death is probably more motivationally engaging than the judgment that jaywalking is morally forbidden); the argument is
just that moral judgment can step in on those occasions when prudence may falter (in particular when the prudential gain is a
probabilistic long-term affair). Also it must be remembered that moral judgment is not being conceived of here as the cool
intellectualized affair that Kant fancied it to be; an element of what self-directed moral judgment adds to a person's mental life, for
example, is the emotion of guilt. When I say that moral judgment promotes motivation, I am including the motivational efficacy of
certain moral emotions.
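The discounting tendency invoked above is commonly modeled with a hyperbolic discount function, which generates exactly the preference reversals that make long-term prudence falter. The sketch below is my own minimal illustration of that standard idea from the literature cited (e.g., Ainslie, 1992); the values and the discount rate k are invented.

```python
# Hyperbolic discounting: the present value of a reward shrinks with delay
# as value / (1 + k * delay). Values and k are invented for illustration.
def discounted(value: float, delay: float, k: float = 1.0) -> float:
    return value / (1 + k * delay)

# Viewed from a distance, the larger-later reward is preferred...
assert discounted(10, delay=10) > discounted(4, delay=8)   # 0.91 > 0.44
# ...but when the smaller reward becomes immediate, preference reverses,
# which is the gap self-directed moral judgment is hypothesized to fill.
assert discounted(4, delay=0) > discounted(10, delay=2)    # 4.0 > 3.33
```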
In saying this I am in part agreeing with Daniel Dennett (1995), who argues that moral principles function as
"conversation-stoppers": considerations that can be dropped into a decision process (be it a personal or interpersonal
decision) in order to stop mechanisms or people from endlessly processing, endlessly reconsidering, endlessly asking
for further justification. Any policy may be questioned, so, "unless we provide for some brute and a-rational
termination of the issue, we will design a decision process that spirals fruitlessly to infinity" (p. 506). In deciding how
to treat a criminal, the consideration "He has a moral right to a fair trial" seems to close off further discussion. In
deciding whether to shoplift, the consideration "It is wrong to shoplift; I mustn't do it" puts an end to deliberations.
Faced with a world in which
such predicaments are not unknown, says Dennett, we can recognize the appeal of "... some unquestioning
dogmatism that will render agents impervious to the subtle invasions of hyper-rationality" (p. 508).
These thoughts, however, provide only half the answer to the question we are addressing, for one might still wonder
what it is about a moral judgment that makes it function so well as a conversation-stopper. Presumably, non-moral
considerations also often function effectively in this manner; the thought "I would die if I did that" will in most
circumstances put an end to any further deliberations in favor of performing the action in question. One way of putting
this worry is to ask what motivation-strengthening features moral judgment has that strong (but non-moral) desire
does not have. The worry deepens when we bear in mind that nothing I have said is intended to undermine the truism
that what ultimately determines whether a person acts is the strength of her desires in favor of so acting compared
with her desires against acting; the hypothesis being advocated is that moral judgment bolsters desire. This, then,
leaves us with the question, posed by David Lahti (2003), of why natural selection did not simply make humans with
stronger desires that directly favor cooperation in certain circumstances. After all, for some adaptive behaviors this is
precisely what evolution has granted us. Protective actions toward our offspring, for example, appear to be regulated
by robust raw emotions, not primarily by any moralistic sense of duty. These emotions are by and large stoutly
resistant to the lures of weakness of will: few are tempted to rationalize a course of action that promises short-term
gain while resulting in injury to their beloved infant. Moreover, insofar as our hominid forebears already had in place
the neurological mechanisms for such strong desires, it's something of a mystery why the inherently conservative force
of natural selection would not press into service these extant mechanisms in order to govern any novel adaptive
behavior, rather than fabricating a radically different and biologically unprecedented mechanism for a purpose which
is achieved regularly in nature by much more straightforward means (Lahti, 2003, p. 644). Lahti's challenge must be
addressed.
Whenever an evolutionary psychologist hypothesizes about the presence of a specialized mechanism functioning to
govern an adaptive behavior, the query can always be raised: Why would natural selection bother with that
mechanism? Why wouldn't it simply create an overwhelmingly strong desire to perform that behavior? That there is
something fishy about this question is revealed if we consider some non-moral cases. Think instead about the
psychological reward systems that have evolved in humans regarding sex and eating. One might ask why natural
selection bothered giving us all that complicated physiological equipment needed for having an orgasm: why not
design us simply to want to have sex? It seems a misguided question. Natural selection did make us want to have sex,
and one of its means of ensuring this desire was precisely the human orgasm. Similarly, natural selection made us
want to eat food, and one of its means of achieving this was to create a creature for whom food tastes good and
hunger feels bad. And perhaps natural selection has made us want to cooperate, and granting us a tendency to think
of cooperation in moral terms is a means of securing this desire. That natural selection may employ a distinctive
means for creating and strengthening a type of fitness-advancing desire is no more mysterious in the moral case than
in the other
two cases. Granted, in the moral case we are considering a biologically unprecedented mechanism (something that
evolved uniquely in the hominid line), but insofar as human social relations are radically different from those of other
animals, a radically different solution may have been necessary. Note also that, despite the conservatism of natural
selection, there is an obvious reason that distinct fitness-advancing behaviors will often require different mechanisms
motivating them: if eating or promise-keeping were rewarded with an orgasm, then an individual might not bother
with sex.
It is still reasonable to inquire what special features a moral judgment might have that render it suited to the
evolutionary task we are speculatively assigning it here. An important part of the answer, I think, concerns the public
nature of moral judgments. That we are now focusing on self-directed moral judgments shouldn't lead us to assume
that we are talking about a private mental phenomenon. There can be private other-directed judgments (e.g.,
ruminating quietly to oneself "John's such a bastard"), just as there can be publicly announced self-directed judgments
("I want you all to know that I'm thoroughly ashamed of what I did"). A moral judgment, even a self-directed one, is
essentially communicative: it is something that may be asserted in the course of collective negotiation, may be
employed to stake a claim, to justify a decision, to provide warrant for a punishment, to criticize or praise another's
conduct or character, or to present evidence of one's own character. The manner in which thinking of a possible course
of action in morally positive terms promotes the motivation to perform it cannot be divorced from this public sphere.
Even when my private conscience guides me to refrain from cheating with the thought "Cheating is wrong," I am
aware that this is a consideration that might be brought into the domain of public deliberation if I am required to
justify my actions; I am accepting that, were I to cheat, punishment from others would be warranted. By comparison,
the proposition "I just don't like cheating" may be brought forward to explain one's actions, but it lacks the normative
justificatory force of a moral consideration.

12. "I really don't like X" can be an element of a justification: "I really don't like X, and in these circumstances it is acceptable for my
actions to be guided by my strong preferences." Clearly, though, the latter part of the justification introduces a normative principle.
Often the latter part will be tacit: "I like coffee" can seem like a perfectly good justification alone for drinking coffee, but that there is
an unspoken premise here (to the effect that one is in circumstances where preferences may legitimately guide action) is obvious if we
compare "I like torturing children."
A person's resolve to act (or not to act) is importantly affected by her conception of how others will receive her
decisions, her confidence in whom she can justify herself to, her perception of herself as acting from considerations
that would also move her fellows: in short, her experience of herself as a social being. Lahti's puzzle is solved when
we realize that a moral judgment affects motivation not by giving an extra little private mental nudge in favor of
certain courses of action, but by providing a deliberative consideration that (putatively) cannot be legitimately ignored,
thus allowing moral judgmentseven self-directed onesto play a justificatory role on a social stage in a way that
unmediated desires cannot.
This reasoning leads me to supplement the simple hypothesis with which we started: that the evolutionary function of
moral judgment is to provide added motivation in favor of certain adaptive social behaviors. Morally disapproving of
one's own action (or potential action), as opposed to disliking that action, provides a basis for corresponding
other-directed moral judgments. No matter how much I dislike something, this inclination alone is not relevant to my
judgments concerning others pursuing that thing: "I won't pursue X because I don't like X" makes perfect sense, but
"You won't pursue X because I don't like X" makes little sense. By comparison, the assertion of "The pursuit of X is
morally wrong" demands both my avoidance of X and yours. By providing a framework within which both one's own
and others' actions may be evaluated, moral judgments can act as a kind of common currency for collective
negotiation and decision-making. Moral judgment thus can function as a kind of social glue: bonding individuals
together in a shared justificatory structure, providing a tool for solving many group coordination problems. Of particular
importance is the fact that although a non-moralized strong negative emotional reaction (e.g., anger) may prompt a
punitive response, it takes a moral judgment to supply license for punishment, and thus the latter serves far more
effectively to govern public decisions in a large group than do non-moralized emotions or desires.
One final thing that should be emphasized is that although, for brevity's sake, I have spoken of moral judgments as
bolstering the motivation to cooperate, I don't mean to imply that we are designed to be unconditional cooperators.
The moral sense is not a proclivity to judge cooperation as morally good in any circumstance, something that looks like
a recipe for disastrous exploitation. By the same token, the fact that we have innate mechanisms dedicated to making
us want to eat, rewarding us with pleasure for doing so, doesn't mean that we eat unconditionally and indiscriminately.
We may be designed to be very plastic with respect to cooperative strategies. How generous one can afford to be, or
how miserly one is forced to be, will depend on how resource-rich one's environment is. Who is a promising partner
and who is a scoundrel is something we learn. One can moralize a conditional strategy, such as "Be trusting, but
don't be a sucker." One can moralize non-cooperation, seeing it as forbidden in certain circumstances. The idea being
advocated is that there are adaptive benefits to be had by moralizing the whole plastic social structure. Doing so
prevents underperformance, which is not to be confused with encouraging overperformance. It is true that there is a
sense in which any boost to the motive to cooperate on a token occasion means that one may be encouraged to
commit a practical error: to stick with an exchange relation when one's fitness would really be better served by
cheating. But this is the same sense in which any natural reward system can lead us to occasional and even disastrous
error: The craving for food can lead someone to eat a poisonous plant, and the pleasures of sex can result in making
powerful enemies.
6 Group Selection
I should like to end by commenting on the comparison between the hypothesis outlined in this chapter (a hypothesis
ostensibly in terms of individual selection) and the well-known views on group selection put forward by Elliott Sober
and David Sloan Wilson in Unto Others (1998). I will confine myself to three points.
1. Sober and Wilson do not purport to put forward a theory concerning the evolution of morality; the subtitle of their
book is The Evolution and Psychology of Unselfish Behavior. The first part of their book establishes the viability of
altruism in the evolutionary sense ("fitness-sacrificing behavior" might be a better term), and the second part more
tentatively argues that for cognitively sophisticated creatures like us, it is plausible that altruism in the vernacular,
psychological sense is a proximate mechanism that natural selection might have struck upon for getting us to act in an
appropriate fitness-sacrificing way. But, as I argued earlier, creatures who are altruistic (psychologically), though
perhaps moral in the sense of deserving praise, are not necessarily moral in the sense of evaluating themselves and
each other in moral terms. (Psychological altruism may correctly be called a moral sentiment, but this just draws
attention to the fact that creatures with no cognitive ability to grasp a moral concept or make a moral judgment can
be ascribed a moral sentiment.) If we're interested in the origins of moral judgment, then Sober and Wilson do not
offer a theory. This is not a criticism of them, just an observation of what they do and what they do not attempt.
Indeed, they are perfectly explicit about this, denying two theses: "that morality always requires us to sacrifice
self-interest for the sake of others ... [and] that to be motivated by an altruistic desire is the same thing as being
motivated by a moral principle" (1998, p. 237).
2. However, though Sober and Wilson do not attempt it, it is perfectly possible that biological group selection could
produce the trait of making moral judgments. If moral judgment reinforces prosocial behavior, then (ceteris paribus) it
will be good for a group to contain members able and disposed to engage in moral thinking. However, it should be
noted that general references to prosociality are rather coarse-grained, and there is probably a more detailed story
to be told about the characteristic subject matter of morality. A number of comprehensive crosscultural studies have
unanimously found certain broad universals in moral systems: (1) negative appraisals of certain acts of harming
others, (2) values pertaining to reciprocity and fairness, (3) requirements concerning behaving in a manner befitting
one's status vis-à-vis a social hierarchy, and (4) regulations clustering around bodily matters (such as menstruation,
food, bathing, sex, and the handling of corpses) generally dominated by concepts of purity and pollution (see Haidt &
Joseph, 2004, for discussion and references). The first three qualities all pertain directly or indirectly to reciprocal
exchanges. (To see how indirect reciprocity might produce an emphasis on social hierarchy, recall the importance of
reputation to such exchanges.) Given this, we may conclude that if the human moral sense is prepared for any
particular subject matter, it is surely reciprocity; it therefore seems eminently reasonable to assume that reciprocal
exchanges were a central evolutionary problem that morality was designed to solve. Saying this doesn't knock the
other processes out of the running. Group selection, most probably at the cultural level, may well have also been a
major factor. But my hunch is that reciprocity, broadly construed, is what got the ball rolling. (The moralization of
disgust, giving rise to taboos concerning food and sex, for example, I suspect of being a matter of natural selection
co-opting a motivational mechanism that had conveniently evolved for other initial purposes.)
There is also a body of evidence, alluded to earlier, suggesting that many of the concomitant traits one might expect
would evolve in order to govern reciprocal exchanges are indeed innate features of human psychology: the interest in
acquiring knowledge of others' reputations and in advertising one's own good reputation, our sensitivity to issues of
distributive fairness in exchanges, our capacity to distinguish between accidental and purposeful harms (and our
inclination to forgive the injuries of the former kind), our sensitivity to cheats and our antipathy toward them (our
eagerness to punish them even at material cost to ourselves), and our heightened sense of possession. The crucial
question is whether a moral sense forged by group selection could be expected to exhibit the same attributes. And I
confess to finding this a very difficult question to assess. It is not obvious, for example, that group interests are served
by members having elevated the possession relation into the moralized notion of ownership. It is not obvious that
group interests will be served by members being acutely aware of distributive fairness; after all, the group might do
just fine, or better, with a terribly inequitable and undeserved distribution of resources. Of course, saying that it is not
obvious doesn't mean it's false. But it is reasonable, I think, at least to conclude that certain features that seem very
central to morality fall smoothly and easily out of the reciprocity hypothesis but follow only with work from the group
selection hypothesis. Hardly a decisive consideration, but a worthwhile dialectical point nonetheless.
What if it turns out that the two hypotheses equally well explain the available evidence? Then, by Sober and Wilson's
own methodological lights, we should plump for the explanation in terms of individual selection (1998, p. 126). With
careful reservations, they endorse George Williams's principle of parsimony that one should postulate adaptation at
"no higher a level than is necessitated by the facts" (1966, p. 262). Their corollary is that this "does not allow one to
reject a multilevel selection hypothesis without consulting the data. ... Multilevel selection hypotheses must be
evaluated empirically on a case-by-case basis, not a priori on the basis of a spurious global principle" (p. 126). Quite
so. By merely putting forward a hypothesis, I don't take myself to have established anything in advance of empirical
evidence; but it is good to have options on the table before we start digging.
3. Finally, I want to acknowledge, but reject as uninteresting, the possibility argued for by Sober and Wilson that
reciprocal altruism is really just a special form of group selection, involving a group of two (in the case of a
straightforward direct reciprocal relation). For Sober and Wilson, the relevant notion of a group constituting a vehicle of
selection is a trait group: "a population of n individuals (where n > 1) that influence each other's fitness with respect
to a certain trait but not the fitness of those outside the group" (1998, p. 92). Kim Sterelny (1996) has argued
plausibly that there is a difference in kind between groups that satisfy the foregoing criterion (including partners in
reciprocal exchanges) and the superorganisms often used as paradigmatic examples of group selection (including
especially colonies of social insects). Examples of the latter category exhibit an extreme degree of cohesion and
integration; their members share a common fate; and such groups possess adaptations that cannot be equivalently
redescribed at the individual level (e.g., the tendency of newly hatched queens to kill their sisters). Such groups have
as respectable a claim to being robustly
objective vehicles of selection as do organisms. Concerning examples of the former category, by contrast, the decision
to describe selection as occurring at the level of the group is a purely optional one, for this group-level description is
equivalent to an individual-level description. Regarding this category, Sterelny (following Dugatkin & Reeve, 1994)
advocates a pluralistic approach, where the only difference between preferring individuals or trait groups as the vehicle
of selection (that is, of regarding the process as one of individual selection or group selection) is a heuristic one,
"depending on our explanatory and predictive interests" (p. 572).
Going along with Sterelny, I am willing to concede that, on a certain liberal understanding of what it takes to be a
group, reciprocal relations may count as group-selected, or they can be equivalently described in terms of individual
selection. Any debate on the matter, says John Maynard Smith, "is not about what the world is like ... [but] is largely
semantic, and could not be settled by observation" (1998, p. 639). But it is clear that there is a kind of group-selective
process that they are not an example of: what Sterelny calls "superorganism selection" (1996, p. 577). One could
argue that human cooperative faculties (e.g., morality) are the product of superorganism selection, or one might
instead argue that they may be explained by invoking only, say, reciprocity. These are quite distinct hypotheses, and it
cannot be reasonably denied that if we were unable to distinguish between them, due to a methodological decision to
lump reciprocity (along with kin selection and the extended phenotype) under the umbrella term of group selection,
this would be an unacceptable loss of explanatory detail in the service of theoretic unification.
17 A Framework for the Psychology of Norms
Chandra Sekhar Sripada
Stephen Stich
No concept is invoked more often by social scientists in the explanations of human behavior than "norm."
Encyclopedia of the Social Sciences
Humans are unique in the animal world in the extent to which their day-to-day behavior is governed by a complex set
of rules and principles commonly called norms. Norms delimit the bounds of proper behavior in a host of domains,
providing an invisible web of normative structure embracing virtually all aspects of social life. People also find many
norms to be deeply meaningful. Norms give rise to powerful subjective feelings that, in the view of many, are an
important part of what it is to be a human agent. Despite the vital role of norms in human lives and human behavior,
and the central role they play in explanations in the social sciences, very little systematic attention has been devoted
to norms in cognitive science. Much existing research is partial and piecemeal, making it difficult to know how
individual findings cohere into a comprehensive picture. Our goal in this essay is to offer an account of the
psychological mechanisms and processes underlying norms that integrates what is known and can serve as a
framework for future research.
In section 1, we'll offer a preliminary account of what norms are. In sections 2 and 3, we'll assemble an array of facts
about norms and the psychology that makes them possible, drawn from a variety of disciplines. Though the distinction
is not a sharp one, in section 2, we'll focus on social level facts, while in section 3, our focus will be on how norms
affect individuals. In section 4, we'll offer a tentative hypothesis about the innate psychological architecture subserving
the acquisition and implementation of norms, and explain why we believe an architecture like the one we propose can
explain many of the facts assembled in sections 2 and 3. Section 5, the last and longest section, focuses on open
questions: important issues about the cognitive science of norms that our account in section 4 does not address. In
some cases, we've left these issues open because little is known about them; in other cases, more is known but crucial
questions are still very much in dispute. Though we are acutely aware that our account of the psychology of norms
leaves many important questions unanswered, we hope that the framework we provide will contribute to
future research by clarifying some of those questions and offering an overview of how they are related.


1. One issue we won't consider is how the psychological mechanisms we'll posit might have evolved. We believe that one of the
advantages of the account we'll offer is that there is a plausible account of the evolution of these mechanisms. But assembling this
evolutionary story is a substantial project which we won't attempt to undertake here.
1 A Preliminary Characterization of Norms
We'll begin with an informal and provisional account of what we mean when we talk of norms. As we use the term, a
norm is a rule or principle that specifies actions that are required, permissible, or forbidden independently of any legal
or social institution. Of course, some norms are also recognized and enforced by social institutions and laws, but the
crucial point is that they needn't be. To emphasize this fact, we'll sometimes say that norms have independent
normativity. Closely linked to the independent normativity of norms is the fact that people are motivated to comply
with norms in a way that differs from their motivation to comply with other kinds of social rules. Very roughly, people
are motivated to comply with norms as ultimate ends, rather than as a means to other ends; we'll refer to this type of
motivation as intrinsic motivation, and we'll have much more to say about it in section 3. People can also be motivated
to comply with a norm for instrumental reasons, though intrinsic compliance motivation adds a substantial additional
motivational force. Violations of norms, when they become known, typically engender punitive attitudes, like anger,
condemnation, and blame, directed at the norm violator, and these attitudes sometimes lead to punitive behavior.
We believe that norms, as we've characterized them, are an important and theoretically useful subcategory of social
rules, and that our characterization is broadly in line with other accounts, both historical and more recent (see
Durkheim, 1903/1953; McAdams, 1997; Parsons, 1952; Pettit, 1991). However, it is worth emphasizing that our
account of norms is not intended as a conceptual analysis or an account of what the term "norm" means to ordinary
speakers. Nor do we offer our characterization of norms as a formal definition. At best, it gives a rough-and-ready way
to pick out what we believe is a theoretically interesting natural kind in the social sciences. If the framework for a
psychological theory of norms set out in section 4 is on the right track, then a better account of the crucial features of
norms can be expected to emerge as that theory is elaborated. One of the components of our framework is a norm
database, and it is the theory's job to tell us what can and cannot end up in that database.
Though there are a substantial number of empirically well-supported generalizations about norms, those generalizations
and the evidence for them are scattered in the literatures of a number of different disciplines. In the next two sections,
we'll assemble some of these generalizations and say a bit about the evidence for each. We'll begin with social-level
features of norms, and then turn to individual-level facts about the ways norms are acquired and how they influence
behavior.
2 Some Social-Level Facts about Norms
Norms are a cultural universal. The ethnographic database strongly suggests that norms and sanctions for norm
violations are universally present in all human societies (Brown, 1991; Roberts, 1979; Sober & Wilson, 1998).
Moreover, there is reason to think that the universal presence of norms is very ancient. There is no evidence that
norms originated in some society and spread by contact to other societies in the relatively recent past. Rather, norms
are reliably present and are highly elaborated in all human groups, including hunter-gatherer groups and groups that
are culturally isolated. This is just what we would expect on the hypothesis that norms are very ancient. All of this, we
think, suggests that there are innate psychological mechanisms specialized for the acquisition and implementation of
norms, since the existence of these mechanisms would help explain the universal presence of norms in all human
groups.
In addition to being present in all cultures, norms tend to be ubiquitous in the lives of people in those cultures. They
govern a vast array of activities, ranging from worship to appropriate dress to disposing of the dead. And while some
norms deal with matters that seem to be of little importance, others regulate matters like status, mate choice, food,
and sex that have a direct impact on people's welfare and their reproductive success.
Although norms are present in all human groups, one of the most striking facts about them is that the contents of the
norms that prevail in different groups are quite variable. Moreover, these differences follow a characteristic pattern in
which there is substantial homogeneity in the norms that prevail within groups and both commonalities and differences
in the norms that prevail across groups. We believe that the distributional pattern of norms is an important source of
evidence about the psychological mechanisms that underlie them. For this reason, we'll spend some time discussing
the issue in more detail.
In assessing the distribution of norms across human groups, one question that immediately arises is: Are there any
norms that are universally present in all human groups? The question must be handled with some care, since many
candidate norm universals are problematic because they verge on being analytic, true in virtue of meaning alone. For
example, "Murder is wrong" or "Theft is wrong" don't count as legitimate universals since, roughly speaking, "murder"
simply means killing someone else in a way that is not permissible, and "theft" simply means taking something from
another in a way that is not permissible. For this reason, it is important, wherever possible, to frame the contents of
norms in a nonnormative vocabulary. While analytic principles like "Murder is wrong" and "Theft is wrong" may be
universals, the specific rules that regulate the circumstances under which killing or taking an item in the possession of
another person is permitted are not nearly so uniform across groups.
With this caveat in mind, we return to the question of the distributional pattern of norms across human groups. One
important fact is that there is a pattern to be discerned; norms are not indefinitely variable or randomly distributed
across human groups. Rather, there are certain kinds of norms one sees again and again in almost all human societies,
though in order to discern these commonalities, one has to stay at a fairly high level of generality. For example, most
societies have rules that prohibit killing, physical assault, and
incest (or sexual activity with one's kin). In addition, most societies have rules promoting sharing, reciprocating, and
helping, at least under some circumstances (Cashdan, 1989). Most societies have rules regulating sexual behavior
among various members of society, and especially among adolescents (though the content of these rules varies
considerably) (Bourguignon & Greenbaum, 1973). And most societies have at least some rules that promote
egalitarianism and social equality. For example, in nearly all hunter-gatherer groups, attempts by individuals to garner
a disproportionate share of resources, women, or power are disapproved of sharply (Boehm, 1999). Examples like
these could be multiplied easily in domains such as social justice, kinship, marriage, and many others.
While there is no doubt that there are certain high-level commonalities in the norms that prevail across groups, as one
looks at norms in more detail, it is clear that there is tremendous variability in the specific rules one finds in different
groups. Consider, for example, norms dealing with harms. While some kind of harm norm or other is found in virtually
all human groups, the specific harm norms that prevail across groups are quite variable. In some simple societies,
almost all harm-causing behaviors are strongly prohibited. Among the Semai, an aboriginal people of the Malaysian
rain forest, for example, hitting and fighting, as well as more mundane behaviors such as insulting or slandering, are
all impermissible, and Semai groups have among the lowest levels of violence of any human societies (Robarchek &
Robarchek, 1992). But other groups permit a much wider spectrum of harm-causing behaviors. In groups such as the
Yanomamö of South America, the use of violence to settle conflicts is permitted (and indeed extremely common), and
displays of fighting bravado are prized rather than condemned (Chagnon, 1992). Among the Yanomamö, mortality due
to intra- and intertribe conflict is extremely high, and some ethnographers have suggested that the level of mortality
due to violence found among the Yanomamö is not at all uncommon in simple societies (Keeley, 1996). In addition to
variability in the kinds of harm and level of harm that are permitted, harm norms also differ with respect to the class
of individuals a person is permitted to harm. Many groups draw a sharp distinction between harms committed against
individuals within one's own community and individuals outside the group (though many groups do not draw such a
sharp distinction; LeVine & Campbell, 1972). Moreover, some societies permit some kinds of violence directed against
women, children, animals, and also certain marginalized subgroups or castes (Edgerton, 1992). The variability in harm
norms is also evidenced by the manner in which they change over time. The philosopher Shaun Nichols (2004, ch. 7)
provides a fascinating description of the gradual change in harm norms in Western societies over the last 400 years.
Incest prohibitions are another case in which high-level commonalities are found in conjunction with variability at the
level of specific rules. It appears that almost all societies have norms prohibiting sexual intercourse between members
of the nuclear family (we'll call these nearly universal rules "core incest prohibitions"). But incest prohibitions almost
always extend beyond this core. In particular, incest prohibitions almost always extend to other kinds of sexual activity,
and they almost always extend beyond just the nuclear family; they prohibit sexual activity with at least some members
of one's
nonnuclear kin. But the details of how incest prohibitions extend beyond core incest prohibitions are, as numerous
studies have revealed, tremendously variable (Murdock, 1949). For example, at one extreme are exogamous groups, in
which marriage with anyone within one's own tribal unit is considered incestuous, though the offense is seldom seen as
being of the same level of severity as intercourse within one's nuclear family.
Another feature of the distributional pattern of norms is that while most groups have some rule or other that falls
under certain high-level themes, generalizations about commonalities in the norms found across groups typically have
exceptions. For example, the incest prohibition is sometimes cited as the best example of a norm that is a universal
feature of all human groups. And while it is true that core incest prohibitions can be found in virtually all groups, even
this generalization may not be exceptionless. There is good evidence that brother-sister marriage (including sexual
relations) occurred with some frequency in Egypt during the Roman period, and was practiced openly and unabashedly.
In addition, brother-sister marriage is known to have occurred in a number of royal lineages, including those of Egypt,
Hawaii, and the Inca empire (Durham, 1991).
To sum up, we've identified three key features of the distributional pattern of norms. First, norms tend to cluster under
certain general themes. Second, the specific rules that fall under these general themes are quite variable, though
clearly thematically connected. And third, there are typically at least some exceptions that diverge from the general
trend.
3 Some Individual-Level Facts about Norms
We turn now to some facts about how norms emerge within individuals, and how individuals are affected by the norms
they acquire. There is excellent evidence indicating that norms exhibit a reliable pattern of ontogenesis. Regardless of
their biological heritage, almost everyone (excepting those with serious psychological deficits) acquires the norms that
prevail in the local cultural group in a highly reliable way. In no human group is it the case that some individuals
reliably acquire the prevailing norms while many others don't. It also appears that all individuals acquire at least some
norms of their group relatively early in life. All normal children appear to have knowledge of rules of a distinctly
normative type between three and five years of age, and can distinguish these normative rules from other social rules
(Nucci, 2001; Turiel, 1983). In addition, some competences associated with norms, such as the ability to reason about
normative rules and rule violations, appear very early. Denise Cummins has shown that children as young as three to
four perform substantially better on deontic rule reasoning tasks than they do on similar indicative reasoning tasks
(Cummins, 1996).
Further evidence about the ontogenesis of norms comes from a major crosscultural study in which Henrich and his
colleagues investigated norms of cooperation and fairness in 15 small-scale societies using standard experimental game
paradigms. (We'll discuss these games more fully later.) While this study found considerable diversity in the norms of
cooperation and fairness prevailing in these societies, it also found that much of the crosscultural variation in norms
among adults was already present by the time
subjects reached the age of nine, and it persists thereafter (Henrich et al., 2001). In another crosscultural experimental
study, Shweder and his colleagues examined moral norms in children and adults in Hyde Park, Illinois, and
Bhubaneswar, India (Shweder et al., 1987). As in Henrich and colleagues' study, there were lots of differences in the
norms that prevailed in the two communities, and most of the differences were already established by the time
subjects reached the age of seven.
Perhaps the most striking (and most overlooked) feature of norms is that they have powerful motivational effects on the
people who hold them. Philosophers have long emphasized that from a subjective perspective, moral norms present
themselves with a unique kind of subjective authority that differs from standard instrumental motivation. We believe
that this philosophical intuition reflects a deep empirical truth about the psychology of norms, and we refer to the type
of motivation associated with norms as "intrinsic motivation." Our claim is that people are disposed to comply with
norms even when there is little prospect for instrumental gain, future reciprocation, or enhanced reputation, and when
the chance of being detected for failing to comply with the norm is very small. The claim we are making must be
treated with care, however. At any given time, a person may be subject to multiple sources of motivation. So in some
cases in which people are intrinsically motivated to comply with a norm, they may also be instrumentally motivated to
comply with the norm. In other cases in which people are intrinsically motivated to comply with norms, they may
nonetheless fail to comply for instrumental reasons. So our claim is not that people always follow norms or that when
they follow norms they do so only because of intrinsic motivation. Rather, our claim is that humans display an
independent intrinsic source of motivation for norm compliance, and thus that people are motivated to comply with
norms over and above (and to a substantial degree over and above) what would be predicted from instrumental
reasons alone.
There is an implication of our claims about intrinsic motivation that is worth emphasizing. Many norms, though by no
means all, direct individuals to behave unselfishly. More precisely, many norms direct individuals to behave in ways
that are contrary to what would in fact maximize satisfaction of their selfish preferences. Thus, in saying that people
are intrinsically motivated to comply with norms, we are committed to the claim that people are motivated to comply
in a way that frequently leads them to behave genuinely unselfishly. While philosophers have taken the claim that
people are intrinsically motivated to comply with norms to be obvious and platitudinous, economic theorists and
evolutionary-minded scientists have often argued that such behavior is very implausible from the perspective of selfish
rationality (see Barash, 1979, pp. 135, 167; Downs, 1957). We believe the arguments used by these theorists are
deeply flawed. But a full rebuttal would take us far from the current topic, and here we instead emphasize that the
claim that people are intrinsically motivated to follow norms has substantial direct empirical justification.
Some of this evidence comes from anthropology and sociology. A central principle of these disciplines is that people
internalize the norms of their group. According to the internalization hypothesis, individuals exhibit a characteristic
style of motivation in which the individual intrinsically values compliance with moral
rules even when there is no possibility of sanction from an external source (Durkheim, 1912/1968; Scott, 1971).
Internalization is invoked to explain a seemingly obvious and ubiquitous fact: having been taught to comply with the
moral rules of their group, people exhibit a lifelong pattern of highly reliable compliance with those rules. Furthermore,
this pattern of compliance does not seem to depend on overt coercion, or even the threat of coercion, at each
particular instance in which compliance is displayed. Consistent with the internalization hypothesis, the ethnographic
record routinely reports that people view norms as being distinctive because of their absoluteness, their authority, and
the manner in which people regard them as deeply meaningful (see Edel & Edel, 2000). These features of norms
suggest that norm compliance is based on something over and above instrumental motivation.
Closer to home, the economist Robert Frank (1988) has pointed out a number of cases of norm compliance in day-to-
day life that are not plausibly viewed as the product of instrumental rationality. His examples include tipping at a
highway restaurant one will never revisit, jumping in a river to save a drowning person, refraining from littering on a
lonely beach, returning a lost wallet containing a substantial amount of cash, and many others.
Though descriptive data of this sort is compelling enough, a problem for those who wish to defend the claim that
people intrinsically comply with norms is that it is easy for skeptics to concoct a selfish instrumental motive for what
superficially appears to be intrinsic compliance behavior. For this reason, experimental data that can distinguish the
competing hypotheses is crucial. The social psychologist C. Daniel Batson has, over a number of years, extensively
studied the motivational structure of helping behavior using several ingenious experimental paradigms.
Batson finds that helping behavior is best accounted for on the hypothesis that people promote the welfare of others
as an ultimate end (especially when their empathy is engaged) and not on alternative hypotheses that treat helping as
instrumental toward ulterior benefits such as future reciprocation, or gaining social approval (Batson, 1991). There is
now a large literature in sociology and social psychology that reaches a similar conclusion. Reviewing this literature,
Piliavin and Charng note:
There appears to be a paradigm shift away from the earlier position that behavior that appears to be altruistic
must, under closer scrutiny, be revealed as reflecting egoistic motives. Rather, theory and data now being
advanced are more compatible with the view that true altruism – acting with the goal of benefiting another –
does exist and is part of human nature. (1990, p. 27)
But perhaps the most compelling data indicating that people follow norms as ultimate ends comes from experimental
economics, where people's motivations to comply with norms of fairness and reciprocity can be precisely detected and
quantified. There is now abundant evidence that in experimental games, subjects cooperate at levels far higher than
instrumental rationality alone would predict. For example, subjects routinely cooperate in one-time only, anonymous
prisoner's dilemma games (Marwell & Ames, 1981). In such games, choosing to cooperate is the fair thing to do,
while choosing to defect will earn the subject a higher payoff,
regardless of what the other person chooses. Furthermore, these results are obtained even when subjects are explicitly
told that they will play the game only once, and their identity will remain anonymous. The fact that subjects still
routinely choose to cooperate suggests that they are complying with norms of fairness and reciprocity as an
ultimate end, rather than pursuing what would satisfy their selfish preferences. There are a large number of other
kinds of games, such as public goods games, the ultimatum game, the centipede game, and others in which similar
results have been obtained (see Thaler, 1992, especially chaps. 2 and 3, for a review).
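The strategic structure being described can be made concrete with a short sketch. The payoff numbers below are hypothetical, chosen only to satisfy the standard prisoner's dilemma ordering; they are not taken from Marwell and Ames (1981):

```python
# Hypothetical one-shot prisoner's dilemma payoffs (illustrative numbers
# only, not those used by Marwell & Ames, 1981).
# payoff[(my_move, their_move)] is what I earn.
C, D = "cooperate", "defect"
payoff = {
    (C, C): 3,  # mutual cooperation: the fair outcome
    (C, D): 0,  # I cooperate, you defect: the "sucker's" payoff
    (D, C): 5,  # I defect, you cooperate: the temptation payoff
    (D, D): 1,  # mutual defection
}

# Defecting earns strictly more than cooperating, whatever the other
# player does -- this is what "defect will earn the subject a higher
# payoff, regardless of what the other person chooses" amounts to.
for their_move in (C, D):
    assert payoff[(D, their_move)] > payoff[(C, their_move)]

# Yet both players do better under mutual cooperation than under mutual
# defection, so cooperation in anonymous one-shot play is not what
# selfish payoff maximization alone would predict.
assert payoff[(C, C)] > payoff[(D, D)]
```

Any payoffs with this ordering (temptation > reward > punishment > sucker) generate the same conflict between fairness and selfish advantage.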
In addition to emphasizing the intrinsic nature of motivations to comply with moral norms, philosophers have also
recognized the intrinsic nature of motivation to punish norm violations. Kant, famously, was a retributivist who held
that punishment for violations of moral norms is a moral duty and is intrinsically valuable, and a substantial number of
other philosophers have endorsed the retributivist position (Kant, 1887/1972, pp. 1027; see Ezorsky, 1972, ch. 2,
sec. 2). Other philosophers associated with distinct moral traditions have also recognized the important role of duties
to punish in the moral domain. Mill, for example, maintains that moral violations are the ones that we feel that society
ought to punish (Mill, 1863/1979, ch. 5). And a number of other philosophers have advanced similar claims (Gibbard,
1990, ch. 3; Moore, 1987). Here, again, we believe that these philosophical intuitions reflect a deep descriptive truth.
Before discussing the empirical literature on intrinsic motivation to punish, it's worth reemphasizing some of the
caveats made earlier. In claiming that people are intrinsically motivated to punish norm violations, we are not claiming
that these motivations always translate into punitive behaviors. Human motivations are multifaceted and complex, and
people with intrinsic motivations to punish a norm violator may also have instrumental motivations not to punish. Thus
motivations to punish serve to raise the probability of punitive behaviors, though they needn't translate into punitive
behaviors in every instance. Furthermore, we are not claiming that every norm violation generates intrinsic motivations
to punish. Rather, our claim is that norm violations that have the appropriate salience and severity generate
motivations to punish. So while there is a reliable connection between norm violations and motivations to punish, this
connection need not be realized in every occurrence of a norm violation.
There is a large anthropological and sociological literature attesting to the fact that norm violations elicit both punitive
emotions – like anger and outrage – and punitive behaviors – like criticism, condemnation, avoidance, exclusion, or even
physical harm – from most people within a society, and that these attitudes and behaviors are directed at rule violators
(Roberts, 1979; Sober & Wilson, 1998). Furthermore, many social scientists have explicitly noted that punishment for
norm violation, of this informal type, is present in all societies. For example, ostracism is a human universal
(Brown, 1991); gossip and criticism are human universals (Dunbar, 1996; Wilson et al., 2000); and in all human
groups, systems of sanctions, which utilize ostracism and gossip, as well as other informal sanctions, are applied to
those who violate moral norms (Black, 1998; Boehm, 1999).
But here, again, it might be argued that, though there is ample evidence that people are disposed to punish norm
violators, they do so for strictly selfish instrumental
reasons. For example, people may punish to send a message to the violator, which produces a selfish gain for the
punisher because the violator is deterred from repeating the offense. However, there is good evidence that motivations
to punish are often truly intrinsic, and that punishment is not inflicted for selfish instrumental reasons alone.
One particularly striking finding is reported in Haidt and Sabini (2000). In this study, subjects were shown films in
which a normative transgression occurs. Subjects were offered various alternative endings; they preferred endings in
which the perpetrators of the transgression were made to suffer, knew the suffering was repayment for the
transgression, and suffered in a way that involved public humiliation. More revealingly, though, subjects were also
offered an alternative ending in which the perpetrator realized what he did was wrong, showed genuine remorse, and
grew personally as a result. Subjects' rejection of this ending suggests that their motivation to punish is not based on
selfish instrumental ends, such as avoiding being harmed by the perpetrator in the future. Rather, they appear to be
motivated by intrinsic motivations to punish the violator.
The most powerful evidence for intrinsic motivation to punish norm violations comes from experimental economics.
Since the early 1990s, there has been a surge of interest in experimental economics in studying people's motivations
to punish in controlled laboratory conditions. A large number of studies show that in various experimental situations
and experimental games, people will punish others – at substantial costs to themselves – for violations of normative
rules or a normative conception of fairness. This data is particularly powerful because it permits quantitative measures
of the extent to which motivations to punish are unselfish and instrumentally irrational.
To illustrate the pattern of results in the literature, we'll describe a study by Fehr and Gächter (2002). In this study,
240 subjects played a public goods game in groups of four. Each member of the group was given 20 monetary units
(MUs) and could either invest in a group project or keep the money for himself. For each MU invested, each of the
four group members received four-tenths of an MU back. For each MU a subject chose not to invest, he kept that full MU.
Given these payoffs, if all the subjects invest fully, each receives 32 units. If all subjects choose not to invest, each
receives 20 units. Of course, if one subject chooses not to invest but the others invest fully, the free-riding subject
receives the highest payoff, 44 MUs. Thus, the public goods game sets up a conflict between collective benefit and
selfish interest.
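The payoffs just described can be checked directly. The following sketch simply encodes the arithmetic of the game as reported above (20 MUs per subject, four-tenths of an MU returned to each member per invested unit):

```python
# Payoff arithmetic of the Fehr and Gächter (2002) public goods game as
# described in the text: four players, 20 MUs each, and every MU invested
# in the group project returns 0.4 MU to each of the four members.
ENDOWMENT = 20
RETURN_PER_MU = 0.4

def payoff(my_investment, others_investments):
    """One subject's earnings: whatever he kept, plus his share of the pot."""
    pot = my_investment + sum(others_investments)
    return (ENDOWMENT - my_investment) + RETURN_PER_MU * pot

print(payoff(20, [20, 20, 20]))  # everyone invests fully: 32.0 MUs each
print(payoff(0, [0, 0, 0]))      # nobody invests: 20.0 MUs each
print(payoff(0, [20, 20, 20]))   # lone free rider among full investors: 44.0 MUs
```

Since each invested MU returns only 0.4 MU to the investor himself (but 1.6 MUs to the group as a whole), keeping the money is always individually optimal, which is exactly the conflict between collective benefit and selfish interest the game is designed to create.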
Fehr and Gächter studied behavior in the public goods game under two conditions: a punishment condition and a no
punishment condition. In the punishment condition, after each period of the game (a period consisted of one round of
investment), subjects were informed