
Statistics in Psychology

An Historical Perspective
Second Edition
Michael Cowles
York University, Toronto
LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
2001 Mahwah, New Jersey London
Copyright 2001 by Lawrence Erlbaum Associates, Inc.
All rights reserved. No part of this book may be reproduced
in any form, by photostat, microform, retrieval system, or any
other means, without the prior written permission of the publisher.
Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, NJ 07430
Cover design by Kathryn Houghtaling Lacey
Library of Congress Cataloging-in-Publication Data
Cowles, Michael, 1936-
Statistics in psychology: an historical perspective / Michael Cowles.--2nd ed.
p. cm.
Includes bibliographical references (p.) and index.
ISBN 0-8058-3509-1 (c: alk. paper)--ISBN 0-8058-3510-5 (p: alk. paper)
1. Psychology--Statistical methods--History. 2. Social sciences--Statistical methods--History. I. Title.
BF39.C67 2000
150'.7'27--dc21 00-035369
The final camera copy for this work was prepared by the author, and therefore the
publisher takes no responsibility for consistency or correctness of typographical style.
However, this arrangement helps to make publication of this kind of scholarship possible.
Books published by Lawrence Erlbaum Associates are printed on
acid-free paper, and their bindings are chosen for strength and durability.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Contents
Preface
Acknowledgments
1 The Development of Statistics
Evolution, Biometrics, and Eugenics
The Definition of Statistics
Probability
The Normal Distribution
Biometrics
Statistical Criticism
2 Science, Psychology, and Statistics
Determinism
Probabilistic and Deterministic Models
Science and Induction
Inference
Statistics in Psychology
3 Measurement
In Respect of Measurement
Some Fundamentals
Error in Measurement
4 The Organization of Data
The Early Inventories
Political Arithmetic
Vital Statistics
Graphical Methods
5 Probability
The Early Beginnings
The Beginnings
The Meaning of Probability
Formal Probability Theory
6 Distributions
The Binomial Distribution
The Poisson Distribution
The Normal Distribution
7 Practical Inference
Inverse Probability and the Foundations of Inference
Fisherian Inference
Bayes or p < .05?
8 Sampling and Estimation
Randomness and Random Numbers
Combining Observations
Sampling in Theory and in Practice
The Theory of Estimation
The Battle for Randomization
9 Sampling Distributions
The Chi-Square Distribution
The t Distribution
The F Distribution
The Central Limit Theorem
10 Comparisons, Correlations, and Predictions
Comparing Measurements
Galton's Discovery of Regression
Galton's Measure of Co-relation
The Coefficient of Correlation
Correlation - Controversies and Character
11 Factor Analysis
Factors
The Beginnings
Rewriting the Beginnings
The Practitioners
12 The Design of Experiments
The Problem of Control
Methods of Inquiry
The Concept of Statistical Control
The Linear Model
The Design of Experiments
13 Assessing Differences and Having Confidence
Fisherian Statistics
The Analysis of Variance
Multiple Comparison Procedures
Confidence Intervals and Significance Tests
A Note on 'One-Tail' and 'Two-Tail' Tests
14 Treatments and Effects: The Rise of ANOVA
The Beginnings
The Experimental Texts
The Journals and the Papers
The Statistical Texts
Expected Means Squares
15 The Statistical Hotpot
Times of Change
Neyman and Pearson
Statistics and Invective
Fisher versus Neyman and Pearson
Practical Statistics
References
Author Index
Subject Index
Preface
In this second edition I have made some corrections to errors in algebraic
expressions that I missed in the first edition and I have briefly expanded on some
sections of the original where I thought such expansion would make the
narrative clearer or more useful. The main change is the inclusion of two new
chapters: one on factor analysis and one on the rise of the use of ANOVA in
psychological research. I am still of the opinion that factor analysis deserves its
own historical account, but I am persuaded that the audience for such a work
would be limited were the early mathematical contortions to be fully explored.
I have tried to provide a brief non-mathematical background to its arrival on the
statistical scene.
I realized that my account of ANOVA in the first edition did not do justice
to the story of its adoption by psychology, and largely due to my re-reading of
the work of Sandy Lovie (of the University of Liverpool, England) and Pat
Lovie (of Keele University, England), who always write papers that I wish I had
written, I decided to try again. I hope that the Lovies will not be too disappointed
by my attempt to summarize their sterling contributions to the history of both
factor analysis and ANOVA.
As before, any errors and misinterpretations are my responsibility alone. I
would welcome correspondence that points to alternative views.
I would like to give special thanks to the reviewers of the first edition for
their kind comments and all those who have helped to bring about the revival
of this work. In particular Professor Niels Waller of Vanderbilt University must
be acknowledged for his insistent and encouraging remarks. I hope that I have
deserved them. My colleagues and many of my students at York University,
Toronto, have been very supportive. Those students, both undergraduate and
graduate, who have expressed their appreciation for my inclusion of some
historical background in my classes on statistics and method have given me
enormous satisfaction. This relatively short account is mainly for them, and
I hope it will encourage some of them to explore some of these matters further.
There seems to be a slow realization among statistical consumers in
psychology that there is more to the enterprise than null hypothesis significance
testing, and other controversies to exercise us. It is still my firm belief that just
a little more mathematical sophistication and just a little more historical
knowledge would do a great deal for the way we carry on our research business
in psychology.
The editors and production people at Lawrence Erlbaum, ever sharp and
efficient, get on with the job and bring their expertise and sensible advice to the
project and I very much appreciate their efforts.
My wife has sacrificed a great deal of her time and given me considerable
help with the final stages of this revision and she and my family, even yet, put
up with it all. Mere thanks are not sufficient.
Michael Cowles
Acknowledgments
I wish to express my appreciation to a number of individuals, institutions, and
publishers for granting permission to reproduce material that appears in this
book:
Excerpts from Fisher Box, J. (c) 1978, R. A. Fisher: the life of a scientist, from
Scheffe, H. (1959) The analysis of variance, and from Lovie, A. D. & Lovie, P.
Charles Spearman, Cyril Burt, and the origins of factor analysis. Journal of the
History of the Behavioral Sciences, 29, 308-321. Reprinted by permission of
John Wiley & Sons Inc., the copyright holders.
Excerpts from a number of papers by R. A. Fisher. Reprinted by permission of
Professor J. H. Bennett on behalf of the copyright holders, the University of
Adelaide.
Excerpts, figures and tables by permission of Hafner Press, a division of
Macmillan Publishing Company from Statistical methods for research workers
by R. A. Fisher. Copyright (c) 1970 by University of Adelaide.
Excerpts from volumes of Biometrika. Reprinted by permission of the
Biometrika Trustees and from Oxford University Press.
Excerpts from MacKenzie, D. A. (1981). Statistics in Britain 1865-1930.
Reprinted by permission of the Edinburgh University Press.
Two plates from Galton, F. (1885a). Regression towards mediocrity in
hereditary stature. Journal of the Anthropological Institute of Great Britain and
Ireland, 15, 246-263. Reprinted by permission of the Royal Anthropological
Institute of Great Britain and Ireland.
Professor W. H. Kruskal, Professor F. Mosteller, and the International Statistical
Institute for permission to reprint a quotation from Kruskal, W., & Mosteller,
F. (1979). Representative sampling, IV: The history of the concept in statistics,
1895-1939. International Statistical Review, 47, 169-195.
Excerpts from Hogben, L. (1957). Statistical theory. London: Allen and Unwin;
Russell, B. (1931). The scientific outlook. London: Allen and Unwin; Russell,
B. (1946). History of western philosophy and its connection with political and
social circumstances from the earliest times to the present day. London: Allen
and Unwin; von Mises, R. (1957). Probability, statistics and truth. (Second
revised English Edition prepared by Hilda Geiringer) London: Allen and
Unwin. Reprinted by permission of Unwin Hyman, the copyright holders.
Excerpts from Clark, R. W. (1971). Einstein, the life and times. New York:
World. Reprinted by permission of Peters, Fraser and Dunlop, Literary Agents.
Dr D. C. Yalden-Thomson for permission to reprint a passage from Hume, D.
(1748). An enquiry concerning human understanding. (In D. C. Yalden-Thomson
(Ed.). (1951). Hume, Theory of Knowledge. Edinburgh: Thomas
Nelson).
Excerpts from various volumes of the Journal of the American Statistical
Association. Reprinted by permission of the Board of Directors of the American
Statistical Association.
An excerpt reprinted from Probability, statistics, and data analysis by O.
Kempthorne and L. Folks, (c) 1971 Iowa State University Press, Ames, Iowa
50010.
Excerpts from De Moivre, A. (1756). The doctrine of chances: or, A method of
calculating the probabilities of events in play. (3rd ed.), London: A. Millar.
Reprinted from the edition published by the Chelsea Publishing Co., New York,
(c) 1967 with permission and Kolmogorov, A. N. (1956). Foundations of the
theory of probability. (N. Morrison, Trans.). Reprinted by permission of the
Chelsea Publishing Co.
An excerpt reprinted with permission of Macmillan Publishing Company from
An introduction to the study of experimental medicine by C. Bernard (H. C.
Greene, Trans.), (c) 1927 (Original work published in 1865) and from Science
and human behavior by B. F. Skinner (c) 1953 by Macmillan Publishing
Company, renewed 1981 by B. F. Skinner.
Excerpts from Galton, F. (1908). Memories of my life. Reprinted by permission
of Methuen and Co.
An excerpt from Chang, W-C. (1976). Sampling theories and sampling practice.
In D. B. Owen (Ed.), On the history of statistics and probability (pp. 299-315).
Reprinted by permission of Marcel Dekker, Inc. New York.
An excerpt from Jacobs, J. (1885). Review of Ebbinghaus's Ueber das
Gedächtnis. Mind, 10, 454-459 and from Hacking, I. (1971). Jacques
Bernoulli's Art of Conjecturing. British Journal for the Philosophy of Science,
22, 209-229. Reprinted by permission of Oxford University Press and Professor
Ian Hacking.
Excerpts from Lovie, A. D. (1979). The analysis of variance in experimental
psychology: 1934-1945. British Journal of Mathematical and Statistical
Psychology, 32, 151-178 and Yule, G. U. (1921). Review of W. Brown and
G. H. Thomson, The essentials of mental measurement. British Journal of
Psychology, 2, 100-107 and Thomson, G. H. (1939) The factorial analysis of
human ability. I. The present position and the problems confronting us. British
Journal of Psychology, 30, 105-108. Reprinted by permission of the British
Psychological Society.
Excerpts from Edgeworth, F. Y. (1887). Observations and statistics: An essay
on the theory of errors of observation and the first principles of statistics.
Transactions of the Cambridge Philosophical Society, 14, 138-169 and
Neyman, J., & Pearson, E. S. (1933b). The testing of statistical hypotheses in
relation to probabilities a priori. Proceedings of the Cambridge Philosophical
Society, 29, 492-510. Reprinted by permission of the Cambridge University
Press.
Excerpts from Cochrane, W. G. (1980). Fisher and the analysis of variance. In
S. E. Fienberg, & D. V. Hinckley (Eds.), R. A. Fisher: An Appreciation (pp.
17-34) and from Reid, C. (1982). Neyman - from life. Reprinted by permission
of Springer-Verlag, New York.
Excerpts and plates from Philosophical Transactions of the Royal Society and
Proceedings of the Royal Society of London. Reprinted by permission of the
Royal Society.
Excerpts from Laplace, P. S. de (1820). A philosophical essay on probabilities.
(F. W. Truscott, & F. L. Emory, Trans.). Reprinted by permission of Dover
Publications, New York.
Excerpts from Galton, F. (1889). Natural inheritance, Thomson, W. (Lord
Kelvin). (1891). Popular lectures and addresses, and Todhunter, I. (1865). A
history of the mathematical theory of probability from the time of Pascal to that
of Laplace. Reprinted by permission of Macmillan and Co., London.
Excerpts from various volumes of the Journal of the Royal Statistical Society,
reprinted by permission of the Royal Statistical Society.
An excerpt from Boring, E. G. (1957). When is human behavior predetermined?
The Scientific Monthly, 84, 189-196. Reprinted by permission of the American
Association for the Advancement of Science.
Data obtained from Rutherford, E., & Geiger, H. (1910). The probability
variations in the distribution of α particles. Philosophical Magazine, 20,
698-707. Material used by permission of Taylor and Francis, Publishers,
London.
Excerpts from various volumes of Nature reprinted by permission of Macmillan
Magazines Ltd.
An excerpt from Forrest, D. W. (1974). Francis Galton: The life and work of a
Victorian genius. London: Elek. Reprinted by permission of Grafton Books, a
division of the Collins Publishing Group.
Excerpts from Fisher, R. A. (1935, 1966, 8th ed.). The design of experiments.
Edinburgh: Oliver and Boyd. Reprinted by permission of the Longman Group,
UK, Limited.
Excerpts from Pearson, E. S. (1966). The Neyman-Pearson story: 1926-34.
Historical sidelights on an episode in Anglo-Polish collaboration. In F. N.
David (Ed.). Festschrift for J. Neyman. London: Wiley. Reprinted by
permission of John Wiley and Sons Ltd., Chichester.
An excerpt from Beloff, J. (1993) Parapsychology: a concise history. London:
Athlone Press. Reprinted with permission.
Excerpts from Thomson, G. H. (1946) The factorial analysis of human ability.
London: University of London Press. Reprinted by permission of The Athlone
Press.
An excerpt from Spearman, C. (1927) The abilities of man. New York:
Macmillan Company. Reprinted by permission of Simon & Schuster, the
copyright holders.
An excerpt from Kelley, T. L. (1928) Crossroads in the mind of man: a study
of differentiable mental abilities. Reprinted with the permission of Stanford
University Press.
An excerpt from Gould, S. J. (1981) The mismeasure of man. Reprinted with
the permission of W. W. Norton & Company.
An excerpt from Boring, E. G. (1957) When is human behavior predetermined?
Scientific Monthly, 84, 189-196. And from Wilson, E. B. (1929) Review of The
abilities of man. Science, 67, 244-248. Reprinted with permission from the
American Association for the Advancement of Science.
An excerpt from Sidman, M. (1960/1988) Tactics of scientific research:
Evaluating experimental data in psychology. New York: Basic Books. Boston:
Authors Cooperative (reprinted). Reprinted with the permission of Dr Sidman.
An excerpt from Carroll, J. B. (1953) An analytical solution for approximating
simple structure in factor analysis. Psychometrika, 18, 23-38. Reprinted with
the permission of Psychometrika.
Excerpts from Garrett, H. E. & Zubin, J. (1943) The analysis of variance in
psychological research. Psychological Bulletin, 40, 233-267 and from Grant,
D. A. (1944) On "The analysis of variance in psychological research."
Psychological Bulletin, 41, 158-166. Reprinted with the permission of the
American Psychological Association.
An excerpt from Wilson, E. B. (1929) Review of Crossroads in the mind of
man. Journal of General Psychology, 2, 153-169. Reprinted with the
permission of the Helen Dwight Reid Educational Foundation. Published by
Heldref Publications, 1319 18th St. N.W. Washington, D.C. 20036-1802.
1 The Development of Statistics
EVOLUTION, BIOMETRICS, AND EUGENICS
The central concern of the life sciences is the study of variation. To what extent
does this individual or group of individuals differ from another? What are the
reasons for the variability? Can the variability be controlled or manipulated?
Do the similarities that exist spring from a common root? What are the effects
of the variation on the life of the organisms? These are questions asked by
biologists and psychologists alike.
The life-science disciplines are defined by the different emphases placed on
observed variation, by the nature of the particular variables of interest, and by
the ways in which the different variables contribute to the life and behavior of
the subject matter. Change and diversity in nature rest on an organizing
principle, the formulation of which has been said to be the single most influential
scientific achievement of the 19th century: the theory of evolution by means of
natural selection. The explication of the theory is attributed, rightly, to Charles
Darwin (1809-1882). His book The Origin of Species was published in 1859,
but a number of other scientists had written on the principle, in whole or in part,
and these men were acknowledged by Darwin in later editions of his work.
Natural selection is possible because there is variation in living matter. The
struggle for survival within and across species then ruthlessly favors the
individuals that possess a combination of traits and characters, behavioral and
physical, that allows them to cope with the total environment, exist, survive,
and reproduce.
Not all sources of variability are biological. Many organisms to a greater or
lesser extent reshape their environment, their experience, and therefore their
behavior through learning. In human beings this reshaping of the environment
has reached its most sophisticated form in what has come to be called cultural
evolution. A fundamental feature of the human condition, of human nature, is
our ability to process a very great deal of information. Human beings have
originality and creative powers that continually expand the boundaries of
knowledge. And, perhaps most important of all, our language skills, verbal and
written, allow for the accumulation of knowledge and its transmission from
generation to generation. The rich diversity of human civilization stems from
cultural, as well as genetic, diversity.
Curiosity about diversity and variability leads to attempts to classify and to
measure. The ordering of diversity and the assessment of variation have spurred
the development of measurement in the biological and social sciences, and the
application of statistics is one strategy for handling the numerical data obtained.
As science has progressed, it has become increasingly concerned with
quantification as a means of describing events. It is felt that precise and
economical descriptions of events and the relationships among them are best
achieved by measurement. Measurement is the link between mathematics and
science, and the apparent (at any rate to mathematicians!) clarity and order of
mathematics foster the scientist's urge to measure. The central importance of
measurement was vigorously expounded by Francis Galton (1822-1911):
"Until the phenomena of any branch of knowledge have been submitted to
measurement and number it cannot assume the status and dignity of a Science."
These words formed part of the letterhead of the Department of Applied
Statistics of University College, London, an institution that received much
intellectual and financial support from Galton. And it is with Galton, who first
formulated the method of correlation, that the common statistical procedures
of modern social science began.
The nature of variation and the nature of inheritance in organisms were
much-discussed and much-confused topics in the second half of the 19th
century. Galton was concerned to make the study of heredity mathematical and
to bring order into the chaos.
Francis Galton was Charles Darwin's cousin. Galton's mother was the
daughter of Erasmus Darwin (1731-1802) by his second wife, and Darwin's
father was Erasmus's son by his first. Darwin, who was 13 years Galton's senior,
had returned home from a 5-year voyage as the naturalist on board H.M.S.
Beagle (an Admiralty expeditionary ship) in 1836 and by 1838 had conceived
of the principle of natural selection to account for some of the observations he
had made on the expedition. The careers and personalities of Galton and Darwin
were quite different. Darwin painstakingly marshaled evidence and
single-mindedly buttressed his theory, but remained diffident about it, apparently
uncertain of its acceptance. In fact, it was only the inevitability of the
announcement of the independent discovery of the principle by Alfred Russel
Wallace (1823-1913) that forced Darwin to publish, some 20 years after he had
formed the idea. Galton, on the other hand, though a staid and formal Victorian,
was not without vanity, enjoying the fame and recognition brought to him by
his many publications on a bewildering variety of topics. The steady stream of
lectures, papers and books continued unabated from 1850 until shortly before
his death.
The notion of correlated variation was discussed by the new biologists.
Darwin observes in The Origin of Species:
Many laws regulate variation, some few of which can be dimly seen, ... I will
here only allude to what may be called correlated variation. Important changes
in the embryo or larva will probably entail changes in the mature animal ...
Breeders believe that long limbs are almost always accompanied by an elongated
head ... cats which are entirely white and have blue eyes are generally deaf ...
it appears that white sheep and pigs are injured by certain plants whilst
dark-coloured individuals escape ... (Darwin, 1859/1958, p. 34)
Of course, at this time, the hereditary mechanism was unknown, and, partly
in an attempt to elucidate it, Galton began, in the mid-1870s, to breed sweet
peas.¹
The results of his study of the size of sweet pea seeds over two generations
were published in 1877. When a fixed size of parent seed was compared with
the mean size of the offspring seeds, Galton observed the tendency that he called
then reversion and later regression to the mean. The mean offspring size is not
as extreme as the parental size. Large parent seeds of a particular size produce
seeds that have a mean size that is larger than average, but not as large as the
parent size. The offspring of small parent seeds of a fixed size have a mean size
that is smaller than average but now this mean size is not as small as that of the
fixed parent size. This phenomenon is discussed later in more detail. For the
moment, suffice it to say that it is an arithmetical artifact arising from the fact
that offspring sizes do not match parental sizes absolutely uniformly. In other
words, the correlation is imperfect.
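The arithmetic behind regression to the mean can be illustrated with a small simulation. What follows is only a sketch under stated assumptions and is not Galton's data or method: seed sizes are expressed in standard units, parent and offspring sizes are assumed to be normally distributed, and the correlation of 0.5, the function names, and the size bands are hypothetical choices made for the illustration.

```python
import random
import statistics


def simulate_seed_sizes(r=0.5, n=20000, seed=1):
    """Return (parent, offspring) pairs in standard units with correlation near r.

    The value of r and the normal model are assumptions for illustration only.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        parent = rng.gauss(0.0, 1.0)
        # Offspring share a fraction r of the parental deviation; the rest is
        # independent scatter, scaled so that the overall spread is unchanged.
        offspring = r * parent + (1.0 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((parent, offspring))
    return pairs


def group_means(pairs, low, high):
    """Mean parent and mean offspring size for parents with low <= size < high."""
    chosen = [(p, o) for p, o in pairs if low <= p < high]
    mean_parent = statistics.mean([p for p, _ in chosen])
    mean_offspring = statistics.mean([o for _, o in chosen])
    return mean_parent, mean_offspring


pairs = simulate_seed_sizes()
large_parent, large_offspring = group_means(pairs, 1.5, 2.5)
small_parent, small_offspring = group_means(pairs, -2.5, -1.5)
offspring_spread = statistics.pstdev([o for _, o in pairs])

print(f"large parents: mean {large_parent:.2f}, offspring mean {large_offspring:.2f}")
print(f"small parents: mean {small_parent:.2f}, offspring mean {small_offspring:.2f}")
print(f"spread of all offspring sizes: {offspring_spread:.2f}")
```

In this model the offspring of large parents are larger than average but less extreme than their parents, and likewise for small parents, while the spread of offspring sizes as a whole stays close to that of the parents.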
Galton misinterpreted this statistical phenomenon as a real trend toward a
reduction in population variability. Paradoxically, however, it led to the
formation of the Biometric School of heredity and thus encouraged the development
of a great many statistical methods.
¹ Mendel had already carried out his work with edible peas and thus begun the science of
genetics. The results of his work were published in a rather obscure journal in 1866 and the wider
scientific world remained oblivious of them until 1900.
Over the next several years Galton collected data on inherited human
characteristics by the simple expedient of offering cash prizes for family
records. From these data he arrived at the regression lines for hereditary stature.
Figures showing these lines are shown in Chapter 10.
A common theme in Galton's work, and later that of Karl Pearson
(1857-1936), was a particular social philosophy. Ronald Fisher (1890-1962)
also subscribed to it, although, it must be admitted, it was not, as such, a direct
influence on his work. These three men are the founders of what are now called
classical statistics and all were eugenists. They believed that the most relevant
and important variables in human affairs are inherited. One's ancestors rather
than one's environmental experiences are the overriding determinants of
intellectual capacity and personality as well as physical attributes. Human
well-being, human personality, indeed human society, could therefore, they argued,
be improved by encouraging the most able to have more children than the least
able. MacKenzie (1981) and Cowan (1972, 1977) have argued that much of the
early work in statistics and the controversies that arose among biologists and
statisticians reflect the commitment of the founders of biometry, Pearson being
the leader, to the eugenics movement.
In 1884, Galton financed and operated an anthropometric laboratory at the
International Health Exhibition. For a charge of threepence, members of the
public were measured. Visual and auditory acuity, weight, height, limb span,
strength, and a number of other variables were recorded. Over 9,000 data sets
were obtained, and, at the close of the exhibition, the equipment was transferred
to the South Kensington Museum where data collection continued. Francis
Galton was an avid measurer.
Karl Pearson (1930) relates that Galton's first forays into the problem of
correlation involved ranking techniques, although he was aware that ranking
methods could be cumbersome. How could one compare different measures of
anthropometric variables? In a flash of illumination, Galton realized that
characteristics measured on scales based on their own variability (we would
now say standard score units) could be directly compared. This inspiration is
certainly one of the most important in the early years of statistics. He recalls
the occasion in Memories of my Life, published in 1908:
As these lines are being written, the circumstances under which I first
clearly grasped the important generalisation that the laws of heredity were solely
concerned with deviations expressed in statistical units are vividly recalled to
my memory. It was in the grounds of Naworth Castle, where an invitation had
been given to ramble freely. A temporary shower drove me to seek refuge in a
reddish recess in the rock by the side of the pathway. There the idea flashed
across me and I forgot everything else for a moment in my great delight. (Galton,
1908, p. 300)²
This incident apparently took place in 1888, and before the year was out,
Co-relations and Their Measurement Chiefly From Anthropometric Data had
been presented to the Royal Society. In this paper Galton defines co-relation:
"Two variable organs are said to be co-related when the variation of one is
accompanied on the average by more or less variation of the other, and in the
same direction" (Galton, 1888, p. 135).
The last five words of the quotation indicate that the notion of negative
correlation had not then been conceived, but this brief but important paper shows
that Galton fully understood the importance of his statistical approach. Shortly
thereafter, mathematicians entered the picture with encouragement from some,
but by no means all, biologists.
Much of the basic mathematics of correlation had, in fact, already been
developed by the time of Galton's paper, but the utility of the procedure itself
in this context had apparently eluded everyone. It was Karl Pearson, Galton's
disciple and biographer, who, in 1896, set the concept on a sound mathematical
foundation and presented statistics with the solution to the problem of
representing covariation by means of a numerical index, the coefficient of
correlation.
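The link between standard score units and a numerical index of covariation can be sketched briefly. The product-moment coefficient can be written as the mean product of paired standard scores, which is the form used below; this is a modern restatement rather than a reproduction of Galton's or Pearson's own computations. The measurements, variable names, and function names are invented for the illustration.

```python
import statistics


def standard_scores(values):
    """Express each value as a deviation from the mean in standard deviation units."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Product-moment correlation as the mean product of paired standard scores."""
    zx = standard_scores(xs)
    zy = standard_scores(ys)
    return statistics.mean([a * b for a, b in zip(zx, zy)])


# Hypothetical measurements (not historical data): stature and forearm length.
heights_in = [64.0, 66.0, 67.5, 69.0, 71.0, 73.0]
forearms_in = [17.0, 17.5, 18.5, 18.0, 19.0, 19.5]
heights_cm = [h * 2.54 for h in heights_in]

r_inches = correlation(heights_in, forearms_in)
r_cm = correlation(heights_cm, forearms_in)
print(f"r with stature in inches: {r_inches:.3f}")
print(f"r with stature in centimetres: {r_cm:.3f}")
```

Because each variable is rescaled by its own variability, the index does not depend on the original units of measurement, which is why characteristics measured on quite different scales can be compared directly.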
From these beginnings springs the whole corpus of present-day statistical
techniques. George Udny Yule (1871-1951), an influential statistician who was
not a eugenist, and Pearson himself elaborated the concepts of multiple and
partial correlation. The general psychology of individual differences and
research into the structure of human abilities and intelligence relied heavily on
correlational techniques. The first third of the 20th century saw the introduction
of factor analysis through the work of Charles Spearman (1863-1945), Sir
Godfrey Thomson (1881-1955), Sir Cyril Burt (1883-1971), and Louis L.
Thurstone (1887-1955).
A further prolific and fundamentally important stream of development arises
from the work of Sir Ronald Fisher. The technique of analysis of variance was
developed directly from the method of intra-class correlation, an index of the
extent to which measurements in the same category or family are related, relative
to other categories or families.
² Karl Pearson (1914-1930), in the volume published in 1924, suggested that this spot deserves
a commemorative plaque. Unfortunately, it looks as though the inspiration can never be so marked,
for Kenna (1973), investigating the episode, reports that: "In the grounds of Naworth Castle there
are not any rocks, reddish or otherwise, which could provide a recess, ..." (p. 229), and he suggests
that the location of the incident might have been Corby Castle.
Fisher studied mathematics at Cambridge but also pursued interests in
biology and genetics. In 1913 he spent the summer working on a farm in Canada.
He worked for a while with a City investment company and then found himself
declared unfit for military service because of his extremely poor eyesight. He
turned to school-teaching for which he had no talent and which he hated. In
1919 he had the opportunity of a post at University College with Karl Pearson,
then head of the Department of Applied Statistics, but chose instead to develop
a statistical laboratory at the Rothamsted Experimental Station near Harpenden
in England, where he developed experimental methods for agricultural research.
Over the next several years, relations between Pearson and Fisher became
increasingly strained. They clashed on a variety of issues. Some of their
disagreements helped, and some hindered, the development of statistics. Had
they been collaborators and friends, rather than adversaries and enemies,
statistics might have had a quite different history. In 1933 Fisher became Galton
Professor of Eugenics at University College and in 1943 moved to Cambridge,
where he was Professor of Genetics. Analysis of variance, which has had such
far-reaching effects on experimentation in the behavioral sciences, was
developed through attempts to tackle problems posed at Rothamsted.
It may be fairly said that the majority of texts on methodology and statistics
in the social sciences are the offspring (diversity and selection notwithstanding!)
of Fisher's books, Statistical Methods for Research Workers,³ first published in
1925(a), and The Design of Experiments, first published in 1935(a).
In succeeding chapters these statistical concepts are examined in more detail
and their development elaborated, but first the use of the term statistics is
explored a little further.
THE DEFINITION OF STATISTICS
In an everyday sense when we think of statistics we think of facts and figures,
of numerical descriptions of political and economic states (from which the word
is derived), and of inventories of the various aspects of our social organization.
The history of statistical procedures in this sense goes back to the beginnings
of human civilization. When trade and commerce began, when governments
imposed taxes, numerical records were kept. The counting of people, goods,
and chattels was regularly carried out in the Roman Empire, the Domesday Book
attempted to describe the state of England for the Norman conquerors, and
government agencies the world over expend a great deal of money and
energy in collecting and tabulating such information in the present day.
Statistics are used to describe and summarize, in numerical terms, a wide variety of
situations.
³ Maurice Kendall (1963) says of this work, "It is not an easy book. Somebody once said
that no student should attempt to read it unless he had read it before" (p. 2).
But there is another more recently-developed activity subsumed under the
term statistics: the practice of not only collecting and collating numerical facts,
but also the process of reasoning about them. Going beyond the data, making
inferences and drawing conclusions with greater or lesser degrees of certainty
in an orderly and consistent fashion is the aim of modern applied statistics. In
this sense statistical reasoning did not begin until fairly late in the 17th century
and then only in a quite limited way. The sophisticated models now employed,
backed by theoretical formulations that are often complex, are all less than 100
years old. Westergaard (1932) points to the confusions that sometimes arise
because the word statistics is used to signify both collections of measurements
and reasoning about them, and that in former times it referred merely to
descriptions of states in both numerical and non-numerical terms.
In adopting the statistical inferential strategy the experimentalist in the life
sciences is accepting the intrinsic variability of the subject matter. In
recognizing a range of possibilities, the scientist comes four-square against
the problem of deciding whether or not the particular set of observations he or
she has collected can reasonably be expected to reflect the characteristics of
the total range. This is the problem of parameter estimation, the task of
estimating population values (parameters) from a consideration of the
measurements made on a particular population subset, the sample statistics. A
second task for inferential statistics is hypothesis testing, the process of
judging whether or not a particular statistical outcome is likely or unlikely to
be due to chance. The statistical inferential strategy depends on a knowledge of
probabilities.
This aspect of statistics has grown out of three activities that, at first
glance, appear to be quite different but in fact have some close links. They are
actuarial prediction, gambling, and error assessment. Each addresses the
problems of making decisions, evaluating outcomes, and testing predictions in
the face of uncertainty, and each has contributed to the development of
probability theory.
PROBABILITY
Statistical operations are often thought of as practical applications of
previously developed probability theory. The fact is, however, that almost all
our present-day statistical techniques have arisen from attempts to answer
real-life problems of prediction and error assessment, and theoretical
developments have not always paralleled technical accomplishments. Box (1984)
has reviewed the scientific context of a range of statistical advances and shown
that the fundamental methods evolved from the work of practising scientists.
John Graunt, a London haberdasher, born in 1620, is credited with the first
attempt to predict and explain a number of social phenomena from a consideration
of actuarial tables. He compiled his tables from Bills of Mortality, the parish
accounts of deaths that were regularly, if somewhat crudely, recorded from the
beginning of the 17th century.
Graunt recognizes that the question might be asked: "To what purpose tends all
this laborious buzzling, and groping? To know, 1. the number of the People?
2. How many Males, and Females? 3. How many Married, and single?" (Graunt,
1662/1975, p. 77), and says: "To this I might answer in general by saying, that
those, who cannot apprehend the reason of these Enquiries, are unfit to trouble
themselves to ask them." (p. 77).
Graunt reassured readers of this quite remarkable work:
The Lunaticks are also but few, viz. 158 in 229250 though I fear many more than
are set down in our Bills ...
So that, this Casualty being so uncertain, I shall not force my self to make any
inference from the numbers, and proportions we finde in our Bills concerning it:
onely I dare ensure any man at this present, well in his Wits, for one in the
thousand, that he shall not die a Lunatick in Bedlam, within these seven years,
because I finde not above one in about one thousand five hundred have done so.
(pp. 35-36)
Here is an inference based on numerical data and couched in terms not so very
far removed from those in reports in the modern literature. Graunt's work was
immediately recognized as being of great importance, and the King himself
(Charles II) supported his election to the recently incorporated Royal Society.
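Graunt's arithmetic is easily checked. The short Python sketch below is an
illustration added for the modern reader, not anything found in Graunt; the
variable names are invented for the example. It converts the 158 deaths in
229,250 that he cites into "one in N" form.

```python
# Figures quoted by Graunt (1662): deaths attributed to lunacy and the total
# number of deaths in the Bills he examined. Variable names are illustrative.
lunatick_deaths = 158
total_deaths = 229_250

# Express the observed proportion in "one in N" form.
one_in = total_deaths / lunatick_deaths
print(f"about one in {one_in:.0f}")
```

The result is close to the "one in about one thousand five hundred" that Graunt
reports, so his willingness to "ensure any man" at one in a thousand is stated
on the cautious side of his own data.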
A few years earlier the seeds of modern probability theory were being sown in
France.⁴ At this time gambling was a popular habit in fashionable society and a
range of games of chance was being played. For experienced players the odds
applicable to various situations must have been appreciated, but no formal
methods for calculating the chances of various outcomes were available. Antoine
Gombauld, the Chevalier de Méré, a "man-about-town" and gambler with a
scientific and mathematical turn of mind, consulted his friend, Blaise Pascal
(1623-1662), a philosopher, scientist, and mathematician, hoping that he would
be able to resolve questions on calculation of expected (probable) frequency of
gains and losses, as well as on the fair division of the stakes in games that
were interrupted. Consideration of these questions led to correspondence between
Pascal and his fellow mathematician Pierre Fermat (1601-1665). No doubt their
advice aided de Méré's game.⁵ More significantly, it was from this exchange that
some of the foundations of probability theory and combinatorial algebra were
laid.
⁴ But note that there are hints of probability concepts in mathematics going
back at least as far as the 12th century and that Girolamo Cardano wrote Liber
de Ludo Aleae (The Book on Games of Chance) a century before it was published in
1663 (see Ore, 1953). There is also no doubt that quite early in human
civilization, there was an appreciation of long-run relative frequencies,
randomness, and degrees of likelihood in gaming, and some quite formal concepts
are to be found in Greek and Roman writings.
Christian Huygens (1629-1695) published, in 1657, a tract On Reasoning With
Games of Dice (1657/1970), which was partly based on the Pascal-Fermat
correspondence, and in 1713, Jacques Bernoulli's (1654-1705) book The Art of
Conjecture developed a theory of games of chance.
Pascal had connected the study of probability with the arithmetic triangle
(Fig. 1.1), for which he discovered new properties, although the triangle was
known in China at least five hundred years earlier. Proofs of the triangle's
properties were obtained by mathematical induction or reasoning by recurrence.
FIG. 1.1 Pascal's Triangle
⁵ Poisson (1781-1840), writing of this episode in 1837 says, "A problem
concerning games of chance, proposed by a man of the world to an austere
Jansenist, was the origin of the calculus of probabilities" (quoted by Struik,
1954, p. 145). De Méré was certainly "a man of the world" and Pascal did become
austere and religious, but at the time of de Méré's questions Pascal was in his
so-called "worldly period" (1652-1654). I am indebted to my father-in-law, the
late Professor F.T.H. Fletcher, for many insights into the life of Pascal.
Pascal's triangle, as it is known in the West, is a tabulation of the binomial
coefficients that may be obtained from the expansion of (P + Q)ⁿ where
P = Q = ½. The expansion was developed by Sir Isaac Newton (1642-1727), and,
independently, by the Scottish mathematician, James Gregory (1638-1675). Gregory
discovered the rule about 1670. Newton communicated it to the Royal Society in
1676, although later that year he explained that he had first formulated it in
1664 while he was a Cambridge undergraduate. The example shown in Fig. 1.1
demonstrates that the expansion of (½ + ½)⁴ generates, in the numerators of the
expression, the numbers in the fifth row of Pascal's triangle. The terms of this
expression also give us the five expected frequencies of outcome (0, 1, 2, 3, or
4 heads) or probabilities when a fair coin is tossed four times. Simple
experiment will demonstrate that the actual outcomes in the "real world" of coin
tossing closely approximate the distribution that has been calculated from a
mathematical abstraction.
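The link between the triangle and coin tossing can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
names (pascal_row, coin_toss_probabilities, simulate_tosses) are invented for
the example, and the "simple experiment" is imitated with Python's pseudorandom
generator rather than a real coin.

```python
import random
from fractions import Fraction


def pascal_row(n):
    """Return row n of the arithmetic triangle (row 0 is [1]), built by
    recurrence: each inner entry is the sum of the two entries above it."""
    row = [1]
    for _ in range(n):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


def coin_toss_probabilities(n):
    """Terms of (1/2 + 1/2)^n: the probabilities of 0..n heads in n tosses."""
    return [Fraction(c, 2 ** n) for c in pascal_row(n)]


def simulate_tosses(trials, n=4, seed=0):
    """Toss n simulated fair coins `trials` times; count each number of heads."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        counts[heads] += 1
    return counts


row = pascal_row(4)                   # numerators of the expansion
probs = coin_toss_probabilities(4)    # each numerator divided by 16
counts = simulate_tosses(16_000)      # observed counts for 0..4 heads

for heads, (coef, prob, seen) in enumerate(zip(row, probs, counts)):
    print(heads, f"{coef}/16", f"expected {float(prob) * 16_000:.0f}", f"observed {seen}")
```

With 16,000 simulated sets of four tosses the expected counts are simply the
numerators multiplied by 1,000, which makes the comparison with the observed
counts easy to read.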
During the 18th century the theory of probability attracted the interest of many
brilliant minds. Among them was a friend and admirer of Newton, Abraham De
Moivre (1667-1754). De Moivre, a French Huguenot, was interned in 1685 after the
revocation by Louis XIV of the Edict of Nantes, an edict which had guaranteed
toleration to French Protestants. He was released in 1688, fled to England, and
spent the remainder of his life in London. De Moivre published what might be
described as a gambler's manual, entitled The Doctrine of Chances or a Method of
Calculating the Probabilities of Events in Play. In the second edition of this
work, published in 1738, and in a revised third edition published posthumously
in 1756, De Moivre (1756/1967) demonstrated a method, which he had first devised
in 1733, of approximating the sum of a very large number of binomial terms when
n in (P + Q)ⁿ is very large (an immensely laborious computation from the basic
expansion).
It may be appreciated that as n grows larger, the number of terms in the
expansion also grows larger. The graph of the distribution begins to resemble a
smooth curve (Fig. 1.2), a bell-shaped symmetrical distribution that held great
interest in mathematical terms but little practical utility outside of gaming.
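How quickly the binomial terms settle onto the smooth curve can be seen
numerically. The sketch below is not De Moivre's derivation; it simply compares
each binomial term for P = Q = ½ with the height of a normal curve having the
same mean (n/2) and standard deviation (√n / 2). The function names are
assumptions made for the illustration.

```python
import math


def binomial_term(k, n, p=0.5):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_height(x, mean, sd):
    """Ordinate of the normal curve with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def max_gap(n, p=0.5):
    """Largest absolute difference between a binomial term and the matching
    normal ordinate, taken over k = 0..n."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_term(k, n, p) - normal_height(k, mean, sd))
        for k in range(n + 1)
    )


# The gap shrinks as n grows; n = 12 is the case drawn in Fig. 1.2.
for n in (4, 12, 100):
    print(n, round(max_gap(n), 4))
```

Even at n = 12 no term differs from the curve by as much as .01, which is why
the two graphs are hard to tell apart by eye.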
It is safe to say that no other theoretical mathematical abstraction has had
such an important influence on psychology and the social sciences as that
bell-shaped curve now commonly known by the name that Karl Pearson decided on,
the normal distribution, although he was not the first to use the term. Pierre
Laplace (1749-1827) independently derived the function and brought together much
of the earlier work on probability in Théorie Analytique des Probabilités,
published in 1812. It was his work, as well as contributions by many others,
that interpreted the curve as the Law of Error and showed that it could be
applied to variable results obtained in multiple observations. One of the first
applications of the distribution outside of gaming was in the assessment of
errors in astronomical observations. Later the utility of the "law" in error
assessment was extended to land surveying and even to range estimation problems
in artillery fire. Indeed, between 1800 and 1820 the foundations of the theory
of error distribution were laid.
FIG. 1.2 The Binomial Distribution for N = 12 and the Normal Distribution
Carl Friedrich Gauss (1777-1855), perhaps the greatest mathematician of all
time, also made important contributions to work in this area. He was a
consultant to the governments of Hanover and of Denmark when they undertook
geodetic surveys. The function that helped to rationalize the combination of
observations is sometimes called the Laplace-Gaussian distribution.
Following the work of Laplace and Gauss, the development of mathematical
probability theory slowed somewhat and not a great deal of progress was made
until the present century. But it was during the 19th century, through the
development of life insurance companies and through the growth of statistical
approaches in the social and biological sciences, that the applications of
probability theory burgeoned. Augustus De Morgan (1806-1871), for example,
attempted to reduce the constructs of probability to straightforward rules of
thumb. His work An Essay on Probabilities and on Their Application to Life
Contingencies and Insurance Offices, published in 1838, is full of practical
advice and is commented on by Walker (1929).
THE NORMAL DISTRIBUTION
The normal distribution was so named because many biological variables when
measured in large groups of individuals, and plotted as frequency distributions,
do show close approximations to the curve. It is partly for this reason that the
mathematics of the distribution are used in data assessment in the social
sciences and in biology. The responsibility, as well as the credit, for this
extension of the use of calculations designed to estimate error or gambling
expectancies into the examination of human characteristics rests with Lambert
Adolphe Quetelet (1796-1874), a Belgian astronomer.
In 1835 Quetelet described his concept of the average man, l'homme moyen.
L'homme moyen is Nature's ideal, an ideal that corresponds with a middle,
measured value. But Nature makes errors, and in, as it were, missing the target,
produces the variability observed in human traits and physical characters. More
importantly, the extent and frequency of these errors often conform to the law
of frequency of error, the normal distribution.
John Venn (1834-1923), the English logician, objected to the use of the word
error in this context: "When Nature presents us with a group of objects of every
kind, it is using rather a bold metaphor to speak in this case also of a law of
error" (Venn, 1888, p. 42), but the analogy was attractive to some.
Quetelet examined the distribution of the measurements of the chest girths of
5,738 Scottish soldiers, these data having been extracted from the 13th volume
of the Edinburgh Medical Journal. There is no doubt that the measurements
closely approximate to a normal curve. In another attempt to exemplify the law,
Quetelet examined the heights of 100,000 French conscripts. Here he noticed a
discrepancy between observed and predicted values:
The official documents would make it appear that, of the 100,000 men, 28,620 are
of less height than 5 feet 2 inches: calculation gives only 26,345. Is it not a
fair presumption, that the 2,275 men who constitute the difference of these
numbers have been fraudulently rejected? We can readily understand that it is an
easy matter to reduce one's height a half-inch, or an inch, when so great an
interest is at stake as that of being rejected. (Quetelet, 1835/1849, p. 98)
Whether or not the allegation stated here, that short (but not too short)
Frenchmen have stooped so low as to avoid military service, is true is no longer
an issue. A more important point is noted by Boring (1920):
While admitting the dependence of the law on experience, Quetelet proceeds in
numerous cases to analyze experience by means of it. Such a double-edged sword
is a peculiarly effective weapon, and it is no wonder that subsequent
investigators were tempted to use it in spite of the necessary rules of
scientific warfare. (Boring, 1920, p. 11)
The use of the normal curve in statistics is not, however, based solely on the
fact that it can be used to describe the frequency distribution of many observed
characteristics. It has a much more fundamental significance in inferential
statistics, as will be seen, and the distribution and its properties appear in
many parts of this book.
Galton first became aware of the distribution from his friend William
Spottiswoode, who in 1862 became Secretary of the Royal Geographical Society,
but it was the work of Quetelet that greatly impressed him. Many of the data
sets he collected approximated to the law and he seemed, on occasion, to be
almost mystically impressed with it.
I know of scarcely anything so apt to impress the imagination as the wonderful
form of cosmic order expressed by the "Law of Frequency of Error." The law would
have been personified by the Greeks and deified, if they had known of it. It
reigns with serenity and in complete self-effacement amidst the wildest
confusion. The huger the mob and the greater the apparent anarchy, the more
perfect is its sway. It is the supreme law of Unreason. Whenever a large sample
of chaotic elements are taken in hand and marshalled in the order of their
magnitude, an unsuspected and most beautiful form of regularity proves to have
been latent all along. (Galton, 1889, p. 66)
This rather theological attitude toward the distribution echoes De Moivre, who,
over a century before, proclaimed in The Doctrine of Chances:
Altho' chance produces irregularities, still the Odds will be infinitely great,
that in the process of Time, those irregularities will bear no proportion to the
recurrency of that Order which naturally results from ORIGINAL DESIGN... Such
Laws, as well as the original Design and Purpose of their Establishment, must
all be from without... if we blind not ourselves with metaphysical dust, we
shall be led, by a short and obvious way, to the acknowledgement of the great
MAKER and GOVERNOUR of all; Himself all-wise, all-powerful and good. (De Moivre,
1756/1967, pp. 251-252)
The ready acceptance of the normal distribution as a law of nature encouraged
its wide application and also produced consternation when exceptions were
observed. Quetelet himself admitted the possibility of the existence of
asymmetric distributions, and Galton was at times less lyrical, for critics had
objected to the use of the distribution, not as a practical tool to be used with
caution where it seemed appropriate, but as a sort of divine rule:
It has been objected to some of my former work, especially in Hereditary Genius,
that I pushed the application of the Law of Frequency of Error somewhat too far.
I may have done so, rather by incautious phrases than in reality; ... I am
satisfied to claim the Normal Law is a fair average representation of the
observed Curves during nine-tenths of their course;... (Galton, 1889, p. 56)⁶
⁶ Note that this quotation and the previous one from Galton are 10 pages apart
in the same work!
BIOMETRICS
In 1890, Walter F. R. Weldon (1860-1906) was appointed to the Chair of Zoology
at University College, London. He was greatly impressed and much influenced by
Galton's Natural Inheritance. Not only did the book show him how the frequency
of the deviations from a "type" might be measured, it opened up for him, and for
other zoologists, a host of biometric problems. In two papers published in 1890
and 1892, Weldon showed that various measurements on shrimps might be assessed
using the normal distribution. He also demonstrated interrelationships
(correlations) between two variables within the individuals. But the critical
factor in Weldon's contribution to the development of statistics was his
professorial appointment, for this brought him into contact with Karl Pearson,
then Professor of Applied Mathematics and Mechanics, a post Pearson had held
since 1884. Weldon was attempting to remedy his weakness in mathematics so that
he could extend his research, and he approached Pearson for help. His enthusiasm
for the biometric approach drew Pearson away from more orthodox work.
A second important link was with Galton, who had reviewed Weldon's first paper
on variation in shrimps. Galton supported and encouraged the work of these two
younger men until his death, and, under the terms of his will, left £45,000 to
endow a Chair of Eugenics at the University of London, together with the wish
that the post might be offered first to Karl Pearson. The offer was made and
accepted.
In 1904, Galton had offered the University of London £500 to establish the study
of national eugenics. Pearson was a member of the Committee that the University
set up, and the outcome was a decision to appoint the Galton Research Fellow at
what was to be named the Eugenics Record Office. This became the Galton
Laboratory for National Eugenics in 1906, and yet more financial assistance was
provided by Galton. Pearson, still Professor of Applied Mathematics, was its
Director as well as Head of the Biometrics Laboratory. This latter received much
of its funding over many years from grants from the Worshipful Company of
Drapers, which first gave money to the University in 1903.
Pearson's appointment to the Galton Chair brought applied statistics,
biometrics, and eugenics together under his direction at University College. It
cannot however be claimed absolutely that the day-to-day work of these units was
driven by a common theme. Applied statistics and biometrics were primarily
concerned with the development and application of statistical techniques to a
variety of problems, including anthropometric investigations; the Eugenics
Laboratory collected extensive family pedigrees and examined actuarial death
rates. Of course Pearson coordinated all the work, and there was interchange and
exchange among the staff that worked with him, but Magnello (1998, 1999) has
argued that there was not a single unifying purpose in Pearson's research.
Others, notably MacKenzie (1981), Kevles (1985), and Porter (1986), have
promoted the view that eugenics was the driving force behind Pearson's
statistical endeavors.
Pearson was not a formal member of the eugenics movement. He did not join the
Eugenics Education Society, and apparently he tried to keep the two laboratories
administratively separate, maintaining separate financial accounts, for example,
but it has to be recognized that his personal views of the human condition and
its future included the conviction that eugenics was of critical importance.
There was an obvious and persistent intermingling of statistical results and
eugenics in his pronouncements. For example, in his Huxley Lecture in 1903
(published in Biometrika in 1903 and 1904), on topics that were clearly
biometric, having to do with his researches on relationships between moral and
intellectual variables, he ended with a plea, if not a rallying cry, for
eugenics:
The mentally better stock in the nation is not reproducing itself at the same
rate as it did of old; the less able, and the less energetic, are more fertile
than the better stocks. ... The only remedy, if one be possible at all, is to
alter the relative fertility of the good and the bad stocks in the community.
. . . intelligence can be aided and be trained, but no training or education can
create it. You must breed it, that is the broad result for statecraft which
flows from the equality in inheritance of the psychical and the physical
characters in man. (Pearson, 1904a, pp. 179-180)
Pearson's contribution was monumental, for in less than 8 years, between 1893
and 1901, he published over 30 papers on statistical methods. The first was
written as a result of Weldon's discovery that the distribution of one set of
measurements of the characteristics of crabs, collected at the zoological
station at Naples in 1892, was "double-humped." The distribution was reduced to
the sum of two normal curves. Pearson (1894) proceeded to investigate the
general problem of fitting observed distributions to theoretical curves. This
work was to lead directly to the formulation of the χ² test of "goodness of fit"
in 1900, one of the most important developments in the history of statistics.
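The statistic at the heart of the goodness-of-fit test is the sum, over
categories, of (observed - expected)² / expected. The sketch below computes it
in Python for a deliberately simple case. The data are hypothetical (100 tosses
of a coin giving 60 heads), chosen only to keep the arithmetic transparent, and
the function name is an assumption of the example rather than anything taken
from Pearson's paper.

```python
def chi_square_statistic(observed, expected):
    """Pearson's goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 tosses of a coin, 60 heads and 40 tails.
observed = [60, 40]
# Frequencies expected from the binomial model with P = Q = 1/2.
expected = [50, 50]

statistic = chi_square_statistic(observed, expected)
print(statistic)  # (10**2)/50 + (10**2)/50 = 4.0
```

With two categories there is one degree of freedom, and the value obtained here
lies just beyond the tabled value of 3.84 that marks the .05 level, so a
discrepancy of this size would conventionally be judged unlikely to be due to
chance.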
Weldon approached the problem of discrepancies between theory and observation in
a much more empirical way, tossing coins and dice and comparing the outcomes
with the binomial model. These data helped to produce another line of
development.
In a letter to Galton, written in 1894, Weldon asks for a comment on the results
of 7,000 tossings of 12 dice collected for him by a clerk at University College:
A day or two ago Pearson wanted some records of the kind in a hurry, in order to
illustrate a lecture, and I gave him the record of the clerk's 7000 tosses ...
on examination he rejects them, because he thinks the deviation from the
theoretically most probable result is so great as to make the record
intrinsically incredible. (quoted by E. S. Pearson, 1965, p. 11)
This incident set off a good deal of correspondence between Karl Pearson, F. Y.
Edgeworth (1845-1926), an economist and statistician, and Weldon, the details of
which are now only of minor importance. But, as Karl Pearson remarked,
"Probabilities are very slippery things" (quoted by E. S. Pearson, 1965, p. 14),
and the search for criteria by which to assess the differences between observed
and theoretical frequencies, and whether or not they could be reasonably
attributed to chance sampling fluctuations, began. Statistical research rapidly
expanded into careful examination of distributions other than the normal curve
and eventually into the properties of sampling distributions, particularly
through the seminal work of Ronald Fisher.
In developing his research into the properties of the probability distributions
of statistics, Fisher investigated the basis of hypothesis testing and the
foundations of all the well-known tests of statistical significance. Fisher's
assertion that p = .05 (1 in 20) is the probability that is convenient for
judging whether or not a deviation is to be considered significant (i.e.,
unlikely to be due to chance), has profoundly affected research in the social
sciences, although it should be noted that he was not the originator of the
convention (Cowles & Davis, 1982a).
Of course, the development of statistical methods does not end here, nor have
all the threads been drawn together. Discussion of the important contribution of
W. S. Gosset ("Student," 1876-1937) to small sample work and the refinements
introduced into hypothesis testing by Karl Pearson's son, Egon S. Pearson
(1895-1980), and Jerzy Neyman (1894-1981) will be found in later chapters, when
the earlier details have been elaborated.
Biometrics and Genetics
The early years of the biometric school were surrounded by controversy. Pearson
and Weldon held fast to the view that evolution took place by the continuous
selection of variations that were favorable to organisms in their environment.
The rediscovery of Mendel's work in 1900 supported the concept that heredity
depends on self-reproducing particles (what we now call genes), and that
inherited variation is discontinuous and saltatory. The source of the
development of higher types was occasional genetic jumps or mutations.
Curiously enough, this was the view of evolution that Galton had supported. His
misinterpretation of the purely statistical phenomenon of regression led him to
the notion that a distinction had to be made between variations from the mean
that regress and what he called "sports" (a breeder's term for an animal or
plant variety that appears apparently spontaneously) that will not.
A champion of the position that mutations were of critical importance in the
evolutionary process was William Bateson (1861-1926), and a prolonged and bitter
argument with the biometricians ensued. The Evolution Committee of the Royal
Society broke down over the dispute. Biometrika was founded by Pearson and
Weldon, with Galton's financial support, in 1900, after the Royal Society had
allowed Bateson to publish a detailed criticism of a paper submitted by Pearson
before the paper itself had been issued. Britain's important scientific journal,
Nature, took the biometricians' side and would not print letters from Bateson.
Pearson replied to Bateson's criticisms in Biometrika but refused to accept
Bateson's rejoinders, whereupon Bateson had them privately printed by the
Cambridge University Press in the format of Biometrika!
At the British Association meeting in Cambridge in 1904, Bateson, then President
of the Zoological Section, took the opportunity to deliver a bitter attack on
the biometric school. Dramatically waving aloft the published volumes of
Biometrika, he pronounced them worthless and he described Pearson's correlation
tables as "a Procrustean bed into which the biometrician fits his unanalysed
data" (quoted by Julian Huxley, 1949).
It is even said that Pearson and Bateson refused to shake hands at Weldon's
funeral. Nevertheless, after Weldon's death the controversy cooled. Pearson's
work became more concerned with the theory of statistics, although the influence
of his eugenic philosophy was still in evidence, and by 1910, when Bateson
became Director of the John Innes Horticultural Institute, the argument had
died. However, some statistical aspects of this contentious debate predated the
evolution dispute, and echoes of them (indeed, marked reverberations from them)
are still around today, although of course Mendelian and Darwinian thinking are
completely reconciled.
STATISTICAL CRITICISM
Statistics has been called the "science of averages," and this definition is not
meant in a kindly way. The great physiologist Claude Bernard (1813-1878)
maintained that the use of averages in physiology could not be countenanced:
because the true relations of phenomena disappear in the average; when dealing
with complex and variable experiments, we must study their various
circumstances, and then present our most perfect experiment as a type, which,
however, still stands for true facts.
... averages must therefore be rejected, because they confuse while aiming to
unify, and distort while aiming to simplify. (Bernard, 1865/1927, p. 135)
Now it is, of course, true that lumping measurements together may not give us
anything more than a picture of the lumping together, and the average value may
not be anything like any one individual measurement at all, but Bernard's ideal
type fails to acknowledge the reality of individual differences. A rather
memorable example of a very real confusion is given by Bernard (1865/1927):
A startling instance of this kind was invented by a physiologist who took urine
from a railway station urinal where people of all nations passed, and who
believed that he could thus present an analysis of average European urine!
(pp. 134-135)
A less memorable, but just as telling, example is that of the social
psychologist who solemnly reports "mean social class."
Pearson (1906) notes that:
One of the blows to Weldon, which resulted from his biometric view of life, was
that his biological friends could not appreciate his new enthusiasms. They could
not understand how the Museum "specimen" was in the future to be replaced by the
"sample" of 500 to 1000 individuals. (p. 37)
The view is still not wholly appreciated. Many psychologists subscribe to the
position that the most pressing problems of the discipline, and certainly the
ones of most practical interest, are problems of individual behavior. A major
criticism of the effect of the use of the statistical approach in psychological
research is the failure to differentiate adequately between general propositions
that apply to most, if not all, members of a particular group and statistical
propositions that apply to some aggregated measure of the members of the group.
The latter approach discounts the exceptions to the statistical aggregate, which
not only may be the most interesting but may, on occasion, constitute a large
proportion of the group.
Controversy abounds in the field of measurement, probability, and statistics,
and the methods employed are open to criticism, revision, and downright
rejection. On the other hand, measurement and statistics play a leading role in
psychological research, and the greatest danger seems to lie in a nonawareness
of the limitations of the statistical approach and the bases of their
development, as well as the use of techniques, assisted by the high-speed
computer, as recipes for data manipulation.
Miller (1963) observed of Fisher, "Few psychologists have educated us as
rapidly, or have influenced our work as pervasively, as did this fervent,
clear-headed statistician" (p. 157).
Hogben (1957) certainly agrees that Fisher has been enormously influential but
he objects to Fisher's confidence in his own intuitions:
This intrepid belief in what he disarmingly calls common sense ... has led
Fisher ... to advance a battery of concepts for the semantic credentials of
which neither he nor his disciples offer any justification en rapport with the
generally accepted tenets of the classical theory of probability. (Hogben, 1957,
p. 504)
Hogben also expresses a thought often shared by natural scientists when they
review psychological research, that:
Acceptability of a statistically significant result of an experiment on animal
behaviour in contradistinction to a result which the investigator can repeat
before a critical audience naturally promotes a high output of publication.
Hence the argument that the techniques work has a tempting appeal to young
biologists. (Hogben, 1957, p. 27)
Experimental psychologists may well agree that the tightly controlled experiment
is the apotheosis of classical scientific method, but they are not so arrogant
as to suppose that their subject matter will necessarily submit to this form of
analysis, and they turn, almost inevitably, to statistical, as opposed to
experimental, control. This is not a muddle-headed notion, but it does present
dangers if it is accepted without caution.
A balanced, but not uncritical, view of the utility of statistics can be arrived
at from a consideration of the forces that shaped the discipline and an
examination of its development. Whether or not this is an assertion that anyone,
let alone the author of this book, can justify remains to be seen.
Yet there are Writers, of a Class indeed very different from that of James
Bernoulli, who insinuate as if the Doctrine of Probabilities could have no place
in any serious Enquiry; and that Studies of this kind, trivial and easy as they
be, rather disqualify a man for reasoning on any other subject. Let the Reader
chuse. (De Moivre, 1756/1967, p. 254)
2
Science, Psychology,
and Statistics
DETERMINISM
It is a popular notion that if psychology is to be considered a science, then it
most certainly is not an exact science. The propositions of psychology are
considered to be inexact because no psychologist on earth would venture a
statement such as this: "All stable extraverts will, when asked, volunteer to
participate in psychological experiments."¹ The propositions of the natural
sciences are considered to be exact because all physicists would be prepared to
attest (with some few cautionary qualifications) that, "fire burns" or, more
pretentiously, that "e = mc²." In short, it is felt that the order in the
universe, which nearly everyone (though for different reasons) is sure must be
there, has been more obviously demonstrated by the natural rather than the
social scientists.
Order in the universe implies determinism, a most useful and a most vexing
term, for it brings those who wonder about such things into contact with the
philosophical underpinnings of the rather everyday concept of causality. No one
has stated the situation more clearly than Laplace in his Essai:
Present events are connected with preceding ones by a tie based upon the evident
principle that a thing cannot occur without a cause which produces it. This axiom
known by the name of the principle of sufficient reason, extends even to actions
which are considered indifferent; the freest will is unable without a determinative
motive to give them birth; ...
We ought then to regard the present state of the universe as the effect of its anterior
state and as the cause of the one that is to follow. Given for one instant an intelligence
which could comprehend all the forces by which nature is animated and the
respective situations of the beings who compose it - an intelligence sufficiently vast
to submit these data to analysis - it would embrace in the same formula the
movements of the greatest bodies of the universe and those of the lightest atom; for
it nothing would be uncertain and the future, as the past, would be present to its eyes.
(Laplace, 1820/1951, pp. 3-4)
¹ Not wishing to make pronouncements on the probabilistic nature of the work of others, the writer
is making an oblique reference to work in which he and a colleague (Cowles & Davis, 1987) found that
there is an 80% chance that stable extraverts will volunteer to participate in psychological research.
The assumption of determinism is simply the apparently reasonable notion
that events are caused. Science discovers regularities in nature, formulates
descriptions of these regularities, and provides explanations, that is to say,
discovers causes. Knowledge of the past enables the future to be predicted.
This is the popular view of science, and it is also a sort of working rule of thumb
for those engaged in the scientific enterprise. Determinism is the central feature
of the development of modern science up to the first quarter of the 20th century.
The successes of the natural sciences, particularly the success of Newtonian
mechanics, urged and influenced some of the giants of psychology, particularly
in North America, to adopt a similar mechanistic approach to the study of
behavior. The rise of behaviorism promoted the view that psychology could
be a science, "like other sciences." Formulae could be devised that would allow
behavior to be predicted, and a technology could be achieved that would enable
environmental conditions to be so manipulated that behavior could be controlled.
The variability in living things was to be brought under experimental
control, a program that leads quite naturally to the notion of the stimulus control
of behavior. It follows that concepts such as will or choice or freedom of action
could be rejected by behavioral science.
In 1913, John B. Watson (1878-1958) published a paper that became the
behaviorists' manifesto. It begins, "Psychology as the behaviorist views it is a
purely objective experimental branch of natural science. Its theoretical goal is
the prediction and control of behavior" (Watson, 1913, p. 158).
Oddly enough, this pronouncement coincided with work that began to
question the assumption of determinism in physics, the undoubted leader of the
natural sciences. In 1913, a laboratory experiment in Cambridge, England,
provided spectroscopic proof of what is known as the Rutherford-Bohr model
of the atom. Ernest Rutherford (later Lord Rutherford, 1871-1937) had proposed
that the atom was like a miniature solar system with electrons orbiting a
central nucleus. Niels Bohr (1885-1962), a Danish physicist, explained that the
electrons moved from one orbit to another, emitting or absorbing energy as they
moved toward or away from the nucleus. The jumping of an electron from orbit
to orbit appeared to be unpredictable. The totality of exchanges could only be
predicted in a statistical, probabilistic fashion. That giant of modern physicists,
Albert Einstein (1879-1955), whose work had helped to start the revolution in
physics, was loath to abandon the concept of a completely causal universe and
indeed never did entirely abandon it. In the 1920s, Einstein made the statement
that has often been paraphrased as, "God does not play dice with the world."
Nevertheless, he recognized the problem. In a lecture given in 1928, Einstein
said:
Today faith in unbroken causality is threatened precisely by those whose path it had
illumined as their chief and unrestricted leader at the front, namely by the
representatives of physics ... All natural laws are therefore claimed to be, "in principle,"
of a statistical variety and our imperfect observation practices alone have cheated us
into a belief in strict causality. (quoted by Clark, 1971, pp. 347-348)
But Einstein never really accepted this proposition, believing to the end that
indeterminacy was to be equated with ignorance. Einstein may be right in
subscribing ultimately to the inflexibility of Laplace's all-seeing demon, but
another approach to indeterminacy was advanced by Werner Heisenberg
(1901-1976), a German physicist who, in 1927, formulated his famous uncertainty
principle. He examined not merely the practical limits of measurement
but the theoretical limits, and showed that the act of observation of the position
and velocity of a subatomic particle interfered with it so as to inevitably produce
errors in the measurement of one or the other. This assertion has been taken to
mean that, ultimately, the forces in our universe are random and therefore
indeterminate. Bertrand Russell (1931) disagrees:
Space and time were invented by the Greeks, and served their purpose admirably
until the present century. Einstein replaced them by a kind of centaur which he called
"space-time," and this did well enough for a couple of decades, but modern quantum
mechanics has made it evident that a more fundamental reconstruction is necessary.
The Principle of Indeterminacy is merely an illustration of this necessity, not of the
failure of physical laws to determine the course of nature. (pp. 108-109)
The important point to be aware of is that Heisenberg's principle refers to
the observer and the act of observation and not directly to the phenomena that
are being observed. This implies that the phenomena have an existence outside
their observation and description, a contention that, by itself, occupies philosophers.
Nowhere is the demonstration that technique and method shape the way
in which we conceptualize phenomena more apparent than in the physics of
light. The progress of events in physics that led to the view that light was both
wave and particle, a view that Einstein's work had promoted, began to dismay
him when it was used to suggest that physics would have to abandon strict
continuity and causality. Bohr responded to Einstein's dismay:
You, the man who introduced the idea of light as particles! If you are so concerned
with the situation in physics in which the nature of light allows for a dual interpretation,
then ask the German government to ban the use of photoelectric cells if you
think that light is waves, or the use of diffraction gratings if light is corpuscular.
(quoted by Clark, 1971, p. 253)
The parallels in experimental psychology are obvious. That eminent historian
of the discipline, Edwin Boring, describes a colloquium at Harvard at which
his colleague, William McDougall, who "believed in freedom for the human
mind - in at least a little residue of freedom - believed in it and hoped for as
much as he could save from the inroads of scientific determinism," and he, a
determinist, achieved, for Boring, an understanding:
McDougall's freedom was my variance. McDougall hoped that variance would
always be found in specifying the laws of behavior, for there freedom might still
persist. I hoped then - less wise than I think I am now (it was 31 years ago) - that
science would keep pressing variance towards zero as a limit. At any rate this general
fact emerges from this example: freedom, when you believe it is operating, always
resides in an area of ignorance. If there is a known law, you do not have freedom.
(Boring, 1957, p. 190)
Boring was really unshakable in his belief in determinism, and that most
influential of psychologists, B. F. Skinner, agrees with the necessity of assuming
order in nature:
It is a working assumption which must be adopted at the very start. We cannot apply
the methods of science to a subject matter which is assumed to move about
capriciously. Science not only describes, it predicts. It deals not only with the past
but with the future ... If we are to use the methods of science in the field of human
affairs, we must assume that behavior is lawful and determined. (Skinner, 1953,
p. 6)
Carl Rogers is among those who have adopted as a fundamental position the
view that individuals are responsible, free, and spontaneous. Rogers believes
that, "the individual chooses to fulfill himself by playing a responsible and
voluntary part in bringing about the destined events of the world" (Rogers, 1962,
quoted by Walker, 1970, p. 13).
The use of the word destined in this assertion somewhat spoils the impact of
what most take to be the individualistic and humanistic approach that is espoused
by Rogers. Indeed, he has maintained that the concepts of scientific determinism
and personal choice can peacefully coexist in the way in which the particle
and wave theories of light coexist. The theories are true but incompatible.
These few words do little more than suggest the problems that are faced by
the philosopher of science when he or she tackles the concept of method in both
the natural and the social sciences. What basic assumptions can we make? So
often the arguments presented by humanistic psychologists have strong moral
or even theological undertones, whereas those offering the determinist's view
point to the regularities that exist in nature - even human nature - and aver that
without such regularities, behavior would be unpredictable. Clearly, beings
whose behavior was completely unpredictable would have been unable to
achieve the degree of technological and social cooperation that marks the human
species. Indeed, individuals of all philosophical persuasions tend to agree that
someone whose behavior is generally not predictable needs some sort of
treatment. On the other hand, the notion of moral responsibility implies freedom.
If I am to be praised for my good works and blamed for my sins, a statement
that "nature is merely unfolding as it should" is unlikely to be accepted as a
defense for the latter, and unlikely to be advanced by me as a reason for the
former. One way out of the impasse, it is suggested, is to reject strict "100
percent" determinism and to accept statistical determinism. "Freedom" then
becomes part of the error term in statistical manipulations. Grünbaum (1952)
considers the arguments against both strict determinism and statistical determinism,
arguments based on the complexity of human behavior, the concept of
moral choice, and the assignment of responsibility, assertions that individuals
are unique and that therefore their actions are not generalizable in the scientific
sense, and that human beings via their goal-seeking behavior themselves
determine the future. He concludes, "Since the important arguments against
determinism which we have considered are without foundation, the psychologist
need not be deterred in his quest and can confidently use the causal hypothesis
as a principle, undaunted by the caveat of the philosophical indeterminist"
(Grünbaum, 1952, p. 676).
Feigl (1959) insists that freedom must not be confused with the absence of
causality, and causal determination must not be confused with coercion or
compulsion or constraint. "To be free means that the chooser or agent is an
essential link in the chain of causal events and that no extraneous compulsion
- be it physical, biological, or psychological - forces him to act in a direction
incompatible with his basic desires or intentions" (Feigl, 1959, p. 116).
To some extent, and many modern thinkers would say to a large extent (and
Feigl agrees with this), philosophical perplexities can be clarified, if not entirely
resolved, by examining the meaning of the terms employed in the debate rather
than arguing about reality.
Two further points might be made. The first is that variability and uncertainty
in observations in the natural as well as the social sciences require a statistical
approach in order to reveal broad regularities, and this applies to experimentation
and observation now, whatever philosophical stance is adopted. If the very
idea of regularity is rejected, then systematic approaches to the study of the
human condition are irrelevant. The second is that, at the end of the discussion,
most will smile with Dr Samuel Johnson when he said, "I know that I have free
will and there's an end on it."
PROBABILISTIC AND DETERMINISTIC MODELS
The natural sciences set great store by mathematical models. For example, in
the physics of mass and motion, potential energy is given by PE = mgh, where
m is mass, g is acceleration due to gravity, and h is height. Newton's second
law of motion states that F = ma, where F is force, m is mass, and a is
acceleration.
These mathematical functions or models may be termed deterministic models
because, given the values on the right-hand side of the equation, the construct
on the left-hand side is completely determined. Any variability that might be
observed in, for example, Force for a measured mass and a given acceleration
is due only to measurement error. Increasing the precision of measurement using
accurate instrumentation and/or superior technique will reduce the error in F to
very small margins indeed.²
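The claim that variability around a deterministic model is measurement error alone can be illustrated with a short simulation. What follows is only a sketch under stated assumptions: the "true" mass and acceleration, the Gaussian form of the instrument noise, and the function names are all hypothetical choices made for illustration and do not come from the text.

```python
import random


def force(mass_kg, accel_ms2):
    """Deterministic model: Newton's second law, F = m * a."""
    return mass_kg * accel_ms2


def mean_abs_error(noise_sd, trials=2000, seed=1):
    """Average |F_measured - F_true| when m and a are each read with noise.

    The true values and the Gaussian noise model are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_m, true_a = 2.0, 9.81          # hypothetical true mass and acceleration
    true_f = force(true_m, true_a)
    total = 0.0
    for _ in range(trials):
        m = rng.gauss(true_m, noise_sd)  # noisy reading of the mass
        a = rng.gauss(true_a, noise_sd)  # noisy reading of the acceleration
        total += abs(force(m, a) - true_f)
    return total / trials


coarse = mean_abs_error(noise_sd=0.5)    # crude instrumentation
fine = mean_abs_error(noise_sd=0.005)    # precise instrumentation
print(coarse, fine)
```

With more precise readings the discrepancy in F shrinks toward zero, and with no noise at all it vanishes, because nothing in the model itself is left to chance.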
Some psychologists, notably, Clark Hull in learning theory and Raymond
B. Cattell in personality theory and measurement, approached their work with
the aim (one might say the dream!) of producing parallel models for psychological
constructs. Econometricians similarly search for models that will describe
the market with precision and reliability. For social and biological
scientists there is a persistent and enduring challenge that makes their disciplines
both fascinating and frustrating. That challenge is the search for means to assess
and understand the variability that is inherent in living systems and societies.
² For those of you who have been exposed to physics, it must be admitted that although
Newton's laws were more or less unchallenged for 200 years the models he gave us are not definitions
but assumptions within the Newtonian system, as the great Austrian physicist, Ernst Mach, pointed
out. Einstein's theory of relativity showed that they are not universally true. Nevertheless, the
distinction between the models of the natural and the social and biological sciences is, for the
moment, a useful one.
In univariate statistical analysis we are encompassing those procedures
where there is just one measured or dependent variable (the variable that appears
on the left-hand side of the equals sign in the function shown next) and one or
more independent or predictor variables: those that are seen on the right-hand
side of the equation. The function is known as the general linear model.
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_n x_n + e$
The independent variables are those that are chosen and/or manipulated by
the investigator. They are assumed to be related to or have an effect on the
dependent variable. In this model the independent variables, $x_1, x_2, x_3, \ldots, x_n$, are
assumed to be measured without error. $\beta_0, \beta_1, \beta_2, \beta_3, \ldots, \beta_n$ are unknown
parameters. $\beta_0$ is a constant equivalent to the intercept in the simple linear model, and
the others are the weights of the independent variables affecting, or being used
to estimate, a given observation $y$. Finally, $e$ is the random error associated with
the particular combination of circumstances: those chosen variables, treatments,
individual difference factors, and so on. But these models are probabilistic
models. They are based on samples of observations from perhaps a
variety of populations and they may be tested under the hypothesis of chance.
Even when they pass the test of significance, like other statistical outcomes they
are not necessarily completely reliable. The dependent variable can, at best, be
only partly determined by such models, and other samples from other populations,
perhaps using differently defined independent variables, may, and sometimes
do, give us different conclusions.
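The point that different samples can yield different conclusions from the same probabilistic model may be made concrete with the simplest case of the general linear model, one predictor and an error term. The sketch below is illustrative only: the parameter values, the error standard deviation, the choice of predictor values, and the function names are assumptions introduced here, and ordinary least squares is used simply as a familiar way of estimating the weights.

```python
import random


def simulate(n, b0, b1, error_sd, rng):
    """Draw n observations from y = b0 + b1*x + e (hypothetical parameters)."""
    xs = [float(i % 10) for i in range(n)]   # predictor values chosen by the investigator
    ys = [b0 + b1 * x + rng.gauss(0.0, error_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares estimates of the intercept and the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(42)
# Two independent samples from the same underlying model.
sample_a = simulate(50, b0=1.0, b1=2.0, error_sd=3.0, rng=rng)
sample_b = simulate(50, b0=1.0, b1=2.0, error_sd=3.0, rng=rng)
est_a = fit_line(*sample_a)
est_b = fit_line(*sample_b)
print(est_a, est_b)
```

Both pairs of estimates fall near the values used to generate the data, yet they are not identical to those values or to each other; only when the error term is removed does the fitted line recover the parameters exactly.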
SCIENCE AND INDUCTION
It is common to trace the Western intellectual tradition to two fountainheads,
Greek philosophy, particularly Aristotelian philosophy, and Judaic/Christian
theology. Science began with the Greeks in the sense that they set out its
commonly accepted ground rules. Science proceeds systematically. It gives us
a knowledge of nature that is public and demonstrable and, most importantly,
open to correction. Science provides explanations that are rational and, in
principle, testable, rather than mystical or symbolic or theological. The cold
rationality that this implies has been tempered by the Judaic/Christian notion of
the compassionate human being as a creature constructed in the image of God,
and by the belief that the universe is the creation of God and as such deserves
the attention of the beings that inhabit it. It is these streams of thought that give
us the debate about intellectual values and scientific responsibility and sustain
the view that science cannot be metaphysically neutral nor value free.
But it is not true to say that these traditions have continuously guided Western
thought. Christianity took some time to become established. Greek philosophy
and science disappeared under the pragmatic technologists of Rome. Judaism,
weakened by the loss of its home and the persecution of its adherents, took
refuge in the refinement and interpretation of its ancient doctrines. When
Christianity did become the intellectual shelter for the thinkers of Europe, it
embraced the view that God revealed what He wished to reveal and that nature
everywhere was symbolic of a general moral truth known only to Him. These
ideas, formalized by the first great Christian philosopher, St. Augustine
(354-430), persisted up to, and beyond, the Renaissance, and simplistic versions
may be seen in the expostulations of fundamentalist preachers today. For the
best part of 1,000 years scientific thinking was not an important part of
intellectual advance.
Aristotle's rationalism was revived by "the schoolmen," of whom the greatest
was Thomas Aquinas (1225?-1274), but the influence of science and the
shaping of the modern intellectual world began in the 17th century.
The modern world, so far as mental outlook is concerned, begins in the seventeenth
century. No Italian of the Renaissance would have been unintelligible to Plato or
Aristotle; Luther would have horrified Thomas Aquinas, but would not have been
difficult for him to understand. With the seventeenth century it is different: Plato
and Aristotle, Aquinas and Occam, could not have made head or tail of Newton.
(Russell, 1946, p. 512)
This is not to deny the work of earlier scholars. Leonardo da Vinci
(1452-1519) vigorously propounded the importance of experience and observation,
and enthusiastically wrote of causality and the "certainty" of mathematics.
Francis Bacon (1561-1626) is a particular example of one who expressed the
importance of system and method in the gaining of new knowledge, and his
contribution, although often underestimated, is of great interest to psychologists.
Hearnshaw (1987) gives an account of Bacon that shows how his ideas can be
seen in the foundation and progress of experimental and inductive psychology.
He notes that: "Bacon himself made few detailed contributions to general
psychology as such, he saw more clearly than anyone of his time the need for,
and the potentialities of, a psychology founded on empirical data, and capable
of being applied to 'the relief of man's estate'" (Hearnshaw, 1987, p. 55).
A turning point for modern science arrives with the work of Copernicus
(1473-1543). His account of the heliocentric theory of our planetary system
was published in the year of his death and had little impact until the 17th century.
The Copernican theory involved no new facts, nor did it contribute to mathematical
simplicity. As Ginzburg (1936) notes, Copernicus reviewed the existing
facts and came up with a simpler physical hypothesis than that of Ptolemaic
theory which stated that the earth was the center of the universe:
The fact that PTOLEMY and his successors were led to make an affirmation in
violence to the facts as then known shows that their acceptance of the belief in the
immobility of the earth at the centre of the universe was not the result of incomplete
knowledge but rather the result of a positive prejudice emanating from non-scientific
considerations. (Ginzburg, 1936, p. 308)
This statement is one that scientists, when they are wearing their scientists'
hats, would support and elaborate upon, but an examination of the state of the
psychological sciences could not fully sustain it. Nowhere is our knowledge
more incomplete than in the study of the human condition, and nowhere are our
interpretations more open to prejudice and ideology. The study of differences
between the sexes, the nature-nurture issue in the examination of human
personality, intelligence, and aptitude, the sociobiological debate on the
interpretation of the way in which societies are organized, all are marked by
undertones of ethics and ideology that the scientific purist would see as outside
the notion of an autonomous science. Nor is this a complete list.
Science is not so much about facts as about interpretations of observation,
and interpretations as well as observations are guided and molded by preconceptions.
Ask someone to "observe" and he or she will ask what it is that is to
be observed. To suggest that the manipulation and analysis of numerical data,
in the sense of the actual methods employed, can also be guided by preconceptions
seems very odd. Nevertheless, the development of statistics was heavily
influenced by the ideological stance of its developers. The strength of statistical
analysis is that its latter-day users do not have to subscribe to the particular views
of the pioneers in order to appreciate its utility and apply it successfully.
A common view is that science proceeds through the process of induction.
Put simply, this is the view that the future will resemble the past. The occurrence
of an event A will lead us to expect an event B if past experience has shown B
always following A. The general principle that B follows A is quickly accepted.
Reasoning from particular cases to general principles is seen as the very
foundation of science.
The concepts of causality and inference come together in the process of
induction. The great Scottish philosopher David Hume (1711-1776) threw
down a challenge that still occupies the attention of philosophers:
As to past experience, it can be allowed to give direct and certain information of
those precise objects only, and that precise period of time, which fell under its
cognizance: but why this experience should be extended to future times, and to other
objects, which for aught we know, may be only in appearance similar; this is the main
question on which I would insist.
These two propositions are far from being the same, I have found that such an
object has always been attended with such an effect and I foresee that other objects,
which are, in appearance, similar, will be attended with similar effects. I shall allow,
if you please, that the one proposition may justly be inferred from the other: I know,
in fact, that it always is inferred. But if you insist that the inference is made by a
chain of reasoning, I desire you to produce that reasoning. (Hume, 1748/1951, pp.
33-34)
The arguments against Hume's assertion that it is merely the frequent
conjunction or sequencing of two events that leads us to a belief that one causes
the other have been presented in many forms and this account cannot examine
them all. The most obvious counter is that Hume's own assertion invokes
causality. Contiguity in time and space causes us to assume causality. Popper
(1962) notes that the idea of repetition based on similarity as the basis of a belief
in causality presents difficulties. Situations are never exactly the same. Similar
situations are interpreted as repetitions from a particular point of view - and
that point of view is a system of "expectations, anticipations, assumptions, or
interests" (Popper, 1962, p. 45).
In psychological matters there is the additional factor of volition. I wish to
pick up my pen and write. A chain of nervous and muscular and cognitive
processes ensues and I do write. The fact that human beings can and do control
their future actions leads to a situation where a general denial of causality flies
in the face of common sense. Such a denial invites mockery.
Hume's argument that experience does not justify prediction is more difficult
to counter. The course of natural events is not wholly predictable, and the
history of the scientific enterprise is littered with the ruins of theories and
explanations that subsequent experience showed to be wanting. Hume's skepticism,
if it was accepted, would lead to a situation where nothing could be
learned from experience and observation. The history of human affairs would
refute this, but the argument undoubtedly leads to a cautious approach. Predicting
the future becomes a probabilistic exercise and science is no longer able to
claim to be the way to certainty and truth.
Using the probability calculus as an aid to prediction is one thing; using it
to assess the value of a particular theory is another. Popper (1962) regards
statements about theories having a high degree of probability as misconceptions.
Theories can be invoked to explain various phenomena and good theories
are those that stand up to severe test. But, Popper argues, corroboration cannot
be equated with mathematical probability:
All theories, including the best, have the same probability, namely zero.
That an appeal to probability is incapable of solving the riddle of experience is a
conclusion first reached long ago by David Hume ...
Experience does not consist in the mechanical accumulation of observations.
Experience is creative. It is the result of free, bold and creative interpretations,
controlled by severe criticism and severe tests. (Popper, 1962, pp. 192-193)
INFERENCE
Induction is, and will continue to be, a large problem for philosophical discussion.
Inference can be narrowed down. Although the terms are sometimes used
in the same sense and with the same meaning, it is useful to reserve the term
inference for the making of explicit statements about the properties of a wider
universe that are based on a much narrower set of observations. Statistical
inference is precisely that, and the discussion just presented leads to the
argument that all inference is probabilistic and therefore all inferential statements
are statistical.
Statistical inference is a way of reasoning that presents itself as a mathematical
solution to the problem of induction. The search for rules of inference from
the time of Bernoulli and Bayes to that of Neyman and Pearson has provided
the spur for the development of mathematical probability theory. It has been
argued that the establishing of a set of recipes for data manipulation has led to
a situation where researchers in the social sciences "allow statistics to do the
thinking for them." It has been further argued that psychological questions that
do not lend themselves to the collection and manipulation of quantitative data
are neglected or ignored. These criticisms are not to be taken lightly, but they
can be answered. In the first place, statistical inference is only a part of formal
psychological investigation. An equally important component is experimental
design. It is the lasting contribution of Ronald Fisher, a mathematical statistician
and a champion of the practical researcher, that showed us how the formulation
of intelligent questions in systematic frameworks would produce data that, with
the help of statistics, could provide intelligent answers. In the second place, the
social sciences have repeatedly come up with techniques that have enabled
qualitative data to be quantified.
In experimental psychology two broad strategies have been adopted for
coping with variability. The experimental analytic approach sets out boldly to
contain or to standardize as many of the sources of variability as possible. In
the micro-universe of the Skinner box, shaping and observing the rat's behavior
depend on a knowledge of the antecedent and present conditions under which
a particular piece of behavior may be observed. The second approach is that of
statistical inference. Experimental psychologists control (in the sense of standardize
or equalize) those variables that they can control, measure what they wish
to measure with a degree of precision, assume that noncontrolled factors operate
randomly, and hope that statistical methods will tease out the "effects" from the
"error." Whatever the strategy, experimentalists will agree that the knowledge
they obtain is approximate. It has also been generally assumed that this
approximate science is an interim science. Probability is part of scientific
method but not part of knowledge. Some writers have rejected this view.
Reichenbach (1938), for example, sought to devise a formal probability logic
in which judgments of the truth or falsity of propositions are replaced by the
notion of weight. Probability belongs to a class of events. Weight refers to a
single event, and a single event can belong to many classes:
Suppose a man forty years old has tuberculosis; ... Shall we consider ... the
frequency of death within the class of men forty years old, or within the class of
tubercular people? ...
We take the narrowest class for which we have reliable statistics ... we should
take the class of tubercular men of forty ... the narrower the class the better the
determination of weight ... a cautious physician will even place the man in question
within a narrower class by making an X-ray; he will then use as the weight of the
case, the probability of death belonging to a condition of the kind observed on the
film. (Reichenbach, 1938, pp. 316-317)
This is a frequentist view of probability, and it is the view that is implicit in
statistical inference. Reichenbach's thesis should have an appeal for experimental
psychologists, although it is not widely known. It reflects, in formal terms,
the way in which psychological knowledge is reported in the journals and
textbooks, although whether or not the writers and researchers recognize this
may be debated. The weight of a given proposition is relative to the state of our
knowledge, and statements about particular individuals and particular behaviors
are prone to error. It is not that we are totally ignorant, but that many of our
classes are too broad to allow for substantial weight to be placed on the evidence.
Popper (1959) takes issue with Reichenbach's attempts to extend the relative
frequency view of probability to include inductive probability. Popper, with
Hume, maintains that a theory of induction is impossible:
We shall have to get accustomed to the idea that we must not look upon science as
a "body of knowledge", but rather as a system of hypotheses; that is to say, as a
system of guesses or anticipations ... of which we are never justified in saying that
we know that they are "true" or "more or less certain" or even "probable". (Popper,
1959, p. 317)
Now Popper admits only that a system is scientific when it can be tested by
experience. Scientific statements are tested by attempts to refute or falsify them.
Theories that withstand severe tests are corroborated by the tests, but they are
not proven, nor are they even made more probable. It is difficult to gainsay
Popper's logic and Hume's skepticism. They are food for philosophical thought,
but scientists who perhaps occasionally worry about such things will put them
aside, if only because working scientists are practical people. The principle of
induction is the principle of science, and the fact that Popper and Hume can
shout from the philosophical sidelines that the "official" rules of the game are
irrational and that the "real" rules of the game are not fully appreciated, will not
stop the game from being played.
Statistical inference may be defined as the use of methods based on the rules
of chance to draw conclusions from quantitative data. It may be directly
compared with exercises where numbered tickets are drawn from a bag of tickets
with a view to making statements about the composition of the bag, or where a
die or a coin is tossed with a view to making statements about its fairness.
Suppose a bag contains tickets numbered 1, 2, 3, 4, and 5. Each numeral appears
on the same, very large, number of tickets. Now suppose that 25 tickets are
drawn at random and with replacement from the bag and the sum of the numbers
is calculated. The obtained sum could be as low as 25 and as high as 125, but
the expected value of the sum will be 75, because each of the numerals should
occur on one fifth of the draws or thereabouts. The sum should be 5(1 + 2 + 3
+ 4 + 5) = 75. In practice, a given draw will have a sum that departs from this
value by an amount above or below it that can be described as chance error.
The likely size of this error is given by a statistic called the standard error, which
is readily computed from σ, the standard deviation of the numbers in the bag,
and n, the number of draws: for the mean of the draws the formula is σ/√n, and
for their sum it is σ√n. Leaving aside, for the moment, the problem of
estimating σ when, as is usual, the contents of the bag are unknown, all classical
statistical inferential procedures stem from this sort of exercise. The real
variability in the bag is given by the standard deviation, and the chance variability
in the sums of the numbers drawn is given by the standard error.
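The bag-of-tickets exercise can be worked through numerically. The following is a minimal sketch, not a procedure taken from the text: the function names, the number of simulated trials, and the random seed are illustrative assumptions. It computes the expected sum and the standard errors directly from the contents of the bag, then compares them with the spread of simulated sums.

```python
import math
import random

TICKETS = [1, 2, 3, 4, 5]  # each numeral is equally common in the bag
N_DRAWS = 25               # tickets drawn at random, with replacement


def bag_mean(values):
    """Mean of the numbers in the bag."""
    return sum(values) / len(values)


def bag_sd(values):
    """Standard deviation (sigma) of the numbers in the bag."""
    m = bag_mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def expected_sum(values, n):
    """Expected value of the sum of n draws."""
    return n * bag_mean(values)


def se_of_sum(values, n):
    """Standard error of the sum of n draws: sigma * sqrt(n)."""
    return bag_sd(values) * math.sqrt(n)


def se_of_mean(values, n):
    """Standard error of the mean of n draws: sigma / sqrt(n)."""
    return bag_sd(values) / math.sqrt(n)


def simulate_sums(values, n, trials, seed=0):
    """Repeat the exercise many times (hypothetical trial count and seed)."""
    rng = random.Random(seed)
    return [sum(rng.choice(values) for _ in range(n)) for _ in range(trials)]


if __name__ == "__main__":
    sums = simulate_sums(TICKETS, N_DRAWS, trials=10_000)
    mean_of_sums = sum(sums) / len(sums)
    sd_of_sums = math.sqrt(
        sum((s - mean_of_sums) ** 2 for s in sums) / len(sums)
    )
    print("expected sum:", expected_sum(TICKETS, N_DRAWS))
    print("standard error of the sum:", se_of_sum(TICKETS, N_DRAWS))
    print("standard error of the mean:", se_of_mean(TICKETS, N_DRAWS))
    print("simulated mean and SD of sums:", mean_of_sums, sd_of_sums)
```

The simulated standard deviation of the sums should lie close to the computed standard error of the sum, which is the sense in which the standard error describes the likely size of the chance error.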
STATISTICS IN PSYCHOLOGY
The use of quantitative methods in the study of mental processes begins with
Gustav Fechner (1801-1887) who set himself the problem of examining the
relationship between stimulus and sensation. In 1860 he published Elemente der
Psychophysik, in which he describes his invention of a psychophysical law that
describes the relationship between mind and body. He developed methods of
measuring sensation based on mathematical and statistical considerations, methods
that have their echoes in present-day experimental psychology. Fechner
made use of the normal law in his development of the method of constant stimuli,
applying it in the Gaussian sense as a way of dealing with error and uncontrolled
variation.
Fechner's basic assumptions and the conclusions he drew from his experimental
investigations have been shown to be faulty. Stevens' work in the 1950s
and the later developments of signal detection theory have overtaken the work
of the 19th century psychophysicists, but the revolutionary nature of Fechner's
methods profoundly influenced experimental psychology. Boring (1950)
devotes a whole chapter of his book to the work of Fechner.
Investigations of mental inheritance and mental testing began at about the
same time with Galton, who took the normal law of error from Quetelet and
made it the centerpiece of his research. The error distribution of physics became
a description of the distribution of values about a value that was "most probable."
Galton laid the foundations of the method of correlation that was refined by Karl
Pearson, work that is examined in more detail later in this volume. At the turn
of the century, Charles Spearman (1863-1945) used the method to define
mental abilities as factors. When two apparently different abilities are shown to
be correlated, Spearman took this as evidence for the existence of a general
factor G, a factor of general intelligence, and factors that were specific to the
different abilities. Correlational methods in psychology were dominant for
almost the whole of the first half of this century and the techniques of factor
analysis were honed during this period. Chapter 11 provides a review of its
development, but it is worth noting here that Dodd (1928) reviewed the
considerable literature that had accumulated over the 23 years since Spearman's
original work, and Wolfle (1940) pushed this labor further. Wolfle quotes Louis
Thurstone on what he takes to be the most important use of factor analysis:
Factor analysis is useful especially in those domains where basic and fruitful concepts
are essentially lacking and where crucial experiments have been difficult to conceive.
... They enable us to make only the crudest first map of a new domain. But if we
have scientific intuition and sufficient ingenuity, the rough factorial map of a new
domain will enable us to proceed beyond the factorial stage to the more direct forms
of psychological experimentation in the laboratory. (Thurstone, 1940, pp. 189-190)
The interesting point about this statement is that it clearly sees factor analysis
as a method of data exploration rather than an experimental method. As Lovie
(1983) points out, Spearman's approach was that of an experimenter using
correlational techniques to confirm his hypothesis, but from 1940 on, that view
of the methods of factor analysis has not prevailed.
Of course, the beginnings of general descriptive techniques crept into the
psychological literature over the same period. Means and probable errors are
commonly reported, and correlation coefficients are also accompanied by
estimates of their probable error. And it was around 1940 that psychologists
started to become aware of the work of R. A. Fisher and to adopt analysis of
variance as the tool of experimental work. It can be argued that the progression
of events that led to Fisherian statistics also led to a division in empirical
psychology, a split between correlational and experimental psychology.
Cronbach (1957) chose to discuss the "two disciplines" in his APA presidential
address. He notes that in the beginning:
All experimental procedures were tests, all tests were experiments. ... the statistical
comparison of treatments appeared only around 1900. ... Inference replaced
estimation: the mean and its probable error gave way to the critical ratio. The
standardized conditions and the standardized instruments remained, but the focus
shifted to the single manipulated variable, and later, following Fisher, to multivariate
manipulation. (Cronbach, 1957, p. 674)
Although there have been signs that the two disciplines can work together,
the basic situation has not changed much over 30 years. Individual differences
are error variance to the experimenter; it is the between-groups or treatment
variance that is of interest. Differential psychologists look for variations and
relationships among variables within treatment conditions. Indeed, variation in
the situation here leads to error.
It may be fairly claimed that these fundamental differences in approach have
had the most profound effect on psychology. And it may be further claimed that
the sophistication and success of the methods of analysis that are used by the
two camps have helped to formalize the divisions. Correlation and ANOVA
have led to multiple regression analysis and MANOVA, and yet the methods
are based on the same model - the general linear model. Unfortunately,
statistical consumers are frequently unaware of the fundamentals, frightened
away by the mathematics, or, bored and frustrated by the arguments on the
rationale of the probability calculus, they avoid investigation of the general
structure of the methods. When these problems have been overcome, the face
of psychology may change.
3
Measurement
IN RESPECT OF MEASUREMENT
In the late 19th century the eminent scientist William Thomson, Lord Kelvin
(1824-1907), remarked:
I often say that when you can measure what you are speaking about, and express it
in numbers, you know something about it; but when you cannot measure it, when
you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory
kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts,
advanced to the stage of science whatever the matter might be. (William Thomson,
Lord Kelvin, 1891, p. 80)
This expression of the paramount importance of measurement is part of our
scientific tradition. Many versions of the same sentiment, for example, that of
Galton, noted in chapter 1, and that of S. S. Stevens (1906-1973), whose work
is discussed later in this chapter, are frequently noted with approval. Clearly,
measurement bestows scientific respectability, a state of affairs that does scant
justice to the work of people like Harvey, Darwin, Pasteur, Freud, and James,
who, it will be noted, if they are to be labeled "scientists," are biological or
behavioral scientists. The nature of the data and the complexity of the systems
studied by these men are quite different in quality from the relatively simple
systems that were the domain of the physical scientist. This is not, of course, to
deny the difficulty of the conceptual and experimental questions of modern
physics, but the fact remains that, in this field, problems can often be dealt with
in controlled isolation.
It is perhaps comforting to observe that, in the early years, there was a
skepticism about the introduction of mathematics into the social sciences.
Kendall (1968) quotes a writer in the Saturday Review of November 11, 1871,
who stated:
If we say that G represents the confidence of Liberals in Mr Gladstone and D the
confidence of Conservatives in Mr Disraeli and x, y the number of those parties; and
infer that Mr Gladstone's tenure of office depends upon some equation involving
dG/dx, dD/dy, we have merely wrapped up a plain statement in a mysterious
collection of letters. (Kendall, 1968, p. 271)
And for George Udny Yule, the most level-headed of the early statisticians:
Measurement does not necessarily mean progress. Failing the possibility of measuring
that which you desire, the lust for measurement may, for example, merely result
in your measuring something else - and perhaps forgetting the difference - or in your
ignoring some things because they cannot be measured. (Yule, 1921, pp. 106-107)
To equate science with measurement is a mistake. Science is about systematic
and controlled observations and the attempt to verify or falsify those
observations. And if the prescription of science demanded that observations
must be quantifiable, then the natural as well as the social sciences would be
severely retarded. The doubts about the absolute utility of quantitative description
expressed so long ago could well be pondered on by today's practitioners
of experimental psychology. Nevertheless, the fact of the matter is that the early
years of the young discipline of psychology show, with some notable exceptions,
a longing for quantification and, thereby, acceptance. In 1885 Joseph
Jacobs reviewed Ebbinghaus's famous work on memory, Ueber das Gedächtnis.
He notes, "If science be measurement it must be confessed that psychology
is in a bad way" (Jacobs, 1885, p. 454).
Jacobs praises Ebbinghaus's painstaking investigations and his careful reporting
of his measurements:
May we hope to see the day when school registers will record that such and such a
lad possesses 36 British Association units of memory power or when we shall be
able to calculate how long a mind of 17 "macaulays" will take to learn Book ii of
Paradise Lost? If this be visionary, we may at least hope for much of interest and
practical utility in the comparison of the varying powers of different minds which
can now at last be laid down to scale. (Jacobs, 1885, p. 459)
The enthusiasm of the mental measurers of the first half of the 20th century
reflects the same dream, and even today, the smile that Jacobs' words might
bring to the faces of hardened test constructors and users contains a little of the
old yearning. The urge to quantify our observations and to impose sophisticated
statistical manipulations on them is a very powerful one in the social sciences.
It is of critical importance to remember that sloppy and shoddy measurement
cannot be forgiven or forgotten by presenting dazzling tables of figures, clean
and finely-drawn graphs, or by statistical legerdemain.
Yule (1921), reviewing Brown and Thomson's book on mental measurement,
comments on the problem in remarks that are not untypical of the misgivings
expressed, on occasion, by statisticians and mathematicians when they see their
methods in action:
Measurement. O dear! Isn't it almost an insult to the word to term some of these
numerical data measurements? They are of the nature of estimates, most of them,
and outrageously bad estimates often at that.
And it should always be the aim of the experimenter not to revel in statistical
methods (when he does revel and not swear) but steadily to diminish, by continual
improvement of his experimental methods, the necessity for their use and the
influence they have on his conclusions. (Yule, 1921, pp. 105-106)
The general tenor of this criticism is still valid, but the combination of
experimental design and statistical method introduced a little later by Sir Ronald
Fisher provided the hope, if not the complete reality, of statistical as opposed to
strict experimental control. The modern statistical approach more readily recognizes
the intrinsic variability in living matter and its associated systems.
Furthermore, Yule's remarks were made before it became clear that all acts of
observation contain irreducible uncertainty (as noted in chap. 2).
Nearly 40 years after Yule's review, Kendall (1959) gently reminded us of
the importance of precision in observation and that statistical procedures cannot
replace it. In an acute and amusing parody, he tells the story of Hiawatha. It is
a tragic tale. Hiawatha, a "mighty hunter," was an abysmal marksman, although
he did have the advantage of having majored in applied statistics. Partly relying
on his comrades' ignorance of the subject, he attempted to show that his patently
awful performance in a shooting contest was not significantly different from
that of his fellows. Still, they took away his bow and arrows:
In a corner of the forest
Dwells alone my Hiawatha
Permanently cogitating
On the normal law of error.
Wondering in idle moments
Whether an increased precision
Might perhaps be rather better
Even at the risk of bias
If thereby one, now and then, could
Register upon the target.
(Kendall, 1959, p. 24)
SOME FUNDAMENTALS
Measurement is the application of mathematics to events. We use numbers to
designate objects and events and the relationships that obtain between them. On
occasion the objects are quite real and the relationships immediately comprehensible;
dining-room tables, for example, and their dimensions, weights,
surface areas, and so on. At other times, we may be dealing with intangibles
such as intelligence, or leadership, or self-esteem. In these cases our measurements
are descriptions of behavior that, we assume, reflects the underlying
construct. But the critical concern is the hope that measurement will provide
us with precise and economical descriptions of events in a manner that is readily
communicated to others. Whatever one's view of mathematics with regard to
its complexities and difficulty, it is generally regarded as a discipline that is
clear, orderly, and rational. The scientist attempts to add clarity, order, and
rationality to the world about us by using measurement.
Measurement has been a fundamental feature of human civilization from its
very beginnings. Division of labor, trade, and barter are aspects of our condition
that separate us from the hunters and gatherers who were our forebears. Trade
and commerce mean that accounting practices are instituted and the "worth" of
a job or an artifact has to be labeled and described. When groups of individuals
agreed that a sheep could fetch three decent-sized spears and a couple of
cooking-pots, the species made a quantum leap into a world of measurement.
Counting, making a tally, represents the simplest form of measurement. Simple
though it is, it requires that we have devised an orderly and determinate number
system.
Development of early societies, like the development of children, must have
included the mastery of signs and symbols for differences and sameness and,
particularly, for oneness and twoness. Most primitive languages at least have
words for "one," "two," and "many," and modern languages, including English,
have extra words for one and two (single, sole, lone, couple, pair, and so on).
Trade, commerce, and taxation encouraged the development of more complex
number systems that required more symbols. The simple tally recorded by
a mark on a slate may be made more comprehensible by altering the mark at
convenient groupings, for example, at every five units. This system, still
followed by some primitive tribes, as well as by psychologists when constructing,
by hand, frequency tables from large amounts of data, corresponds with a
readily available and portable counting aid - the fingers of one hand.
It is likely that the familiar decimal system developed because the human
hands together have 10 digits. However, vigesimal systems, based on 20, are
known, and language once again recognizes the utility of 20 with the word score
in English and quatre-vingt for 80 in French. Contrary to general belief, the
decimal system is not the easiest to use arithmetically, and it is unlikely that
decimal schemes will replace all other counting systems. Eggs and cakes will
continue to be sold in dozens rather than tens, and the hours about the clock will
still be 12. This is because 12 has more integral fractional parts than 10; that
is, you can divide 12 by more numbers and get whole numbers and not fractions
(which everyone finds hard) than you can 10. Viewed in this way, the now-abandoned
British monetary system of 12 pennies to the shilling and 20 shillings to
the pound does not seem so odd or irrational.
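The claim about "integral fractional parts" amounts to counting whole-number divisors. As a small illustrative sketch (the function name is an assumption, not anything from the text), the divisors of the bases mentioned in this chapter can be listed directly:

```python
def divisors(n):
    """Return every whole number that divides n without a remainder."""
    return [d for d in range(1, n + 1) if n % d == 0]


if __name__ == "__main__":
    # Bases discussed in the chapter: decimal, duodecimal, vigesimal,
    # and the Babylonian base of 60.
    for base in (10, 12, 20, 60):
        print(base, divisors(base))
```

Ten can be divided evenly only by 1, 2, 5, and 10, whereas 12 can also be halved, split into thirds, quarters, and sixths, which is the practical advantage of the dozen.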
Systems of number notation and counting have a base or radix. Base 5
(quinary), 10 (decimal), 12 (duodecimal), and 20 (vigesimal) have been mentioned,
but any base is theoretically possible. For many scientific purposes,
binary (base 2) is used because this system lies at the heart of the operations of
the electronic computer. Its two symbols, 0 and 1, can readily be reproduced in
the off and on modes of electrical circuitry. Octal (base 8) and hexadecimal (base
16) will also be familiar to computer users. The base of a number system
corresponds to the number of symbols that it needs to express a number,
provided that the system is a place system. A decimal number, say 304, means
3 × 100 plus 0 × 10 plus 4 × 1.
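The place-system idea can be made concrete with a short sketch. The helper names below are illustrative assumptions; the code simply splits a whole number into its place values in a chosen base and rebuilds it again.

```python
def to_digits(number, base):
    """Split a non-negative whole number into place values, most significant first."""
    if number < 0 or base < 2:
        raise ValueError("number must be >= 0 and base must be >= 2")
    if number == 0:
        return [0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(remainder)
    return digits[::-1]


def from_digits(digits, base):
    """Rebuild the number: each step shifts one place and adds the next digit."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


if __name__ == "__main__":
    print(to_digits(304, 10))  # [3, 0, 4]: 3 x 100 plus 0 x 10 plus 4 x 1
    print(to_digits(304, 2))   # the same quantity in binary places
    print(to_digits(304, 16))  # and in hexadecimal places
```

The zero in 304 does real work here: it holds the tens place open so that the 3 is read as three hundreds.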
The symbol for zero signifies an empty place. The invention of zero, the
earliest undoubted occurrence of which is in India over 1,000 years ago but
which was independently used by the Mayas of Yucatan, marks an important
step forward in mathematical notation and arithmetical operation. The ancient
Babylonians, who developed a highly advanced mathematics some 4,000 years
ago, had a system with a base of 60 (a base with many integral fractions) that
did not have a zero. Their scripts did not distinguish between, say, 125 and
7,205, and which one is meant often has to be inferred from the context. The
absence of a zero in Roman numerals may explain why Rome is not remembered
for its mathematicians, and the relative sophistication of Greek mathematics
leads some historians to believe that zero may have been invented in the Greek
world and thence transmitted to India.
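Why 125 and 7,205 in particular? In base 60, 125 is 2 sixties plus 5 units, and 7,205 is 2 three-thousand-six-hundreds, no sixties, and 5 units. The sketch below rests on a deliberate simplification, namely that a script without zero simply leaves the empty place out; the function names are hypothetical and the code makes no claim about the details of cuneiform practice.

```python
def sexagesimal_places(number):
    """Place values of a positive whole number in base 60, most significant first."""
    places = []
    while number > 0:
        number, remainder = divmod(number, 60)
        places.append(remainder)
    return places[::-1]


def written_without_zero(number):
    """Simplifying assumption: an empty place leaves no mark at all."""
    return [place for place in sexagesimal_places(number) if place != 0]


if __name__ == "__main__":
    for n in (125, 7205):
        print(n, sexagesimal_places(n), written_without_zero(n))
```

Under that assumption both numbers are written with the same two marks, which is why the reader of the script must fall back on context.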
Scales of Measurement
Using numbers to count events, to order events, and to express the relationship
between events, is the essence of measurement. These activities have to be
carried out according to some prescribed rule. S. S. Stevens (1951) in his classic
piece on mathematics and measurement defines the latter as "the assignment of
numerals to objects or events according to rules" (p. 1).
This definition has been criticized on the reasonable grounds that it apparently
does not exclude rules that do not help us to be informative, nor does it require
rules that ensure that the same numerals are always assigned to the same events
under the same conditions. Ellis (1968) has pointed out that some such rule as, "Assign the first
number that comes into your head to each of the objects on the table in turn,"
must be excluded from the definition of measurement if it is to be determinative
and informative. Moreover, Ellis notes that a rule of measurement must allow
for different numerals, or ranges of numerals, to be assigned to different things,
or to the same things under different conditions. Rules such as "Assign the
number 3 to everything" are degenerate rules. Measurement must be made on
a scale, and we only have a scale when we have a nondegenerate, informative,
determinative rule.
For the moment, the historical narrative will be set aside in order to delineate
and comment on the matter. Stevens distinguished four kinds of scales of
measurement, and he believed that all practical common scales fall into one or
other of his categories. These categories are worth examining and their utility
in the scientific enterprise considered.
The Nominal Scale
The nominal scale, as such, does not measure quantities. It measures identity
and difference. It is often said that the first stage in a systematic empirical
science is the stage of classification. Like is grouped with like. Events having
characteristics in common are examined together. The ancient Greeks classified
the constitution of nature into earth, air, fire, and water. Animal, vegetable, or
mineral are convenient groupings. Mendeleev's (1834-1907) periodic table of
the elements in chemistry, and plant and animal species classification in biology
(the Systema Naturae of the great botanist Carl von Linné, known as Linnaeus
[1707-1778]), and the many typologies that exist in psychology are further
examples.
Numbers can, of course, be used to label events or categories of events. Street
numbers, or house numbers, or numbers on football shirts "belong" to particular
events, but there is, for example, no quantitative significance between player
number 10 and player number 4, on a hockey team, in arithmetical terms. Player
number 10 is not 2.5 times player number 4. Such arithmetical rules cannot be
applied to the classificatory exercise. However, it is frequently the case that a
tally, a count, will follow the construction of a taxonomy.
Clearly, classifications form a large part of the data of psychology. People
may be labeled Conservative, Liberal, Democrat, Republican, Socialist, and so
on, on the variable of "political affiliation," or urban, suburban, rural, on the
variable of "location of residence," and we could think of dozens, if not scores,
of others.
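On a nominal variable the only legitimate arithmetic is the tally. A brief sketch, using made-up responses on the "location of residence" variable (the data and function name are hypothetical), shows the count that follows classification:

```python
from collections import Counter

# Hypothetical responses; the labels name categories and carry no quantity.
RESIDENCE = ["urban", "rural", "suburban", "urban", "urban", "rural"]


def tally(labels):
    """Count how many cases fall into each category."""
    return Counter(labels)


if __name__ == "__main__":
    counts = tally(RESIDENCE)
    print(counts)
    # The most frequent category (the mode) is a meaningful summary;
    # an "average residence" would not be.
    print(counts.most_common(1))
```

The categories could be relabeled 1, 2, and 3 without changing the tally, which is the sense in which such numerals identify rather than measure.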
The Ordinal Scale
The essential relationship that characterizes the ordinal scale is greater than
(symbolized >) or less than (symbolized <). These scales of measurement have
proved to be very useful in dealing with psychological variables. It might, for
example, be comparatively easy to state that, according to some specified set of
criteria, Bill is more neurotic than Zoe, who is more neurotic than John, or that
Mary is more musical than Sue, who is more musical than Jane, and so on, even
though we are not in possession of a precise measuring instrument that will
determine by how much the individuals differ on our criteria. It follows that the
numbers that we assign on the ordinal scale represent only an order.
In ordering a group of 10 individuals according to judgments in terms of a
particular set of criteria for leadership, we may designate the one with the most
leadership ability "1" and go through to the one with the least and rank this
person "10," or we may start with the one with least ability and rank him or her
"1," progressing to the one with most whom we rank "10." Either method of
ordering is permissible and both give precisely the same sort of information,
provided that the one doing the ordering adheres to the system being used and
communicates the rule to others.
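That the two ranking conventions carry the same information can be checked mechanically. In this sketch the ten individuals are hypothetical (labeled A to J, listed from most to least judged leadership ability) and the function names are illustrative; the point is only that the original order is recoverable from either set of ranks once the rule is known.

```python
# Hypothetical judged order, from most to least leadership ability.
JUDGED_ORDER = list("ABCDEFGHIJ")


def rank_most_as_one(ordered):
    """Rank 1 goes to the person with the most ability."""
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_least_as_one(ordered):
    """Rank 1 goes to the person with the least ability."""
    n = len(ordered)
    return {name: n - position + 1 for position, name in enumerate(ordered, start=1)}


def recover_order(ranks, one_means_most):
    """Rebuild the most-to-least order, given the communicated rule."""
    return sorted(ranks, key=ranks.get, reverse=not one_means_most)


if __name__ == "__main__":
    first = rank_most_as_one(JUDGED_ORDER)
    second = rank_least_as_one(JUDGED_ORDER)
    print(recover_order(first, one_means_most=True))
    print(recover_order(second, one_means_most=False))
```

If the rule is not communicated, the same ranks are read backwards, which is why the text insists on adherence to, and reporting of, the system used.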
The Interval and Ratio Scales
When the gaps between equal points, in the numerical sense, on the measurement
scale are truly quantitatively equal, then the scale is called an interval scale.
The difference between 130 cm and 137 cm is exactly the same as the difference
between 137 cm and 144 cm. When this sort of scale starts at a true zero point,
as in the distance or length scale just mentioned, Stevens designates it a ratio
scale. For example, an individual who is 180 cm tall is twice the height of the
child of 90 cm, who, in turn, is twice the length of the baby of 45 cm. But when
the scale has an arbitrary zero point - for example, the Fahrenheit and Celsius
temperature scales - these ratio operations are not legitimate. It is neither
meaningful nor correct to say that a temperature of 30 is three times as hot as a
temperature of 10. This ratio does not represent three times any temperature-related
characteristic. Perhaps a more readily appreciated example is that of
calendar time. Although it is true to say that 60 years is twice 30 years, it is
meaningless to say that 2000 A.D. will be twice 1000 A.D. in any description
of "age." In 1000 A.D. the famous Parthenon in Athens was not twice as old
as it was in 500 A.D. Why not? Because it was built in about 440 B.C. It
follows from these examples that arithmetical operations on a ratio scale are
conducted on the scale values themselves, but on the interval scale with its
arbitrary zero such operations are conducted on the interval values.
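The difference between the two kinds of scale shows up as soon as the unit is changed. The sketch below is illustrative only: the function names are assumptions, the Kelvin conversion is added here as an example of a temperature scale with a true zero, and the Parthenon's age is approximated by ignoring the absence of a year zero.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    return c + 273.15


def parthenon_age(year_ad, built_bc=440):
    """Approximate age in years; the building date is the text's 'about 440 B.C.'"""
    return year_ad + built_bc


if __name__ == "__main__":
    # Ratios of scale values change with the (arbitrary) zero point.
    print("Celsius ratio:   ", 30 / 10)
    print("Fahrenheit ratio:", celsius_to_fahrenheit(30) / celsius_to_fahrenheit(10))
    print("Kelvin ratio:    ", celsius_to_kelvin(30) / celsius_to_kelvin(10))

    # Ratios of intervals do not: 10-to-30 is twice 10-to-20 on either scale.
    f10, f20, f30 = (celsius_to_fahrenheit(t) for t in (10, 20, 30))
    print("Celsius interval ratio:   ", (30 - 10) / (20 - 10))
    print("Fahrenheit interval ratio:", (f30 - f10) / (f20 - f10))

    # Calendar years are interval values; ages are ratio values.
    print("Year ratio:", 1000 / 500)
    print("Age ratio: ", parthenon_age(1000) / parthenon_age(500))
```

The "three times" of 30 against 10 disappears when the same two temperatures are expressed in Fahrenheit, whereas the statement that one interval is twice another survives the change of scale.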
The higher-order interval and ratio scales have all the properties of the
lower-order nominal and ordinal scales and may be so applied. Psychologists,
however, strive where possible for interval measurement, for it has the
appearance at least of greater precision. This desire sometimes leads to
conceptual difficulties and statistical misunderstandings. It is easy to see that
the 3 cm by which a line A of 12 cm differs from a line B of 9 cm is the same
as the 3 cm by which B exceeds the 6-cm-long line C. We are agreed on how
we will define a centimeter, and measurement of length in these units is
comparatively straightforward.
But what of this statement? "Albert has an Intelligence Quotient (IQ) of
120, Billy one of 110, and Colin of 100." The meaning of the 10-point
differences between these individuals, even when the measuring instrument (the
test) used is specified, is not so easy to see, nor is it possible to say with complete
confidence that the gap in intelligence between Albert and Billy is precisely the
same amount as the gap between Billy and Colin. Suppose that Question 1 on
our test, say, of numerical ability, asks the respondent to add 123 to 456;
Question 2, to divide 432 by 144; and Question 3 to add the square of 123 to
the square root of 144. Few would argue that these demands exhibit the same
degree of difficulty, and, by the same token, few would agree on the weighting
that might be assigned to the marks for each question. These weightings would,
to some extent, reflect the differing concepts of numerical ability. How much
more numerically able is the individual who has grasped the concept of square
root than the one who understands nothing beyond addition and subtraction?
Although these examples could, quite justifiably, be described as exaggerated,
they are given to show how difficult, indeed, impossible, it is to construct a true
interval scale for a test of intelligence.
Leaving aside for the moment the fact that no perfectly reliable instrument
exists for the measurement of IQ, we also know that there is no clear agreement
on the definition of IQ. In other words, statements about the magnitude of the
differences in length between lines A, B, and C, or in IQ between Albert, Billy,
and Colin, depend on the existence of a standard unit of measurement and on a
constant unit of measurement, one that is the same over all points on the scale.
In the case of the measurement of length these conditions exist, but in the
measurement of IQ they do not.
Do our arithmetical manipulations of psychometric test scales, as though
they were true interval scales, suggest that psychologists are prepared to ignore
the imperfections in their scales for the sake of computational convenience?
Certainly the situation just described emphasizes the responsibility of the
measurer to report on how the scales he or she employs are used and to keep
their limitations in mind.
Many, perhaps most, discussions of scales of measurement reveal a serious
misconception that also arises from Stevens' examination. It is that the level of
measurement, that is, the specific measurement scale used in a particular
investigation, governs the application of various statistical procedures when the
data come to be analyzed. Briefly, the notion is abroad that an interval scale of
measurement is required for the use of parametric tests: the t-ratio,
F-ratio, Pearson's correlation coefficient, and so on, all of which are discussed
later in this work. Gaito (1980) is one of a number of writers who have tried to
expose the fallacy, noting that it is based on a confusion between measurement
theory and statistical theory. Gaito reiterates some of the criticisms made by
Lord (1953), who, in a witty and brilliant discussion, tells the story of a professor
who computed the means and standard deviations of test scores behind locked
doors because he had taught his students that such ordinal scores should not be
added. Matters came to a head when the professor, in a story that should be read
in the original, and not paraphrased, discovers that even the numbers on football
jerseys behave as if they were "real," that is, interval scale, numbers. In fact,
"the numbers don't remember where they came from" (p. 751).
It is important to realize that the preceding remarks do not detract from
Stevens' sterling contribution to the development of coherent concepts regarding
scales in measurement theory. The difficulties arise when these are regarded
as prescriptions for the choice and use of a wide variety of statistical techniques.
ERROR IN MEASUREMENT
All acts of measurement involve practical difficulties that are lumped together
as the source of measurement error. From the statistical standpoint, measurement
error increases the variability in data sets, decreasing the precision of our
summaries and inferences. It follows that scientists strive for accuracy by
continually refining measuring techniques and instruments. The odd fact is that
this strategy proceeds even though we know that absolutely precise measurement
is impossible.
It is clear that a meter rule must be made from rigid material and that
measuring tapes must not be elastic. Without this the instruments lack reliability
and self-consistency and, put simply, separate and independent measurements
of the same event are unlikely to agree with each other. Only very rarely do
paper-and-pencil tests, or two forms of the same test (say, a test of personality
factors), give exactly the same results when given to an individual or group on
two occasions. In a sense these tests are like elastic measuring tapes, and the
discrepancy between the outcomes is an index of their reliability. Perfect
reliability is indexed 1. Poor reliability would be indexed 0.3 or less, and good
reliability 0.8 or more. In accepting this latter figure as good reliability, the
psychologist is accepting the inevitable error in measurement, being prepared
to do so because he or she feels that a quantitative description better serves to
grasp the concepts under investigation, to communicate ideas about the concepts
more efficiently, and to describe the relationships between these concepts and
others more clearly.
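The text does not say how the reliability index is calculated. One common convention, assumed here, is to take the Pearson correlation between scores from two administrations of the same test (test-retest reliability). The sketch below uses invented scores for five respondents, and the function names and the cut-offs applied to them simply echo the 0.3 and 0.8 figures quoted above.

```python
import math

# Hypothetical scores for the same five people on two occasions.
FIRST_OCCASION = [10, 12, 15, 18, 20]
SECOND_OCCASION = [11, 12, 14, 19, 19]


def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def describe_reliability(r):
    """Apply the rough benchmarks quoted in the text."""
    if r >= 0.8:
        return "good"
    if r <= 0.3:
        return "poor"
    return "intermediate"


if __name__ == "__main__":
    r = pearson_r(FIRST_OCCASION, SECOND_OCCASION)
    print(round(r, 3), describe_reliability(r))
```

Identical scores on both occasions would give an index of exactly 1; any discrepancy between the two sets of outcomes pulls the index below that ceiling.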
Reliability is not just a function of the quality of the instrument being used.
In 1796, Maskelyne, Astronomer Royal and Director of the Royal Observatory
at Greenwich, near London, England, dismissed Kinnebrook, an assistant,
because Kinnebrook's timing of stellar transits differed from his, sometimes by
as much as 1 sec.¹ The accuracy of such measurements is, of course, crucial to
the calculation of the distance of the star from Earth. Some 20 years later this
incident suggested to Bessel, the astronomer at Königsberg, that such errors
were the result of individual differences in observers and led to work on the
personal equation. The astronomers believed that these discrepancies were due
to physiological variability and early experiments on reaction time reflect this
view. But as scientific psychology has grown, the reaction-time variable has
been used to study choice and discrimination, predisposition, attitudes, and the
dynamics of motivation. These events mark the beginnings of the psychology
of individual differences. For the immediate argument, however, they illustrate
the importance of both inter-observer and intra-observer reliability.
The circumstances of an exercise in measurement must remain constant for
each separate observation. If we were, say, measuring the heights of individuals
in a group, we would ensure that each of them stood up straight with heels
together, chin in, chest out, head erect, no slouching, no tiptoeing, and so on.
Quite simply, height has to be defined, and our measurements are made in accord
with that definition.
When a concept is defined in terms of the operations, manipulations, and
measurements that are made in referring to it, we have an operational definition.
Intelligence might be defined as the score obtained on a particular test; a
character trait like generosity might be defined as the proportion of one's income
that one gives away. These sorts of definition serve to increase the precision of
communication. They most certainly do not imply immediate agreement among
scientists about the nature of the phenomena thus defined. Very few people
would maintain that the preceding definitions of intelligence and generosity are
entirely satisfactory.
Operationalism is closely allied to the philosophy of the Vienna Circle, a
group, formed in the late 1920s, that aimed to clarify the logic of the scientific
enterprise. The members of the Vienna Circle, mainly scientists and
mathematicians, came to be known as logical positivists. They hoped to rid the language
of science of all ambiguity and to confine the business of science to the testing
of hypotheses about the observable world. In this philosophy, meaning is
equated with verifiability, and constructs that are not accessible to observation
are meaningless. Psychologists will recognize this approach as closely akin to
the doctrine of behaviorism. The view that theoretical constructs can be grasped
via the measurement operations that describe them is an interesting one, for it
challenges the contention that there can be no measurement without theory and
that operations cannot be described in nontheoretical terms.
[1] Kinnebrook was employed at the observatory from May 1794 to February 1796. For some
more details of the incident and what happened to him, see Rowe (1983).
The logical positivist view is obviously attractive to the working scientist
because it seems to be eminently "hard-nosed." The doctrine of operationalism
demands an analysis of the nature of measurement and its contribution to the
description of facts. To state that science is concerned with the verification of
facts is to imply that there will be agreement among observers. It follows that
new methods and new observational tools, as well as observer disagreement,
will change the facts. The fact or reality of a tomato is quite different for the
artist, the gourmet cook, and the botanist, and the botanical view changed
dramatically when it became possible to view the cellular structure of the fruit
through a high-power microscope.
In the broad view, this argument includes not only the idea of levels of
agreement and verification, but also the validity of tools and measurements. Do
they, in fact, measure what they were designed to measure? And what is meant
by "designed to measure" must have some basis in a conceptual disposition if
not a full-blown theory.
We must not only consider levels of measurement, but also the limits of
measurement. In chapter 2 it was noted that observing a system disturbs the
system, affects the act of measurement, and thus provokes an irreducible
uncertainty in all scientific descriptions. This proposition is embodied in the
Uncertainty Principle of Heisenberg: the nearer one tries to obtain an accurate
measurement of either the momentum or the position of a subatomic particle,
the less certain becomes the measurement of the other. Ultimately the principle
applies to all acts of measurement.
The world of psychological measurement is beset with system-disturbing
features. Experimenter-participant interactions and the arousing and
motivating properties of the setting, be it the laboratory or the natural environment,
contribute to variance in the data. These factors have been variously described
as experimenter effect, demand characteristics, and, more generally, as the
social psychology of the psychological experiment. They are examined at length
by Rosenthal and Rosnow (1969) and Miller (1972).
Despite the difficulties, logical and practical, of the procedures, scientists
will continue to measure. The behavioral scientist is trying to pin down, with as
much rigor as possible, the variability of the properties of living matter that arise
as a result of the interaction of environmental and genetic factors. As mentioned
in chapter 1, Quetelet (1835/1849) described these in terms of what we now call
the normal distribution, using the analogy of Nature's "errors."
Although Quetelet and Galton and others may be criticized for the
promotion of the distribution as a "law of nature," their work recognized the irreducible
variance in living matter. This brings out the essence of the statistical approach.
There can be no absolute accuracy in measurement but only a judgment of
accuracy in terms of the inherent variation within and between individuals.
Statistics are the tools for assessing the properties of these random fluctuations.
4 The Organization of Data
THE EARLY INVENTORIES
The counting of people and livestock, hearths and homes, and goods and chattels
is an exercise that has a long past. The ancient civilizations of Babylon and
Egypt conducted censuses for the purposes of taxation and the raising of armies,
perhaps 3,000 years before the birth of Christ, and indeed the birthplace of Christ
himself was determined partly by the fact that Mary and Joseph traveled to
Bethlehem in order to be registered for a Roman tax. Censuses were carried out
fairly regularly by the Romans, but after the fall of Rome many centuries passed
before they became part of the routine of government. It can be argued that
these exercises are not really important in statistical history because they were
not used for the purposes of making comparisons or for drawing inferences in
the modern sense (that is, by using the probability calculus), but they are early
examples of the descriptions of States.
When inferences were drawn they were informal. The utility of such
descriptions was recognized by Aristotle, who prepared accounts, which were largely
non-numerical, of 158 states, listing details of their methods of administration,
judicial systems, customs, and so on. Although almost all of these descriptions
are lost, they clearly were intended to be used for comparative purposes and
compiled as part of Aristotle's theory of the State. Such systematic descriptions
became part of our intellectual tradition in Europe, particularly in the German
States, during the 17th and 18th centuries. Staatenkunde, the comparative
description of states, made for the purposes of throwing light on their
organization, their power, and their weaknesses, became an important discipline
promoted especially by Gottfried Achenwall (1719-1772), who was Professor at
the University of Gottingen. Hans Anchersen (1700-1765), a Danish historian,
introduced tables into his work, and although these early tables were largely
non-numerical, they form a bridge between the early comparative descriptions
and Political Arithmetic, where we see the first inferences based on vital
statistics (see Westergaard, 1932, for an account of these developments).
Some early statistical accounts were, until quite recently, thought to have
been made solely for the purposes of taxation. Notable among these is
Domesday Book, ordered in 1085 by William the Conqueror, "to place the government
of the conquered on a written basis" (Finn, 1973, p. 1). This massive survey,
which was completed in less than 2 years, is not just a tax roll, but an
administrative record made to assist in government. Moreover, Galbraith
(1961) notes that the Domesday Inquest made no effort to preserve the
overwhelming mass of original returns: "Instead, the practical genius of the Norman
king preserved in a 'fair copy' what was little more than an abstract of the total
returns" (Galbraith, 1961, p. 2), so that Domesday Book is perhaps one of the
earliest examples of a summary of a large amount of data.
POLITICAL ARITHMETIC
The student in statistics who develops an interest in their history will be
astonished if he or she visits a good library and searches out titles on the history
of statistics or old books with the word statistics or statistical in the title. Very
soon, for example, The History of Statistics, published in 1918 to commemorate
the 75th anniversary of the American Statistical Association, and edited by
Koren, will be discovered. But there are no means or variances or correlation
coefficients to be found here. This volume contains memoirs from contributors
from 15 countries detailing the development of the collection and collation of
economic and vital statistics, for all kinds of motives and using all kinds of
methods, within the various jurisdictions. Such works are not unusual, and they
relate to a variety of endeavors concerned with the use of numerical data both
for comparative and inferential purposes. They reflect the layman's view of
statistics in the present day.
Political arithmetic may be dated from the publication by John Graunt
(1620-1674), in 1662, of Natural and Political Observations Mentioned in a
Following Index and Made Upon the Bills of Mortality, although the name
political arithmetic was apparently the invention of Graunt's friend William
Petty (1623-1687). It has been reported that Petty was in fact the author of the
Observations, but this is not so. Petty's Political Arithmetick was written in
about 1672 but was not published until 1690. Petty's work was of interest to
government and surely had ruffled some feathers. Petty's son, dedicating his
father's book to the king, writes:
He was allowed by all, to be the Inventor of this Method of Instruction; where the
perplexed and intricate ways of the World are explain'd by a very mean peice of
Science; and had not the Doctrins of this Essay offended France, they had long since
seen the light, and had found Followers, as well as improvements before this time,
to the advantage perhaps of Mankind. (Lord Shelborne, Dedication, in Petty, 1690)
That the work had the imprimatur of authority is given in its frontispiece:
Let this Book called Political Arithmetick, which was long since Writ by Sir William
Petty deceased, be Printed.
Given at the Court at Whitehall the 7th day of Novemb. 1690.
Nottingham.
It was Karl Pearson's (1978) view, however, that Petty owed more to Graunt
than Graunt to Petty. Graunt was a well-to-do haberdasher who was influential
enough to be able to secure a professorship at Gresham College, London, for
Petty, a man who had a variety of posts and careers. Graunt was a cultured,
self-educated man and was friends with scientists, artists, and businessmen. The
publication of the Observations led to Charles II himself supporting his election
to the Royal Society in 1662. Graunt's work was the first attempt to interpret
social behavior and biological trends from counts of the rather crude figures of
births and deaths reported in London from 1604 to 1661 - the Bills of Mortality.
A continuing theme in Graunt's work is the regularity of statistical summaries.
He apparently accepted implicitly the notion that when a statistical ratio was
not maintained then some new influence must be present, that some additional
factor is there to be discovered. For example, he tried to show that the number
of deaths from the plague had been under-reported[1] and by examining the
number of christenings reasoned that the decrease in population of London
because of the plague would be recovered in 2 years. He attempted to estimate
the population distribution by sex and age and constructed a mortality table.
Essentially, Graunt's work, and that of Petty, emphasizes the use of mathematics
to impose order on data and, by extension, on society itself.
Petty was a founding member of the Royal Society, which was granted its
charter in 1662. Although he had been a supporter of Cromwell and the
Commonwealth, he was knighted in 1661 by Charles II. He was not so much
concerned with the vital statistics studied by Graunt. Rather, in his early work
he suggested ways in which the expenses of the state could be curtailed and the
[1] The weekly accounting of burials was begun to allay public fears about the plague. However,
in epidemic years when anxiety was at its highest, it appears that cases of the plague were concealed
because members of families with the illness were kept together, both the sick and the well.
revenue from taxes increased, and a continuing theme in his work is the
estimation of wealth.
His Political Arithmetick (Petty, 1690) compares England, France, and
Holland in terms of territory, trade, shipping, housing, and so on. His interest
was in practical political matters and money, topics that reflected his
professional career as well as his philosophy.
In the context of the times, Graunt and Petty's work may be seen as a
demonstration of their belief that quantification provided knowledge that was
free from controversy and conflict. Buck (1977) discusses these efforts in the
political climate of the times. The 17th century was, to say the least, a difficult
time for the people of England. The Civil War, the execution of the King, the
turmoil of the Commonwealth, religious conflict, the Restoration, the weakness
of the monarchs of the Stuart line, the Glorious Revolution that swept James II
from the throne - all contributed to civil strife and political unease. And yet the
post-Restoration years saw the founding of the Royal Society, the establishment
of the Royal Observatory at Greenwich, and the setting of the stage for the
triumphs of Newtonian science in the 18th century. It was also the era of
Restoration literature. Both John Evelyn (the diarist who gave the Society its
name and its motto) and the poet Dryden were members of the Royal Society.
Buck (1977) and others have argued that the philosophy underlying the
contributions of Petty and Graunt differs from that of the 18th century in that it does
not accept the existence of natural order in society. Indeed it could not, because
the political mechanisms necessary for the maintenance of a relatively stable
social system had not been established.
The practical effects of the work that was begun by Petty have their counterpart
in the present-day agencies of the state that collect data for all manner of
purposes. In this account of the development of statistics, however, this path
will be left for now, in order to return to the enterprise that sprang from Graunt's
mortality tables. Actuarial science has more to do with our perspective because
it involves the early study of probability and risk and the use of these
mathematics for inference.
VITAL STATISTICS
Actuarial science has been defined as the application of probability to insurance,
particularly to life insurance. But to apply the rules of probability theory there
have to be data, and these data are provided by mortality tables. Life expectancy
tables go back as far as the 4th century, when they were used under Roman law
to estimate the value of annuities, but the major impetus to the systematic
examination of the mathematics of tontines and annuities is generally placed in
17th century Holland. John De Witt (1625-1672), the Grand Pensionary of
Holland and West Friesland, applied probability principles to the calculation of
annuities.[2] Dawson (1901/1914) states that De Witt must be considered to be
the founder of actuarial science, although his work was not rediscovered until
1852. More prominent is the work of De Witt's contemporaries, Christiaan
Huygens (1629-1695) and John Hudde (1628-1704). Graunt's work influenced
that of Huygens and his younger brother, who debated the relative merits of
estimating the mean duration of life and the probability of survival or death at
a given age.
These attempts to bring rationality and order into the general area of life
insurance were aided considerably by the work of Edmund Halley (1656-1742),
the English astronomer and mathematician and patron of Newton. He published
An Estimate of the Mortality of Mankind, Drawn From Curious Tables of the
Births and Funerals at the City of Breslaw; With an Attempt to Ascertain the
Price of Annuities on Lives, in Philosophical Transactions in 1693. He was a
brilliant man, and his clear exposition of the problems is remarkable on two
counts. The first is that it was produced in order to fulfill a promise to the Royal
Society to contribute material for the Transactions, which had resumed
publication after a break of several years, and was a topic that was out of the
mainstream of his work. The second is that the early life offices in England did
not use the material, the common opinion being that insurance was largely a
game of chance. The general realization that an understanding of probabilities
helped in the actual determination of gains and losses in both games of chance
and life insurance was not to come until almost the middle of the 18th century.
The necessity of reliable systems for the determination of premiums for all kinds
of insurance became more and more important in the rise of 18th century trade
and commerce.
This account of the beginnings of actuarial science is by no means complete,
but the brief description highlights the utility of data organization and
probability for the making of inferences and the business of practical prediction. As
Karl Pearson (1978) noted in his lectures, there were two main lines of descent
from Graunt; the probability-mathematicians and actuaries, and the 18th century
political arithmeticians. Of the probability-mathematicians perhaps the most
entertaining and enterprising was a physician, John Arbuthnot (1667-1735). He
was able, he was intellectual, he was a wit, and he can be credited with the first
use of an abstract mathematical proposition, the binomial theorem, to test the
probability of an observed distribution, namely, the proportion of male and
female births. In fact, he appears to have been the first person to assess observed
data, chance, and alternative hypotheses using a statistical test. Arbuthnot had,
in 1692, published a translation of Huygens's work on probability with additions
of his own. In this book he observes that probability may be applied to
[2] Tontines were once a popular form of annuity. Subscribers paid into a joint fund, and the income
they received increased for the survivors as the members died off.
Graunt's suggestion in the Observations that the greater number of male than
female births was a matter of divine providence and not chance. In An
Argument for Divine Providence, Taken From the Constant Regularity
Observ'd in the Births of Both Sexes, published in 1710, Arbuthnott (sometimes,
as he did here, he spelled his name with two t's) uses the binomial expansion to
argue that the observed distribution of male and female christenings (which he
equates with births) departs so much from equality as to be impossible, "from
whence it follows that it is Art, not Chance, that governs" (p. 189). An excess
of males is prevented by:
the wise Oeconomy of Nature; and to judge of the wisdom of the Contrivance, we
must observe that the external Accidents to which Males are subject (who must seek
their Food with danger) do make a great havock of them, and that this loss exceeds
far that of the other Sex, occasioned by the Diseases incident to it, as Experience
convinces us. To repair that Loss, provident Nature, by the Disposal of its wise
Creator, brings forth more Males than Females; and that in almost constant
proportion. (Arbuthnott, 1710, p. 188)
The resulting near-equal proportions of the sexes ensure that every male
may have a female of suitable age, and Arbuthnot concludes, as did Graunt, that:
Polygamy is contrary to the Law of Nature and Justice, and to the Propagation of the
Human Race; for where Males and Females are in equal number, if one Man takes
Twenty Wives, Nineteen Men must live in Celibacy, which is repugnant to the Design
of Nature; nor is it probable that Twenty Women will be so well impregnated by one
Man as by Twenty. (Arbuthnott, 1710, p. 189)
which are as ingenious statements of what we now call alternative hypotheses
as one could find anywhere.
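Arbuthnot's argument can be sketched as a binomial calculation. If chance alone governed, a year with more male than female christenings would be about as likely as not, so the probability that every one of n years shows a male excess is (1/2)^n. The figure of 82 years used below is an assumption supplied for illustration (it is the span usually attributed to his London christening records and is not stated in the passage above), and the function name prob_at_least is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_at_least(successes, trials, p=Fraction(1, 2)):
    """Binomial probability of at least `successes` in `trials` independent trials."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: 82 yearly records, every one showing more male christenings.
YEARS = 82
p_all_years_male_excess = prob_at_least(YEARS, YEARS)

print(p_all_years_male_excess)         # exact fraction, 1 over 2**82
print(float(p_all_years_male_excess))  # the same value as a very small decimal
```

Exact fractions are used so that the tiny probability is not lost to rounding. The smallness of this number under the chance hypothesis is what led Arbuthnot to prefer "Art" as the alternative.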
The most outstanding of the probability-mathematicians of the era was
Abraham De Moivre (1667-1754). He was born at Vitry in Champagne, a
Protestant and the son of a poor surgeon. From an early age, although an able
scholar of the humanities, he was interested in mathematics. After the repeal
of the Edict of Nantes in 1685, he was interned and was faced with the choice
of renouncing his religion or going into exile. He chose the latter and in 1688
went to England. By chance he met Isaac Newton and read the famous
Principia, which had been published in 1687. He mastered the new infinitesimal
calculus and passed into the circles of the great mathematicians, Bernoulli,
Halley, Leibnitz, and Newton. Despite the efforts of his friends, he was unable
to obtain a secure academic position and supported himself as a peripatetic
teacher of mathematics, "and later in life sitting daily in Slaughter's Coffee
House in Long Acre, at the beck and call of gamblers, who paid him a small
sum for calculating odds, and of underwriters and annuity brokers who wished
their values reckoned" (K. Pearson, 1978, p. 143).
De Moivre first published his treatise on annuities in 1725. Essentially and
without going into the details, and some of the defects, of his procedures, De
Moivre used the summation of series to compute compound interests and
annuity values. The crucial factor was in applying the relatively straightforward
mathematics to a mortality table. He examined Halley's Breslau table and
concluded that the number of individuals who survived to a given age from a
total starting out at an earlier age could be expressed as decreasing terms in an
arithmetic series. In short, De Moivre combined probabilities and the interest
factor to compute annuity values in a manner that pointed the way to modern
actuarial science. Thomas Simpson (1710-1761) has been described as De
Moivre's younger rival. He is certainly remembered in the history of life
insurance as an innovator. He appears initially to have argued with De Moivre
that mortality tables should be used as they were found and not fitted to
mathematical rules. Simpson is dismissed by Karl Pearson (1978) as a plagiarist
who "set out to boil down De Moivre's work and sell the result at a lower price"
(p. 176), whereas Dawson (1901/1914) describes his book as "an attempt to
popularize the science, never a popular movement among those who hope to
profit by keeping it exclusive" (p. 102).
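De Moivre's simplification lends itself to a short calculation. If survivors decline in arithmetic progression to zero at some limiting age, a person with n years to that limit has probability (n - t)/n of being alive t years on, and the value of an annuity of 1 per year is the sum of each payment discounted for interest and weighted by that probability. The sketch below assumes payments at the end of each year, a limiting age of 86, and 5% interest. These values and the function names are illustrative assumptions, and the closed form is simply what summing the series gives; it is not a transcription of De Moivre's own procedure.

```python
def annuity_by_summation(age, limiting_age, rate):
    """Sum discounted payments of 1, each weighted by a linear survival probability."""
    n = limiting_age - age
    if n <= 0 or rate <= 0:
        raise ValueError("age must be below the limiting age and rate positive")
    v = 1 / (1 + rate)  # one-year discount factor
    return sum(v**t * (n - t) / n for t in range(1, n + 1))


def annuity_closed_form(age, limiting_age, rate):
    """Same value obtained by summing the series algebraically."""
    n = limiting_age - age
    if n <= 0 or rate <= 0:
        raise ValueError("age must be below the limiting age and rate positive")
    v = 1 / (1 + rate)
    annuity_certain = (1 - v**n) / rate  # value of n certain payments of 1
    return (1 - (1 + rate) * annuity_certain / n) / rate


# Assumed illustration: a life aged 40, limiting age 86, interest at 5%.
by_sum = annuity_by_summation(40, 86, 0.05)
by_formula = annuity_closed_form(40, 86, 0.05)
print(round(by_sum, 4), round(by_formula, 4))
```

The agreement of the two functions shows why the arithmetic-series assumption mattered: it replaces a year-by-year pass through a mortality table with a single expression in the interest rate and the years remaining.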
De Moivre was undoubtedly upset by Simpson's reproduction of his ideas,
but Pearson's view of Simpson's character is not shared by many historians of
mathematics, who have described him as a self-taught genius.
The applications of mathematical rules to tables of mortality are the earliest
examples of the use of theoretical abstractions for practical purposes. They
are attempts to organize and make sense of data. The development of, and
rationales behind, modern statistical summaries are of immediate interest.
GRAPHICAL METHODS
As knowledge increases amongst mankind and transactions multiply, it becomes
more and more desirable to abbreviate and facilitate the modes of conveying
information from one person to another and from one individual to the many.
I confess I was long anxious to find out whether I was actually the first who applied
the principles of geometry to matters of finance as it had long before been applied
to chronology with great success. I am now satisfied upon due inquiry, that I was
the first:
As the eye is the best judge of proportion being able to estimate it with more
accuracy than any other of our organs, it follows, that wherever relative quantities
are in question, a gradual increase or decrease of any ... value is to be stated, this
mode of representing it is peculiarly applicable.
That I have succeeded in proposing and putting in practice a new and useful mode
of stating accounts ... and as much information may be obtained in five minutes as
would require whole days to imprint on the memory, in a lasting manner, by a table
of figures. (Playfair, 1801a, pp. ix-x)
These quotations are taken from Playfair's Commercial and Political Atlas,
the third edition of which was published in 1801. They provide an unequivocal
statement of his view, and the view of many historians, that he was the inventor
of statistical graphical methods, methods that he termed lineal arithmetic. Also
in 1801, Playfair's Statistical Breviary; Shewing, on a Principle Entirely New,
the Resources of Every State and Kingdom in Europe, appeared. We see, in
handsome hand-colored, copper-plate engravings, line and bar graphs in the
Atlas, and these together with circle graphs and pie charts in the Breviary. They
illustrate revenues, imports and exports, population distributions, taxes, and
other economic data, and they are accompanied by constant reminders of the
utility of the graphical method for the examination of fiscal and trade matters.
William Playfair (1759-1823) was the younger brother of John Playfair, the
mathematician. He had a variety of occupations, beginning as an apprentice
mechanical engineer and draftsman and later as a writer and rather unsuccessful
businessman. In his examination of the history of graphical methods,
Funkhouser (1937) notes that the French translation of the Atlas was very well
received on the continent; in fact, much more attention was paid to it there than
in England. He also suggests that this might account for the greater interest
shown in graphical work by the French over the next century.
As a method of summarizing data, we must give full credit to Playfair for
the development of the graphical method, but there are examples of the use of
graphs that predate his work by many centuries. Funkhouser (1937) mentions
the use of coordinates by Egyptian surveyors and that latitudes and longitudes
were used by the geographers and cartographers of ancient Greece. Nicole
Oresme (c. 1323-1382) developed the essentials of analytic geometry, which is
Rene Descartes' (1596-1650) greatest contribution to mathematics, although it
is of course true that graphs could have been used, and in fact occasionally were
used, without any formal knowledge of the fact that the curve of a graph can be
represented by a mathematical function. Funkhouser (1936) reports on a
10th-century graph of planetary orbits as a function of time, and over the years there
have been other examples of the method, perhaps the most obvious of which is
musical notation.
Despite Playfair's ingenuity and his attempts at the promotion of the
technique, the use of graphs was surprisingly slow to develop. We find Jevons in
1874 describing the production of "best fit" curves by eye and suggesting the
procuring or preparation of "paper divided into equal rectangular spaces, a
convenient size for the spaces being one-tenth of an inch square" (p. 493), and
not until its 11th edition, published in 1910, did Encyclopaedia Britannica
devote an entry to graphical methods. Funkhouser (1937) suggests, and the
suggestion is eminently plausible, that the collection of public statistics was
affected by an ongoing controversy as to whether verbal descriptions of political
and economic states were superior to numerical tables. If statistical tables were
regarded with suspicion, then graphs would have been greeted with even more
opprobrium. In France and Germany the use of graphs was openly criticized by
some statisticians.
In the social sciences the person who most helped to promote the use of the
graph as a tool in statistical analysis was Quetelet, who has already been
mentioned. It has been noted that the use of the Normal Curve for the description
of human characteristics begins with his work. The law of error and Quetelet's
adaptation of it are of great importance.
5 Probability
A proposition that would find favor with most of us is that the business of life
is concerned with attempts to do more or less the right thing at more or less the
right time. It follows that a great deal of our thinking, in the sense of deliberating,
about our condition is bound up with the problem of making decisions in the
face of uncertainty. A traditional view of science in general, on the other hand,
is that it searches for conclusions that are to be taken to be true, that the
conclusions should represent an absolute certainty. If one of its conclusions fails,
then the failure is put down to ignorance, or poor methodology, or primitive
techniques - the truth is still thought to be out there somewhere. In chapter 2,
some of the difficulties that can arise in trying to sustain this position were
discussed. Probability theory provides us with a means of reaching answers and
conclusions when, as is the case in anything but the most trivial of
circumstances, the evidence is incomplete or, indeed, can never be complete. Put
simply, what is the best bet, what are the odds on our winning, and how confident
can we be that we are right? The derivation of theories and systems that may
be applied to assist us in answering these questions continues to present
intellectual challenges.
THE EARLY BEGINNINGS
The early beginnings are very early indeed. David (1962) tells us that many
archeological digs have produced quantities of astragali, animal heel bones, with
the four flatter sides marked in various ways. These bones may have been used
as counting aids, as children's toys, or as the forerunners of dice in ancient
games. The astragalus was certainly used in board games in Egypt about 3,500
years before the birth of Christ. David suggests that gaming may have been
developed from game-playing and reports that it is said to have been introduced
in Greece just 300 years before Christ. She also suggests that gaming may have
emerged from the wager, and the wager from divination and the interrogation
of oracles with origins in religious ritual. We know that gaming was a common
pastime in both Greece and Rome from at least 100 years before Christ. David
notes that Claudius (10 B.C. - 54 A.D.) wrote a book on how to win at dice,
which, regrettably, has not survived. David also comments on a remark in
Cicero's De Divinatione, Book I:
When the four dice produce the venus-throw [the outcome when four dice, or
astragali, are tossed and the faces shown are all different] you may talk of accident:
but suppose you made a hundred casts and the venus-throw appeared a hundred
times; could you call that accidental? (David, 1962, p. 24)
This may be one of the earliest statements of the circumstances in which we
might accept or reject the null hypothesis. Cicero quite clearly had some grasp
of the concepts of randomness and chance and that rare events do occur in the
long run. A variety of explanations may be advanced for the fact that these early
insights did not lead to a mathematical calculus of probabilities. Greek
philosophy, in the works of Plato and Aristotle, searched for order and regularity in the
phenomena of the universe, and later, as Christianity spread, the Church
promoted the idea of an omnipotent God who was unceasingly aware of, and
responsible for, the slightest perturbation in natural affairs. Human beings were
doomed to be ignorant of the succession of natural events and could achieve
salvation only by submission to the will of an Almighty. A more mundane
explanation may be provided by the fact that early arithmetic, at any rate in what
we would now call Western culture, was hampered by the absence of a logical
and efficient system of number notation. Whatever the reasons, and many
others have been suggested (see, e.g., Acree, 1978; Hacking, 1975), we know
that about 1,600 years elapsed before a foundation for probability theory was
laid.
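Cicero's question about a hundred venus-throws can be given rough numbers. Taking the venus-throw as four astragali showing four different faces, its probability on one cast is 4! times the product of the four face probabilities, and a hundred in a row is that value raised to the hundredth power. The face probabilities below (0.1, 0.4, 0.4, 0.1) are illustrative assumptions for an irregular bone and are not given in the passage; the function name venus_probability is hypothetical.

```python
from math import factorial, log10, prod


def venus_probability(face_probs):
    """Chance that four astragali, tossed together, show four different faces."""
    if len(face_probs) != 4 or abs(sum(face_probs) - 1.0) > 1e-9:
        raise ValueError("need four face probabilities that sum to 1")
    # Any of the 4! ways of assigning the four faces to the four bones will do.
    return factorial(4) * prod(face_probs)


# Assumed face probabilities for one bone: two broad sides, two narrow sides.
assumed_faces = [0.1, 0.4, 0.4, 0.1]

p_single = venus_probability(assumed_faces)
# A hundred casts in a row: work in logarithms to keep the number readable.
log10_hundred_in_a_row = 100 * log10(p_single)

print(p_single)
print(log10_hundred_in_a_row)
```

Under these assumptions a single venus-throw is uncommon but unremarkable, whereas a hundred in succession has a probability with more than a hundred zeros after the decimal point. That contrast is the one Cicero draws between what may be called accident and what may not.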
THE BEGINNINGS
The reader is referred to David's interesting book (1962) for a review of the
contributions that emerged in Europe during the first millennium and beyond.
In the 17th century, Gerolamo Cardano's book Liber de Ludo Aleae [The Book
on Games of Chance] was published (in 1663, some 87 years after the death of
its author). A translation of this work by S. H. Gould is to be found in Ore
(1953). The treatise is full of practical advice on both odds and personality,
noting that persons who are renowned for wisdom, or old and dignified by civil
honor or priesthood, should not play, but that for boys, young men, and soldiers,
it is less of a reproach.
Cardano also cautions that doctors and lawyers play at a disadvantage, for
"if they win, they seem to be gamblers, and if they lose, perhaps they may be
taken to be as unskilful in their own art as in gaming. Men of these professions
incur the same judgment if they wish to practice music" (Ore, 1953, pp.
187-188).
It cannot be claimed that Cardano produced a theory with this book.
However,it can bestated that he understoodthe importanceof therelationship
between theoretical reasoning andeventsin the "real world" of gaming.
The birth of the mathematical theory of probability, that is, "when the infant
really sees the light of day" (Weaver, 1963/1977, p. 31), took place in 1654. In
that year, Antoine Gombauld, Chevalier de Mere (1607-1684), posed the
questions to Blaise Pascal (1623-1662) that initiated the endeavor. The two may
have met in the summer of 1652 and soon become friends, although there is
some doubt about both the time and the circumstances of their meeting. David,
in her book, deals rather harshly with de Mere. She dismisses him as having "a
second-rate intelligence" and indicates that his writings do not show evidence
of mathematical ability, "being literary pieces of not very high calibre" (David,
1962, p. 85). De Mere was not a mathematician, although there is no doubt that
he held a good opinion of his abilities in a number of fields. What he appears
to have been was France's leading arbiter of good taste and manners, a man who
studied the ways of polite society with care and dedication.¹ His writings were
sought after, and his opinions respected, by those who wished to maintain good
standing in the drawing rooms of Paris (see St. Cyres, 1909, pp. 136-158, for
an account of de Mere). Pascal appears to have been a willing pupil, although
toward the end of 1654 he underwent his second conversion to the religious life.
The notion that, astonishingly, the calculus of probabilities was the offspring of
a dilettante and an ascetic, which Poisson's remark (footnoted in chap. 1) has
been taken to imply, is not quite an accurate reflection.
Every account of the history of the concept of probability shows how
important the search for ways of assessing the odds in various gaming situations
has been in shaping its development. There seems to be little doubt but that
Pascal had spent sessions at the gaming tables, and it seems unlikely that various
situations had not been discussed with de Mere. What moved the matter into
the realm of mathematical discourse was the correspondence between Pascal
and his fellow mathematician Pierre Fermat, sparked by specific questions that
had been raised by de Mere.
An old gambling game involves the "house" offering a gambler even money
that he or she will throw at least 1 six in 4 throws of a die. In fact, the odds are
slightly favorable to the house. De Mere's problem concerned the question of
the odds in a situation where the bet was that a player would throw at least 1
¹ That David's judgment on the matter may be in error is likely. Michea (1938) maintains that
Descartes' bon sens and Pascal's coeur can be equated with de Mere's bon gout, and no one could
find that Descartes and Pascal had second-rate intelligences.
double six in 24 throws of 2 dice. Here the odds are, in fact, slightly against
the house, even though 24 is to 36 as 4 is to 6. De Mere had solved this problem
but used the outcome to argue to Pascal that it showed that arithmetic was
self-contradictory!² A second problem, which de Mere was unable to solve,
was the "problem of points," which concerns, for example, how the stakes or
the "pot" should be divided among the players, according to the current scores
of the participants, if a game is interrupted. This is an altogether more difficult
question, and the exchanges between Pascal and Fermat that took place over the
summer of 1654 devote much space to it. The letters³ show the beginning of
the applications of combinatorial algebra and of the arithmetic triangle.
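The arithmetic behind de Mere's two wagers can be checked directly. What follows is a minimal sketch in Python, not anything found in the Pascal-Fermat letters; the function name prob_at_least_once is an illustrative assumption. It treats the throws as independent and computes the probability of at least one success as 1 minus the probability of no success at all.

```python
from fractions import Fraction

def prob_at_least_once(p_single, throws):
    """Probability of at least one success in `throws` independent throws,
    each with success probability `p_single` (hypothetical helper)."""
    return 1 - (1 - p_single) ** throws

# At least 1 six in 4 throws of one die.
one_six = prob_at_least_once(Fraction(1, 6), 4)

# At least 1 double six in 24 throws of two dice.
double_six = prob_at_least_once(Fraction(1, 36), 24)

print(float(one_six), float(double_six))
```

The first value lies a little above one half and the second a little below it, which is why the simple proportion "24 is to 36 as 4 is to 6" misleads: the probability of failure compounds multiplicatively over the throws, not in proportion to their number.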
In 1655, Christiaan Huygens (1629-1695), a Dutch mathematician, visited
Paris and heard of the Pascal-Fermat correspondence. There was enormous
interest in the mathematics of chance but a seeming reluctance to announce
discoveries and reveal the answers to questions under discussion, perhaps
because of the material value of the findings to gamblers. Huygens returned to
Holland and worked out the answers for himself. His short account De
Ratiociniis in Ludo Aleae [Calculating in Games of Chance] was published in
1657 by Frans van Schooten (1615-1660), who was Professor of mathematics
at Leyden. This book, the first on the probability calculus, was an important
influence on James Bernoulli and De Moivre.
James (also referred to as Jacques or Jakob) Bernoulli (1654-1705) was one
of an extended family of mathematicians who made many valuable
contributions. It is his Ars Conjectandi [The Art of Conjecture], published in 1713 after
editing by his nephew Nicholas, that is of most interest. The first part of the
book is a reproduction of Huygens' work with a commentary. The second and
third parts deal with the mathematics of permutations and combinations and its
application to games of chance. In the fourth part of the book we find the
theorem that Bernoulli called his "golden theorem," the theorem named "The
Law of Large Numbers" by Poisson in 1837. In an excellent and most readable
commentary, Newman (1956) declares the theorem to be "of cardinal
significance to the theory of probability," for it "was the first attempt to deduce
statistical measures from individual probabilities" (vol. 3, p. 1448). Hacking
(1971) has examined the whole work in some considerable depth.
Bernoulli likely did not anticipate all of the debates and confusions and
² The probability of not getting a six in a single throw of a die is 5/6 and the probability of no
sixes in 4 throws is (5/6)^4. The probability of at least 1 six in 4 throws is 1 - (5/6)^4, which is
.51775, which is favorable to the house. The probability of not getting 2 sixes in a single throw
of 2 dice is 35/36 and the probability of the event of at least once getting 2 sixes in 24 throws of
2 dice is 1 - (35/36)^24, which is .49140, which is slightly unfavorable to the house.
³ Translations of the Pascal-Fermat correspondence are to be found in David's (1962) book and in
Smith (1929).
misconceptions, both popular and academic, that his theorem would generate,
but that it was a source of intellectual puzzlement may be taken from the fact
that its author states that he meditated on it for 20 years. In its simplest form,
the theorem states that if an event has a probability p of occurring on a single
trial, and n trials are to be made, then the expected proportion of occurrences of the event
over the n trials is also p. As n increases, the probability that the proportion will
differ from p by less than a given amount, no matter how small, also increases.
It also becomes increasingly unlikely that the number of occurrences of the event
will differ from pn by less than a fixed amount, however large. In a series of n
tosses of a "fair" coin the probability of heads occurring close to 50% of the
time (0.5n occurrences) increases as n increases, and there is a much greater
probability of a difference of, say, 10 heads from what would be expected in
1,000 tosses of the coin than in 100 tosses of the coin. Kneale (1949) notes that
it is often supposed that the theorem is:
A mysterious law of nature which guarantees that in a sufficiently large number of
trials a probability will be "realized as a frequency."
A misunderstanding of Bernoulli's theorem is responsible for one of the
commonest fallacies in the estimation of probabilities, the fallacy of the maturity of the
chances. When a coin has come down heads twice in succession, gamblers
sometimes say that it is more likely to come down tails next time because "by the law of
averages" (whatever that may mean) the proportion of tails must be brought to right
sometime. (Kneale, 1949, pp. 139-140)
The fact is, of course, that the theorem is no more and no less than a formal
mathematical proposition and not a statement about realities and, as Newman
(1956) puts it, "neither capable of validating 'facts' nor of being invalidated by
them" (vol. 3, p. 1450). Now this statement accurately represents the situation,
but it must not be passed by without comment. If reality departs severely from
the law, then clearly we do not doubt reality, and nor do we doubt the law. What
we might do, however, is to doubt some of our initial assumptions about the
observed situation. If red were to turn up 100 times in a row at the roulette table,
it might be prudent to bet on red on the 101st spin because this sequence departs
so much from expectation that we might conclude that the wheel is "stuck" on
red. In fact, we might use this observation as support for an assertion that the
wheel is not fair. This example moves us away from probability theory and into
the realm of practical statistical inference, an inference about reality that is based
on probabilities indicated by an abstract mathematical proposition.
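The two halves of Bernoulli's theorem, that the proportion of heads settles ever closer to p while the absolute count strays ever further from pn, can be illustrated with exact binomial sums. The sketch below is an illustration only, assuming a fair coin, a tolerance of 0.05 on the proportion, and a difference of 10 heads on the count; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, k, p):
    """Exact probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_proportion_within(n, p, eps):
    """P(|X/n - p| < eps): the proportion falls close to p."""
    return float(sum(binom_pmf(n, k, p)
                     for k in range(n + 1)
                     if abs(Fraction(k, n) - p) < eps))

def prob_count_off_by_at_least(n, p, d):
    """P(|X - n*p| >= d): the count misses its expectation by d or more."""
    return float(sum(binom_pmf(n, k, p)
                     for k in range(n + 1)
                     if abs(k - n * p) >= d))

p = Fraction(1, 2)      # a "fair" coin (assumption)
eps = Fraction(1, 20)   # proportion within 0.05 of p (assumption)

for n in (100, 1000):
    print(n,
          prob_proportion_within(n, p, eps),
          prob_count_off_by_at_least(n, p, 10))
```

The first probability rises with n and the second rises too. Nothing here "brings the proportion to right": a difference of 10 heads simply becomes a smaller fraction of a longer series.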
THE MEANING OF PROBABILITY
A problem has been raised. The difficulty inherent in any attempt to apply
Bernoulli's theorem is that we assume that we know the prior probability p of
a given event on a single trial - that, for example, we know that the value of p
for the appearance of heads when a coin is tossed is 1/2. If we do know p, then
predictions can be made, and, given a series of observations, inferences may
be drawn from them. The problem is to state the grounds for setting p at a
particular value, and this raises the question of what it is that we mean by
probability. The issue was always there, but it was not really confronted until
the 19th century.
The title of this section is taken from Ernest Nagel's paper presented at the
annual meeting of the American Statistical Association in 1935 and printed in
its journal in 1936. Nagel's clear and penetrating account does not, as he admits,
resolve the difficult problems raised in attempts to examine the concept,
problems that are still the source of sometimes quite heated debate among
mathematicians and philosophers. What it does is present the alternative views
and comment on the conclusions that adoption of one or other of the positions
entails.⁴ Nagel first examines the methodological principles that can be brought
to bear on the question; then he delineates the contexts in which ideas of
probability may be found; and, finally, he examines the three interpretations of
probability that are extant. What follows draws heavily on Nagel's review.
Appeals to probabilities are found in everyday discussion, applied statistics
and measurement, physical and biological theories, the comparison of
competing theories, and in the mathematical probability calculus. The three important
interpretations of the concept are the classical; the notion of probability as
"rational belief"; and the long-run relative frequency definition.
Laplace (1749-1827) and De Morgan (1806-1871) were the chief
exponents of the classical view that regards probability as, in De Morgan's words,
"a state of mind with respect to an assertion" (De Morgan, 1838). The strength
of our belief about any given proposition is its probability. This is what is meant
in everyday discourse when we say (though perhaps not every day!) that "It is
probable that gaming emerged from game-playing." In evaluating competing
theories this view is also evident, as when it is said by some, for example, that,
"Biologically based trait theories of personality are more probably correct than
social learning theories." This interpretation is not what is meant when
statistical assertions about the weather, the probability of precipitation, for example,
or the outcome of an atomic disintegration are made.
John Maynard Keynes (later Lord Keynes, 1883-1946) interpreted
probability as rational belief, and it has been asserted that this is the most appropriate
view for most applications of the term. Evidence about the moral character of
witnesses would lead to statements about the probability of the truth of one
person's testimony rather than another's or, in examining the results of evidence
that attempted to show, for example, that a social learning theory interpretation
⁴ Acree (1978) gives an extensive review that, for the present writer, is sometimes difficult to follow.
of achievement motivation is more probable than one based on trait theory.
These statements rest on rational insights into the relationship between evidence
and conclusion, not on statistical frequencies, because we do not have such
information. It is Nagel's view, and the view of others, that both the classical
and the Keynesian views violate an important methodological principle - that
of verifiability:
On Keynes' view a degree of probability is assignable to a single proposition with
respect to given evidence. But what verifiable consequences can be drawn from the
statement that with respect to the evidence the proposition that on the next throw
with a given pair of dice a 7 will appear, has a probability of 1/6? For on the view
that it is significant to predicate a probability to the single instance there is nothing
to be verified or refuted. (Nagel, 1936, p. 18)
The conclusion is that the views we have described "cannot be regarded as
a satisfactory analysis of probability propositions in the sciences which claim
to abide by this canon" (Nagel, 1936, p. 18).
The third interpretation of probability is the statistical notion of long-run
relative frequency. Bolzano (1781-1841), Venn (1834-1923), Cournot
(1801-1877), Peirce (1839-1914), and, more recently, von Mises (1883-1953)
and Fisher (1890-1962) are among the scientists and mathematicians who
supported this view. In a simple example, if you are told that your probability
of survival and recovery after a particular surgical procedure is 80%, then this
means that of 100 patients who have undergone this operation, 80 have
recovered.
The majority of working scientists in psychology who rely on statistical
manipulations in the examination of data implicitly or explicitly subscribe to
this meaning. Whether or not it is appropriate to the endeavor and whether or
not the consequences of its acceptance are fully appreciated is a question to be
considered. It is important to note, and this is not meant to be facetious, that
following the operation you are either alive or you are dead. In other words,
the probability statement is about a relationship between events, or rather
propositions concerning the occurrence of events, not about a single event. In
this respect the frequentists have no quarrel with the Keynesians.
The most common logical objection (in fact, it is a class of objections) to
the frequency view concerns the vague term long-run. Clearly, in order to
establish a frequency ratio the run has to stop somewhere, and yet probability
values are defined as the limit of an infinite sequence. Nagel's answer to the
objection, which von Mises (1957) also deals with at some length, is that
empirically obtained ratios can be used as hypotheses for the true values of the
infinite series, and these hypotheses can be tested. Further, probability values
might be arrived at by using some other theory than that obtained from a
statistical series, for example, using theoretical mechanics to predict the
probability of a fair coin turning up heads. Again the predicted probability can
be tested empirically. Nagel also argues that Keynesian examples about the
credibility of witnesses could be examined in frequency terms; for instance, "the
relative frequency with which a regular church-goer tells lies on important
occasions is a small number considerably less than " (Nagel, 1936, p. 23). It
is obviously the case that there is more than one conception of probability and
that each one has value in its particular context. Nagel concludes that the
frequency view is satisfactory for "every-day discourse, applied statistics and
measurement, and within many branches of the theoretical sciences" (p. 26).
It has been noted that psychological scientists have accepted this view.
Nevertheless, the fact that probability has to do both with frequencies and with
degrees of belief is the aleatory and epistemological duality that is clearly and
cogently discussed by Hacking (1975). We blur the issue as we compute our
statistics and speak of the confidence we have in our results. The fact that the
answer to the question "Who, in practice, cares?" is "Probably very few," is
based on an admittedly informal frequency analysis, but it is one in which we
can believe!
Probability and the Foundations of Error Estimation
The practical spur to the development of probability theory was simply the
attempt to formulate rules that, if followed assiduously, would lead to a gambler
making a profit over the long run, although, of course, it was not possible to
assert how long the run might have to be for profit to be assured. Probability
was therefore bound up with the notion of expectancies, and the 150 years that
followed the end of the 17th century saw its constructs being applied to
expectancies in life insurance and the sale of annuities. John de Witt
(1625-1672) and John Hudde (1628-1704), who calculated annuities based on
mortality statistics and whose work was mentioned earlier, corresponded and
consulted with Huygens and drew heavily on his writings.
An important figure in the history of probability, although the extent of his
contribution was not fully recognized until long, long after his death, was
Abraham De Moivre (1667-1754). His book The Doctrine of Chances: or A
Method of Calculating the Probabilities of Events in Play was first published
in 1718, but it is the second (1738) and third (1756) editions of the work that
are of most interest here. De Moivre's A Treatise of Annuities on Lives, editions
of which were published in 1724, 1743, and 1750, applies The Doctrine of
Chances to the "valuation of annuities." The Treatise is to be found bound
together with the 1756 edition of the Doctrine.
Here the problems of the gaming rooms and the questions of actuarial
prediction in the mathematics of expectancies are linked. A third strand was
not to emerge for over 50 years with investigations on the estimation of error.
This latter is associated most closely with the work of Laplace (1749-1827) and
Gauss (1777-1855) and the derivation of "The Law of Frequency of Error," or
the normal curve as it came to be known. And yet we now know that this
fundamental function had, in fact, been demonstrated by De Moivre, who first
published it in 1733 (the Approximatio ad Summam Terminorum Binomii
(a+b)^n in Seriem Expansi), and English translations of the work are to be found
in the last two editions of The Doctrine of Chances. The Approximatio is
examined by Todhunter (1820-1884), whose monumental History of the
Mathematical Theory of Probability From the Time of Pascal to That of
Laplace, published in 1865, still stands as one of the most comprehensive and
one of the dullest books on probability. Todhunter devotes considerable space
in his book to De Moivre's work, but, according to Karl Pearson (1926), he
"misses entirely the epoch-making character of the 'Approximatio' as well as
its enlargement in the 'Doctrine.' He does not say: Here is the original of
Stirling's Theorem, here is the first appearance of the normal curve, here De
Moivre anticipated Laplace as the latter anticipated Gauss" (Pearson, 1926,
p. 552).
It is true that Todhunter does not state any of the above precisely, and it is
therefore now often reported that Karl Pearson discovered the importance of the
Approximatio in 1922 while he was preparing for his lecture course on the
history of statistics that he gave at University College London between 1921
"and certainly continuing up till 1929" (E. S. Pearson & Kendall, 1970, p. 479).
Karl Pearson published an article on the matter in 1924 (K. Pearson, 1924a) and
some details are provided by Daw and E. S. Pearson (1972/1979). Quite how
much credit we can give Pearson for the revelation and how justified he was in
his judgment that "Todhunter fails almost entirely to catch the drift of scientific
evolution" (Pearson, 1926, p. 552) can be debated. It is certainly the case that
one is not carried along by the excitement of the progression of discovery as
one ploughs one's way through Todhunter's book, but he does say, in a section
where he refers to the pages of the third edition of the Doctrine, which give the
Approximatio and show an important result:
Thus we have seen that the principal contributions to our subject from De Moivre
are his investigations respecting the Duration of Play, his Theory of Recurring Series,
and his extension of the value of Bernoulli's Theorem by the aid of Stirling's
Theorem. Our obligations to De Moivre would have been still greater if he had not
concealed the demonstrations of the important results which we have noticed . . . ;
but it will not be doubted that the Theory of Probability owes more to him than to
any other mathematician, with the sole exception of Laplace. (Todhunter, 1865/1965,
pp. 192-193)
It is also worth noting that De Moivre's contribution was remarked on by
the American historian of psychology, Edwin Boring, in 1920, before Pearson,
and without so much fuss. In a footnote referring to the "normal law of error"
he says, "The so-called 'Gaussian' curve. The mathematical propadeutics for
this function were prepared as long ago as the beginning of the 18th century (De
Moivre, 1718 [sic]). See Todhunter . . ." (Boring, 1920, p. 8).
A more detailed discussion of the Approximatio and the properties of the
normal distribution is given in the next chapter. The roles of Laplace and Gauss
in the derivation and applications of probability theory are of very great
importance. Laplace brought together a great deal of work that had been
published separately as memoirs in his great work Theorie Analytique des
Probabilites, first published in 1812, with further editions appearing in 1814
and 1820. Todhunter (1865/1965) devotes almost one quarter of his book to
Laplace's work. From the standpoint of statistics as they developed in the social
sciences, it is perhaps only necessary to mention two of his contributions: the
derivation of the "Law of Error," and the notion of inverse probability.
Laplace's theorem, the proof of which, in modern terms, may be found in David
(1949), and which had been anticipated by De Moivre, shows the relationship
between the binomial series and the normal curve. Put very simply, and in
modern parlance, as n in (a+b)^n approaches infinity, the shape of the discrete
binomial distribution approaches the continuous bell-shaped normal curve. It
is therefore possible to express the sum of a number of binomial probabilities
by means of the area of the normal curve, and indeed the value of this sum
derived from the familiar tables of proportions of area under the normal
distribution is quite close to the exact value even for n of only 10. The details
of this procedure are to be found in many of the elementary manuals of statistics.
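The claim that the normal area is close to the exact binomial sum even for n of only 10 is easy to check. The sketch below is illustrative and rests on assumptions not in the text: a fair coin (p = 0.5), the range of 3 to 7 successes, and the usual half-unit continuity correction. The function names are hypothetical.

```python
import math

def binom_prob_between(n, p, lo, hi):
    """Exact binomial probability of between lo and hi successes, inclusive."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(lo, hi + 1))

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_approx_between(n, p, lo, hi):
    """Normal-curve area standing in for the binomial sum,
    with a half-unit continuity correction at each end."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (normal_cdf((hi + 0.5 - mean) / sd)
            - normal_cdf((lo - 0.5 - mean) / sd))

exact = binom_prob_between(10, 0.5, 3, 7)
approx = normal_approx_between(10, 0.5, 3, 7)
print(exact, approx, abs(exact - approx))
```

With these settings the two values agree to within about one hundredth. The agreement is poorer when p is far from 0.5 or when the range lies in the tails, so the sketch should not be read as a general guarantee.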
Laplace's work gives a number of applications of probability theory to
practical questions, but it is the bell-shaped curve as it is applied to the estimation
of error that is of interest here. Both Gauss and Laplace investigated the question
independently of each other. Errors are inescapable, even though they may not
always be of critical importance, in all observations that involve measurement.
Despite the most sophisticated of instruments and the most skilled users,
measurements of, say, distances between stars in our galaxy at a particular time,
or points on the surface of the earth, will, when repeated, not always produce
exactly the same result. The assumption is made that the values that we require
are definite and, to all intents and purposes, unchanging - that is to say, a true
value exists. Variations from the true value are errors. Laplace assumed that
every instance of an error arose because of the operation of a number of sources
of error, each one of which may affect the outcome one way or the other. The
argument proceeds to maintain that the mathematical abstraction that we now
call the normal law represents the distribution of the resultant errors.
Gauss's approach is essentially the same as Laplace's but has a more
unashamedly practical flavor. He assumed that there was an equal probability
of errors of over-estimation and under-estimation and showed, by the method
of least squares, that it is the arithmetic mean of the distribution of measurements
that best reflects the true value of the measurement we wish to make. The
distribution of measurements, the variation in which represents error on these
assumptions, is precisely that of Laplace's theorem, or the normal curve, or, as
it is sometimes termed, the Gaussian distribution.
FORMAL PROBABILITY THEORY
From the 17th to the 20th century, mathematical approaches to probability have
been algebraical and geometrical. The work of Pascal and Fermat led to
combinatorial algebra. The invention of the calculus led to formal examinations
of theoretical probability distributions. Gaussian methods are essentially
geometrical. Always the probability calculus found itself in difficulties over the
absence of a formal definition of probability that was universally accepted.
Modern probability theory, but not, it must be noted, statistics, has escaped this
difficulty by developing an arithmetical axiomatic model. The renowned
German mathematician David Hilbert (1862-1943), Professor of Mathematics at
Gottingen, presented a remarkable paper to the International Congress of
Mathematicians meeting in Paris in 1900. In it he presented mathematics with a
series of no less than 23 problems that, he believed, would be the ones that
mathematics should address during the 20th century.⁵ Among them was a call
for a theory of probability based on axiomatic foundations. At that time, various
attempts were being made to develop a theory of arithmetic based on a small
number of fundamental postulates or axioms. These axioms have no basis in
what is called the real world. Questions such as "What do they mean?" are
excluded; ambiguity is avoided. The axioms themselves do not depend on any
assumptions. For example, one of the best-known of the mathematical logicians,
Giuseppe Peano (1858-1932), based his arithmetic on postulates such as "Zero
is a number." This is not the place to attempt to discuss the difficulties that have
been uncovered in the axiomatic approach or in set theory, with which it is
closely allied. Suffice it to say that modern mathematical probability theory is
largely based on the work of Andrei Kolmogorov (1903-1987), a Russian
mathematician whose book Foundations of the Theory of Probability was first
published in 1933. Set theory is most closely linked with the work of Georg
Cantor (1845-1918), born in St. Petersburg of Danish parents but who lived
most of his life in Germany. Set theory grew out of a new mathematical
conception of infinity. The theory has profound philosophical as well as
mathematical implications but its basis is easy to understand. It is a mathematical
⁵ In the actual talk Hilbert only had time to deal with 10 of the problems, but the entire manuscript,
in English, can be found in the Bulletin of the American Mathematical Society (1902).
system that deals with defined collections, lists, or classes of objects or elements:
The theory of probability, as a mathematical discipline, can and should be developed
from axioms in exactly the same way as Geometry and Algebra. This means that
after we have defined the elements to be studied and their basic relations, and have
stated the axioms by which these relations are to be governed, all further propositions
must be based exclusively on these axioms, independent of the usual concrete
meaning of these elements and their relations . . .
. . . the concept of a field of probabilities is defined as a system of sets which satisfies
certain conditions. What the elements of this set represent is of no importance in the
purely mathematical development of the theory of probability. (Kolmogorov,
1933/1956, p. 1)
The system cannot be fully delineated here, but this chapter will end with an
illustration of its mathematical elegance in arriving at the well-known rules of
mathematical probability theory. This account follows Kolmogorov closely,
although it does not use his terminology and symbols precisely.
E is a collection of elements called elementary events. F is a set of subsets
of E. The elements of F are random events.
The axioms are:
1. F is a field of sets.
2. F contains the set E.
3. To each set A in F is assigned a non-negative real number p(A). This is called
the probability of event A.
4. p(E) = 1.
5. If A and B have no element in common, then p(A + B) = p(A) + p(B).
In set theory, A has a complementary set A'. In the language of random events,
A' means the non-occurrence of A. To say that A is impossible is to write A = 0,
and to say that A must occur is to write A = E.
Because A + A' = E and from the 4th and 5th axioms, p(A) + p(A') = 1 and
p(A) = 1 - p(A'), and because E' = 0, p(E') = 0. Axiom 5 is the addition rule.
If p(A) > 0, then p(B | A) = p(AB) / p(A), which is the conditional probability of
the event B under condition A. From this follows the general formula known
as the multiplication rule, p(AB) = p(A)p(B | A), and as we show in chapter 7,
this leads to Bayes' theorem. Note that p(B | A) ≥ 0, p(E | A) = 1, and
p{(B + C) | A} = p(B | A) + p(C | A).
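The rules derived from the axioms can be tried out on a small finite field of events. The sketch below is an illustration rather than Kolmogorov's own construction: it assumes a throw of two fair dice, takes E to be the 36 equally likely ordered outcomes, and defines p(A) as the share of E that falls in A. The events A, B, and C and the helper names are hypothetical choices.

```python
from fractions import Fraction
from itertools import product

# E: the elementary events for two dice (assumed equally likely).
E = frozenset(product(range(1, 7), repeat=2))

def p(A):
    """Probability of event A, a subset of E."""
    return Fraction(len(A), len(E))

def p_given(B, A):
    """Conditional probability p(B | A) = p(AB) / p(A), for p(A) > 0."""
    return p(A & B) / p(A)

A = frozenset(e for e in E if e[0] == 6)      # first die shows a six
B = frozenset(e for e in E if sum(e) == 7)    # the total is 7
C = frozenset(e for e in E if sum(e) == 2)    # the total is 2 (no overlap with B)
A_comp = E - A                                # A', the non-occurrence of A

print(p(E))                          # axiom 4
print(p(A) + p(A_comp))              # complement rule
print(p(B | C), p(B) + p(C))         # addition rule for B and C
print(p(A & B), p(A) * p_given(B, A))  # multiplication rule
```

Note that in the code `B | C` is Python's set union, the A + B of the text, whereas the vertical bar in p(B | A) denotes conditioning and is handled by p_given. Exact fractions are used so that the equalities hold without rounding.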
For readers of a mathematical inclination who wish to see more of the
development of this approach (it does get a little more difficult), Kolmogorov's
book is concise and elegant.
6
Distributions
When Graunt and Halley and Quetelet made their inferences, they made them
on the basis of their examination of frequency distributions. Tables, charts, and
graphs - no matter how the information is displayed - all can be used to show
a listing of data, or classifications of data, and their associated frequencies.
These are frequency distributions. By extension, such depictions of the
frequency of occurrence of observations can be used to assess the expectation of
particular values, or classes of values, occurring in the future. Real frequency
distributions can then be used as probability distributions. In general, however,
the probability distributions that are familiar to the users of statistical techniques
are theoretical distributions, abstractions based on a mathematical rule, that
match, or approximate, distributions of events in the real world. When bodies
of data are described, it is the graph and the chart that are used. But the
theoretical distributions of statistics and probability theory are described by the
mathematical rules or functions that define the relationships between data, both
real and hypothetical, and their expected frequencies or probabilities.
Over the last 300 years or so, the characteristics of a great many theoretical
distributions, all of which have been found to have some practical utility in one
situation or another, have been examined. The following discussion is limited
to three distributions that are familiar to users of basic statistics in psychology.
An account of some fundamental sampling distributions is given later.
THE BINOMIAL DISTRIBUTION
In the years 1665-1666, when Isaac Newton was 23 and had just earned his
degree, his Cambridge college (Trinity) was closed because of the plague.
Newton went home to Woolsthorpe in Lincolnshire and began, in peace and
leisure, a scientific revolution. These were the years in which Newton developed
some of the most fundamental and important of his ideas: universal gravitation,
the composition of light, the theory of fluxions (the calculus), and the binomial
theorem.
The binomial coefficients for integral powers had been known for many
centuries, but fractional powers were not considered until the work of John
Wallis (1616-1703), Savilian Professor of Geometry at Oxford, and the most
influential of Newton's immediate English predecessors. However, expansions
of expressions such as (x - x^2)^{1/2} were achieved by Newton early in 1665. He
announced his discovery of the binomial theorem in 1676 in letters written to
the Secretary of the Royal Society, although he never formally published it nor
did he provide a proof. Newton proceeded from earlier work of Wallis, who
published the theorem, with credit to Newton, in 1685. The problem was to
find the area under the curve with ordinates (1 - x^2)^n. When n is zero the first
two terms are x - 0(x^3/3), and when n is 1 they are x - 1(x^3/3). Newton, using the
method of interpolation employed so much by Wallis, reasoned that when n was
1/2 the corresponding terms should be x - (1/2)(x^3/3). He arrived at the series
x - (1/2)(x^3/3) - (1/8)(x^5/5) - (1/16)(x^7/7) - . . .
and then discovered that the same result could be obtained by deriving, and
subsequently integrating, the binomial expansion of (1 - x^2)^{1/2}. The interesting
and important point to be
noted is that Newton's discovery was not made by considering the binomial
coefficients of Pascal's triangle but by examining the analysis of infinite series,
a discovery of much greater generality and mathematical significance. Figure
6.1 shows the binomial distribution for n = 7.
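Newton's interpolated series can be checked numerically. The sketch below is a modern illustration, not Newton's procedure. It assumes the curve is (1 - x^2)^{1/2}, builds the coefficients of its binomial expansion with a generalized binomial coefficient, integrates term by term from 0 to x, and compares the sum with the closed-form area (x(1 - x^2)^{1/2} + arcsin x)/2. The function names and the choice of 20 terms are assumptions.

```python
import math
from fractions import Fraction

def gen_binom(alpha, k):
    """Generalized binomial coefficient (alpha choose k) for rational alpha."""
    result = Fraction(1)
    for i in range(k):
        result *= alpha - i
        result /= i + 1
    return result

def area_series(x, terms=20):
    """Term-by-term integral from 0 to x of the expansion of (1 - x^2)^(1/2)."""
    alpha = Fraction(1, 2)
    total = 0.0
    for k in range(terms):
        # Coefficient of x^(2k) in the binomial expansion.
        coeff = gen_binom(alpha, k) * (-1) ** k
        # Integrating x^(2k) gives x^(2k+1) / (2k+1).
        total += float(coeff) * x ** (2 * k + 1) / (2 * k + 1)
    return total

def area_closed_form(x):
    """Area under (1 - x^2)^(1/2) from 0 to x, for 0 <= x <= 1."""
    return 0.5 * (x * math.sqrt(1 - x * x) + math.asin(x))

print(area_series(0.5), area_closed_form(0.5))
```

The leading coefficients produced by gen_binom are 1, -1/2, -1/8, and -1/16, which reproduce the opening terms x - (1/2)(x^3/3) - (1/8)(x^5/5) - (1/16)(x^7/7) of the series. Convergence is fast for small x and slow as x approaches 1.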
Newton's discovery of the calculus in his "golden years" at Woolsthorpe
establishes him as its originator, but it was Gottfried Wilhelm Leibniz
(1646-1716), the German philosopher and mathematician, who has priority of
publication, and it is pretty well established that the discoveries were
independent. However, a bitter quarrel developed over the claims to priority of
discovery and allegations were made that, on a visit to London in 1673, Leibniz
could have seen the manuscript of Newton's De Analysi per Aequationes Numero
Terminorum Infinitas, which, though written in 1669, was not published until
1711. Abraham De Moivre was among those appointed by the Royal Society
in 1712 to report on the dispute. De Moivre made extensive use of the method
in his own work, and it was his Approximatio, first printed and circulated to
some friends in 1733, that links the binomial to what we now call the normal
distribution. The Approximatio is included in the second (1738) and third
(1756) editions of the Doctrine.
FIG. 6.1 The Binomial Distribution for N = 7
It should be mentioned that a Scottish mathematician, James Gregory
(1638-1675), working at the time (1664-1668) in Italy, derived the binomial
expansion and produced important work on the mathematics of infinite series,
discovered quite independently of Newton.
THE POISSON DISTRIBUTION
Before the structure of the normal distribution is examined, the work of
Siméon-Denis Poisson (1781-1840) on a useful special case of the binomial
will be described. The École Polytechnique was founded in Paris in 1794. It was
the model for many later technical schools, and its methods inspired the
production of many student texts in mathematics and engineering which are the
forerunners of present-day textbooks. Among the brilliant mathematicians of
the École during the earlier years of the 19th century was Poisson. His name is
a familiar label in equations and constants in calculus, mechanics, and
electricity. He was passionately devoted to mathematics and to teaching, and
published over 400 works. Among these was Recherches sur la Probabilité des
Jugements in 1837. This contains the Poisson Distribution, sometimes called
Poisson's law of large numbers. It was noted earlier that as n in $(P + Q)^n$
increases, the binomial distribution tends to the normal distribution. Poisson
considered the case where as n increases toward infinity, P decreases toward
zero, and nP remains constant. The resulting distribution has a remarkable
application.
Data collected by insurance companies on relatively rare accidents, say,
people trapping their fingers in bathroom doors, indicate that the probability
of this event happening to any one individual is very low, in fact near zero.
However, a certain number of such accidents (X) is reported every year, and the
number of these accidents varies from year to year. Over a number of years a
statistical regularity is apparent, a regularity that can be described by Poisson's
distribution.
If we set X at k, an integer, then

$$p(X = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}$$

where $\lambda$ is any positive number, e is the constant 2.7183..., and k! is factorial
k.
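Poisson's limiting case can be illustrated numerically. The sketch below is not taken from Poisson; it assumes a hypothetical mean of two accidents per year and simply compares binomial probabilities, with nP held constant, against the Poisson formula as n grows. The function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def largest_gap(n, lam, max_k=10):
    """Largest difference between the two distributions for k = 0..max_k."""
    p = lam / n  # P shrinks as n grows, so that nP stays equal to lam
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(max_k + 1))

lam = 2.0  # hypothetical mean number of accidents per year
gaps = {n: largest_gap(n, lam) for n in (10, 100, 10_000)}
for n, gap in gaps.items():
    print(f"n = {n:>6}: largest gap = {gap:.6f}")
```

The gap between the two sets of probabilities shrinks steadily as n increases, which is the sense in which the Poisson distribution is a special case of the binomial.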
Although the distribution is not commonly to be found in the basic statistics
texts in psychology, it is used in the social sciences and it does have a surprising
range of applications. It has been used to fit distributions in, for example, quality
control (defects per number of units produced), numbers of patients suffering
from certain specific diseases, earthquakes, wrong-number telephone
connections, the daily number of hits by flying bombs in London during World War
II, misprints in books, and many others.
FIG. 6.2 The Poisson Distribution applied to Alpha
Emissions (Rutherford & Geiger, 1910)
Poisson attempted to extend the possible utility of probability theory, for he
applied it to testimony and to legal decisions. These applications received much
criticism but Poisson greatly valued them. Poisson formally discussed the
concepts of a random quantity and cumulative distribution functions, and these
are significant theoretical contributions. But his name and work in probability
do not occupy much space in the literature, perhaps because he was
overshadowed by famous contemporaries such as Laplace and Gauss. Sheynin (1978)
has given us a comprehensive review of his work in the area. An example of a
Poisson distribution is given in Figure 6.2.
THE NORMAL DISTRIBUTION
The binomial and Poisson distributions stand apart from the normal distribution
because they are applied to discrete frequency data. The invention of the
calculus provided mathematics with a tool that allowed for the assessment of
probabilities in continuous distributions. The first demonstration of integral
approximation, to the limiting case of the binomial expansion, was given by De
Moivre. In the Approximatio, De Moivre begins by acknowledging the work
of James Bernoulli:
Altho' the Solution of Problems of Chance often requires that several Terms of the
Binomial $(a + b)^n$ [this is modern notation] be added together, nevertheless in very
high Powers the thing appears so laborious, and of so great difficulty, that few people
have undertaken that Task; for besides James and Nicholas Bernoulli, two great
Mathematicians, I know of no body that has attempted it; in which, tho' they have
shown very great skill, and have the praise which is due to their Industry, yet some
things were farther required; for what they have done is not so much an
Approximation as the determining very wide limits, within which they demonstrated that the
Sum of Terms was contained. (De Moivre, 1756/1967, 3rd Ed., p. 243)
De Moivre proceeds to show how he arrived at the expression of the ratio of
the middle term to the sum of all the terms in the expansion of $(1 + 1)^n$ when n
is a very high power. His answer was $2/(B\sqrt{n})$, where "B represents the Number
of which the Hyperbolic Logarithm is 1 - 1/12 + 1/360 - 1/1260 + 1/1680, &c."
He acknowledges the help of James Stirling who had found that "B did denote
the Square-root of the Circumference of a Circle whose Radius is Unity, so that
if that Circumference be called c, the Ratio of the middle Term to the Sum of
all the Terms will be expressed by $2/\sqrt{nc}$" (De Moivre, 1756, 3rd Ed., p. 244).
De Moivre had thus obtained (in modern notation) the expression

$$Y_0 = \frac{2}{\sqrt{2\pi n}}$$

for large n, where $Y_0$ is the middle term expressed as a proportion of the sum
of all the terms.
He also gives the logarithm of the ratio of the middle term to any term distant
from it by an interval l as

$$(m + l - \tfrac{1}{2})\log(m + l - 1) + (m - l + \tfrac{1}{2})\log(m - l + 1) - 2m\log m + \log\frac{m + l}{m},$$

where $m = \tfrac{1}{2}n$, and concludes, in the first of nine corollaries numbered 1
through 6 and 8 through 10, 7 having been omitted from the numbering, that, "if
m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a
Term distant from the middle by the Interval l, has to the middle Term, is
-2ll/n" (p. 245). This is merely the expression that $Y_l = Y_0 e^{-2l^2/n}$, for large n.
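The second corollary obtains the "Sum of the Terms intercepted between the
Middle, and that whose distance from it . . . denoted by l", in modern terms, the
sum of $Y_0 + Y_1 + Y_2 + Y_3 + \ldots + Y_l$, as

$$\frac{2}{\sqrt{2\pi n}}\left(l - \frac{2l^3}{1\cdot 3n} + \frac{4l^5}{2\cdot 5n^2} - \frac{8l^7}{6\cdot 7n^3} + \cdots\right),$$

which is the expansion of the integral

$$\frac{2}{\sqrt{2\pi n}}\int_0^l e^{-2y^2/n}\,dy.$$

When l is expressed as $s\sqrt{n}$, and s interpreted as 1/2, the sum becomes

$$\frac{2}{\sqrt{c}}\left(\frac{1}{2} - \frac{1}{3\cdot 4} + \frac{1}{2\cdot 5\cdot 8} - \frac{1}{6\cdot 7\cdot 16} + \frac{1}{24\cdot 9\cdot 32} - \cdots\right)$$

... which converges so fast, that by help of no more than seven or eight Terms, the
Sum required may be carried to six or seven places of Decimals: Now that Sum will
be found to be 0.427812, independently from the common Multiplicator $2/\sqrt{c}$, and
therefore to the Tabular Logarithm of 0.427812, which is 9.6312529, adding the
Logarithm of $2/\sqrt{c}$, viz. 9.9019400, the sum will be 19.5331929, to which answers the
number 0.341344. (De Moivre, 1756, 3rd Ed., p. 245)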
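De Moivre's arithmetic in the second corollary can be retraced in a few lines. The sketch below assumes the modern reading of his series (the term-by-term integral of the exponential, with s = 1/2 and c taken as 2π, following Stirling's result quoted above); the function name is illustrative, and the comparison with the normal curve uses Python's standard library rather than anything available to De Moivre.

```python
import math
from statistics import NormalDist

def de_moivre_sum(s, terms):
    """Partial sum of s - 2s^3/3 + 4s^5/(2!*5) - 8s^7/(3!*7) + ..."""
    return sum((-2) ** k * s ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
               for k in range(terms))

# Eight terms with s = 1/2, as De Moivre suggests.
series = de_moivre_sum(0.5, 8)

# Multiply by the "common Multiplicator" 2 / sqrt(c), with c = 2 * pi.
area = 2 / math.sqrt(2 * math.pi) * series

# Modern value: area under the standard normal curve from the mean to one SD.
exact = NormalDist().cdf(1.0) - 0.5

print(f"series sum   = {series:.6f}")
print(f"area         = {area:.6f}")
print(f"modern value = {exact:.6f}")
```

Eight terms are indeed enough to reproduce both of the figures De Moivre reports, which is a fair indication of how quickly the series converges when s is 1/2.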
This familiar final figure is the area under the curve of the normal distribution
between the mean (which is, of course, also the middle value) and an ordinate
one standard deviation from the mean. In the third corollary De Moivre says:
And therefore, if it was possible to take an infinite number of Experiments, the
Probability that an Event which has an equal number of Chances to happen or fail,
shall neither appear more frequently than $\tfrac{1}{2}n + \tfrac{1}{2}\sqrt{n}$ times, nor more rarely than
$\tfrac{1}{2}n - \tfrac{1}{2}\sqrt{n}$ times, will be expressed by the double Sum of the number exhibited in
the second Corollary, that is, by 0.682688, and consequently the Probability of
the contrary ... will be 0.317312, those two Probabilities together compleating Unity,
which is the measure of Certainty. (De Moivre, 1756, 3rd Ed., p. 246)¹
$\tfrac{1}{2}\sqrt{n}$ is what today we call the standard deviation. De Moivre did not name
it but he did, in Corollary 6, say that $\sqrt{n}$ "will be as it were the Modulus by
which we are to regulate our Estimation" (De Moivre, 1756, 3rd Ed., p. 248).
In fact what De Moivre does is to expand the exponential and to integrate
from 0 to $s\sigma$.
In Corollary 6 De Moivre notes that if l is interpreted as $\sqrt{n}$ rather than
$\tfrac{1}{2}\sqrt{n}$, then the series does not converge so fast and that more and more terms
would be required for a reasonable approximation as l becomes a greater
proportion of $\sqrt{n}$,
... for which reason I make use in this Case of the Artifice of Mechanic Quadratures,
first invented by Sir Isaac Newton ...; it consists in determining the Area of a Curve
nearly, from knowing a certain number of its Ordinates A, B, C, D, E, F, &c. placed
at equal Intervals. (De Moivre, 1756, 3rd Ed., p. 247)
He uses just 4 ordinates for his quadrature and finds, in effect, that the area
between ±2σ, or $\tfrac{1}{2}n \pm \sqrt{n}$, is 0.95428, and that the area in what we now call the
tails is 0.04572. The true value is a little less than this but it is, nevertheless,
familiar.
These results can be extended to the expansion of $(a + b)^n$ where a and
b are not equal.
If the Probabilities of happening and failing be in any given Ratio of inequality, the
Problems relating to the Sum of the Terms of the Binomial $(a + b)^n$ will be solved
with the same facility as those in which the Probabilities of happening and failing
are in a Ratio of Equality. (De Moivre, 1756/1967, 3rd Ed., p. 250)
In Corollary 9, De Moivre in effect, and in modern terms, introduces
$\sqrt{npq}$, the expression we use today for the standard deviation of the normal
approximation to the binomial distribution.
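The extension to unequal probabilities can be illustrated with a modern calculation. The sketch below is an illustration only: the values n = 400 and p = 0.2 are hypothetical, the function names are invented for the purpose, and the half-unit continuity correction is a later refinement, not part of De Moivre's argument.

```python
import math
from statistics import NormalDist

def log_binomial_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)

def within_one_sd(n, p):
    """Exact and normal-approximation probabilities of a count within one SD of np."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))  # the modern sqrt(npq), with q = 1 - p
    lo, hi = math.ceil(mean - sd), math.floor(mean + sd)
    exact = sum(math.exp(log_binomial_pmf(k, n, p)) for k in range(lo, hi + 1))
    normal = NormalDist(mean, sd)
    # Half-unit continuity correction, since the counts are discrete.
    approx = normal.cdf(hi + 0.5) - normal.cdf(lo - 0.5)
    return exact, approx

exact, approx = within_one_sd(400, 0.2)  # hypothetical n and p
print(f"exact binomial sum   = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

Working with logarithms of the binomial terms avoids numerical underflow for large n, a practical difficulty that echoes De Moivre's remark about how laborious very high powers are.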
The sum and substance of the Approximatio is that it gives, for the first time,
the function that was rediscovered much later, the function that dominates
so-called classical statistical inference, the normal distribution, which in
modern terminology is given by the density function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \mu)^2/2\sigma^2}.$$

The normal distribution is shown in Figure 6.3.
¹ The value for the proportion of area between ±1σ is 0.6826894, so that De Moivre was out
by one unit in the sixth decimal place.
De Moivre's philosophical position is revealed in the sections headed
"Remark I" in the 1738 edition of the Doctrine and an additional, and much
longer, "Remark II" in 1756. De Moivre sets his work in the philosophical
context of an ordered determinate universe. His notion of Original Design (see
the quotation in chapter 1) is a notion that persisted at least down to Quetelet.
A powerful deity reveals the grand design through statistical averages and stable
statistical ratios. Chance produces irregularities. As Pearson remarked:
There is much value in the idea of the ultimate laws being statistical laws, though
why the fluctuations should be attributed to a Lucretian 'Chance', I cannot say. It
is not an exactly dignified conception of the Deity to suppose him occupied solely
with first moments and neglecting second and higher moments! (Pearson, 1978, p.
160)
and elsewhere:
The causes which led De Moivre to his "Approximatio" or Bayes to his theorem
were more theological and sociological than mathematical, and until one recognizes
that the post-Newtonian English mathematicians were more influenced by Newton's
theology than by his mathematics, the history of science in the eighteenth century -
in particular that of the scientists who were members of the Royal Society - must
remain obscure. (Pearson, 1926, p. 552)
FIG. 6.3 The Normal Distribution
It is interesting that this is precisely the sort of analysis that has been brought
to bear on the work of Galton and Karl Pearson himself, save that it is the
philosophy of eugenics that influenced their work, rather than Christian
theology. In Remark II, De Moivre takes up Arbuthnot's argument for the ratio of
male to female births, which was discussed in Chapter 4, defending the
argument against the criticisms that had been advanced by Nicholas Bernoulli, who
had noted that a chance distribution of the actual male/female birth ratio would
be found if the hypothesized ratio (i.e., the ratio under what we would now call
the null hypothesis) had been taken to be 18:17 rather than 1:1. But De Moivre
insists:
This Ratio once discovered, and manifestly serving to a wise purpose, we conclude
the Ratio itself, or if you will the Form of the Die, to be an Effect of Intelligence and
Design. As if we were shewn a number of Dice, each with 18 white and 17 black
faces, which is Mr. Bernoulli's supposition, we should not doubt but that those Dice
had been made by some Artist; and that their form was not owing to Chance, but
was adapted to the particular purpose he had in View. (De Moivre, 1756/1967, 3rd
Ed., p. 253)
With the greatest respect to De Moivre, this was clearly not Arbuthnot's
argument, and De Moivre's view that he might have said it is somewhat
specious. Like Quetelet's use of the normal curve many years later, De Moivre's
view is a prejudgment and all findings must be made to fit it. Karl Pearson
(1978) makes essentially the same point, but it is apparently a point that he did
not recognize in his own work.
7
Practical Inference
Some of the philosophical questions surrounding induction and inference were
dealt with in chapter 2. Here the foundations of practical inference are
considered.
INVERSE PROBABILITY AND THE FOUNDATIONS
OF INFERENCE
The first exercises in statistical inference arose from a consideration of statistical
summaries such as those found in mortality tables. The development of theories
of inference from the standpoint of the implications of mathematical theory can
be dated from the work of Thomas Bayes (1702-1761), an English
Nonconformist clergyman whose ministry was in Tunbridge Wells. Bayes was
recognized as a very good mathematician, although he published very little, and he
was elected to the Royal Society in 1742 (see Barnard, 1958, for a biographical
note). Curiously enough, the paper for which he is remembered was
communicated to the Royal Society by his friend Richard Price¹ more than 2 years after
his death, and the celebrated forms of the theorem that bear his name, although
they follow from the essay, do not actually appear in the work. An Essay
Towards Solving a Problem in the Doctrine of Chances (Bayes, 1763) is still
the subject of much discussion and controversy both as to its contents and
implications and as to how much of its import was contributed by its editor,
Richard Price. The problem that Bayes addressed is stated by Price in the letter
accompanying his submission of the essay:
Mr De Moivre . . . has . . . after Bernoulli, and to a greater degree of exactness, given
rules to find the probability there is, that if a very great number of trials be made
concerning any event, the proportion of the number of times it will happen, to the
number of times it will fail in those trials, should differ less than by small assigned
limits from the proportion of the probability of its happening to the probability of its
failing in one single trial. But I know of no person who has shewn how to deduce
the solution of the converse problem to this; namely, "the number of times an
unknown event has happened and failed being given, to find the chance that the
probability of its happening should lie somewhere between any two named degrees
of probability." (Bayes, 1763, pp. 372-373)
¹ Price was also a Unitarian Church minister. His church still stands in Newington Green in north
London and is the oldest Nonconformist church building still being so used in London. Next door and
abutting the church is a licensed betting shop. It has been noted that Bayesian analysis, resting, as it
does, on the notion of conditional probability, is akin to gambling.
A demonstration of some simple rules of mathematical probability, using a
frequency model, will help to derive and illustrate Bayes' Theorem. The
probability of drawing a red card from a standard deck of playing cards is 26/52
or p(R) = 1/2. The probability of drawing a picture card from the deck is 12/52
or p(P) = 3/13. The probability of drawing a red picture card is 6/52 or p(R &
P) = 3/26. What is the probability of drawing either a red card or a picture card?
The answer is, of course, 32/52 or p(R or P) = 8/13. Note that:
p(R or P) = p(R) + p(P) - p(R & P), [8/13 = (1/2 + 3/13) - 3/26].
This is known as the addition rule.
Suppose that you draw a card from the deck, but you are not allowed to see
it. What is the probability that it is a red picture card? We can calculate the
answer to be 6/52. Now suppose that you are told that it is a red card. What now
is the probability of it being a picture card? The probability of drawing a red
card is 26/52. We also can figure out that if the card drawn is a red card then
the probability of it also being a picture card is 6/26. In fact,
p(R & P) = p(R)[p(P|R)] [6/52 = 26/52(6/26)]
The term p(P|R) symbolizes the conditional probability of P, that is, the
probability of P given that R has occurred, and the expression denotes the
multiplication rule. Note that p(R & P) is the same as p(P & R), and that this is
equal to p(P)[p(R|P)], or 6/52 = 12/52(6/12). From this fact we see that:
p(P|R) = p(P)[p(R|P)/p(R)]
which is the simplest form of Bayes' Theorem. In current terminology, the
left-hand side of this equation is termed the posterior probability, the first term on
the right-hand side the prior probability, and the ratio p(R|P)/p(R), the
likelihood ratio.
From the addition rule we can also show that the probability of a red card is
the sum of the probabilities of a red picture card and a red number card minus
the probability of a picture number card (which does not exist!), or:
p(R) = p(R & P) + p(R & N) - p(P & N).
But p(P & N) = 0 because the events picture card and number card are
mutually exclusive. So Bayes' Theorem may be written:
p(P|R) = p(P)p(R|P) / [p(P)p(R|P) + p(N)p(R|N)]
The fact that this formula is arithmetically correct may be checked by
substituting the values we can obtain from the known distribution of cards in
the standard deck. However, this does not demonstrate the alleged utility of the
theorem for statistical inference. For that we must turn to another example after
substituting D (for Data) in place of R, and H1 (for Hypothesis 1) in place of P,
and H2 (for Hypothesis 2) in place of N, where H1 and H2 are two mutually
exclusive and exhaustive hypotheses. We have:
p(H1|D) = p(H1)p(D|H1) / [p(H1)p(D|H1) + p(H2)p(D|H2)]
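The card-deck arithmetic used to illustrate the addition rule, the multiplication rule, and Bayes' Theorem may be checked by simple enumeration. The following is a minimal sketch: the representation of the deck and the helper names are assumptions made for illustration, and exact fractions are used so that the results can be compared directly with the values in the text.

```python
from fractions import Fraction
from itertools import product

SUITS = ["hearts", "diamonds", "clubs", "spades"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def prob(event, cards=deck):
    """Relative frequency of cards satisfying `event` among `cards`."""
    return Fraction(sum(1 for card in cards if event(card)), len(cards))

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def is_picture(card):
    return card[0] in ("J", "Q", "K")

p_R = prob(is_red)
p_P = prob(is_picture)
p_R_and_P = prob(lambda c: is_red(c) and is_picture(c))
p_R_or_P = prob(lambda c: is_red(c) or is_picture(c))

# Conditional probabilities: restrict the deck to the conditioning event.
p_P_given_R = prob(is_picture, [c for c in deck if is_red(c)])
p_R_given_P = prob(is_red, [c for c in deck if is_picture(c)])

print("addition rule:      ", p_R_or_P == p_R + p_P - p_R_and_P)
print("multiplication rule:", p_R_and_P == p_R * p_P_given_R)
print("Bayes' Theorem:     ", p_P_given_R == p_P * p_R_given_P / p_R)
```

As the text observes, a check of this kind confirms only that the formula is arithmetically sound; it says nothing about the usefulness of the theorem for inference.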
Bayes' Theorem apparently provides a means of assessing the probability
of a hypothesis given the data (or outcome), which is the inverse of assessing
the probability of an outcome given a hypothesis (or rule), and of course we
may envisage more than two hypotheses. In the original essay, Bayes
demonstrated his construct using as an example the probability of balls coming to rest
on one or other of the parts of a plane table. Laplace, who in 1774 arrived at
essentially the same result as Bayes, but provided a more generalized analysis,
used the problem of the probability of drawing a white ball from an urn
containing unknown proportions of black and white balls given that a sampling
of a particular ratio has been drawn. Traditionally, writers on Bayes make heavy
use of "urn problems," and tradition is followed here with a simple example.
Phillips (1973) gives a comprehensive account of Bayesian procedures and
shows how they can be applied to data appraisal in the social sciences. Suppose
we are presented with two identical urns W and B. W contains 70 white and 30
black balls, and B contains 40 white and 60 black balls. From one of the urns
we are allowed to draw 10 balls, and we find that 6 of them are white and 4 of
them are black. Is the urn that we have chosen more likely to be W or more
likely to be B? Presumably most of us would opt for W. What is the probability
that it was W?
p(H1|D) is the probability of W given the data, and p(H2|D) is the probability
of B given the data. Now the probability of drawing a white ball from W is 7/10
and from B it is 4/10. The probability of the data given H1 [p(D|H1)] is
(0.7)^6 (0.3)^4, and p(D|H2) is (0.4)^6 (0.6)^4.
Now we apply Bayes' Theorem:
p(H1|D) = p(H1)(0.7)^6 (0.3)^4 / [p(H1)(0.7)^6 (0.3)^4 + p(H2)(0.4)^6 (0.6)^4]
In order to complete the calculation, we have to have values for p(H1) and
p(H2), the prior probabilities. And here is the controversy. The objectivist
view is that these probabilities are unknown and unknowable because the state
of the urn that we have chosen is fixed. There are no grounds for saying that
we have chosen W or B. The personalist would say that we can always state
some degree of belief. If we are completely ignorant about which urn was
chosen, then both p(H1) and p(H2) can be expressed as 0.5. This is a formal
statement of Laplace's Principle of Insufficient Reason, or Bayes' Postulate. If
we put these values in our equation, p(H1|D) works out to be 0.64, which
matches common sense. This value could then be used to revise p(H1) and p(H2)
and further data collected. For its proponents, the strength of Bayesian methods
lies in the claim that they provide formal procedures for revising many kinds of
opinion in the light of new data in a direct and straightforward way, quite unlike
the inferential procedures of Fisherian statistics. The most important point that
has to be accepted, however, is the justification for the assignment of prior
probabilities. Although in this simple example there might seem to be little
difficulty, in more complex situations, where probabilities, based on relative
frequency², hunches, or on opinion, cannot be assigned precisely or readily, the
picture is far less clear. Furthermore, the principle of the equal distribution of
ignorance has itself come under a great deal of philosophical attack. "It is rather
generally believed that he [Bayes] did not publish because he distrusted his
postulate, and thought his scholium defective. If so he was correct" (Hacking,
1965, p. 201).
² Presumably it could be argued that our urn with its unknown ratio of black and white balls
is one of an infinite distribution of urns with differing make-ups.
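The urn calculation, and the revision of opinion in the light of further data, can be sketched as follows. This is an illustration under stated assumptions: the function name and its parameters are invented for the purpose, the likelihoods follow the text in treating each draw as having a fixed probability of white (0.7 for W, 0.4 for B), and the second sample of 6 white and 4 black balls is hypothetical.

```python
def posterior_w(prior_w, whites, blacks, p_white_w=0.7, p_white_b=0.4):
    """Posterior probability that the urn is W after observing the draws.

    The binomial coefficient for the number of orderings is the same under
    both hypotheses and therefore cancels from Bayes' Theorem.
    """
    like_w = p_white_w**whites * (1 - p_white_w) ** blacks  # p(D|H1)
    like_b = p_white_b**whites * (1 - p_white_b) ** blacks  # p(D|H2)
    prior_b = 1 - prior_w
    return prior_w * like_w / (prior_w * like_w + prior_b * like_b)

# Equal priors (the Principle of Insufficient Reason), 6 white and 4 black.
first = posterior_w(0.5, whites=6, blacks=4)
print(f"p(H1|D) after the first sample: {first:.2f}")

# The posterior becomes the prior for a further, hypothetical, identical sample.
second = posterior_w(first, whites=6, blacks=4)
print(f"p(H1|D) after a second sample:  {second:.2f}")
```

Changing the first argument shows how strongly the answer depends on the prior probabilities, which is precisely the point of contention described above.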
FISHERIAN INFERENCE
When the justification for the probabilistic basis of inference in the sense of
revising opinion on the basis of data was thought of at all, it was the Bayesian
approach that held sway until this century. Its foundations had come under
attack primarily on the grounds that probability in any practical sense of the
word must be based on relative frequency of observations and not on degrees
of belief. Venn (1888) was perhaps the most insistent spokesman for this view.
Sir Ronald Fisher (1890-1962), certainly the most influential statistician of all
time, set out to replace it. In 1930 he published a paper that supposedly set out
his notion of fiducial probability, claiming, "Inverse probability has, I believe,
survived so long in spite of its unsatisfactory basis, because its critics have until
recent times put forward nothing to replace it as a rational theory of learning by
experience" (Fisher, 1930, p. 531).
There is no doubt that Fisher's methods, and the contributions of Neyman
and Pearson that have been grafted on to them, have provided us with a set of
inferential procedures. There seems to be considerable doubt as to whether he
provided us with a coherent non-Bayesian theory. Fisher himself asserted that
the concept of the likelihood function was fundamental to his new approach and
distinguished it from Bayesian probability. Kendall (1963) in his obituary of
Fisher says this:
It appears to me that, at this point [1922], his ideas were not very well thought out.
Certainly his exposition of them was obscure. But, in retrospect, it becomes plain
that he was thinking of a probability function f(x, θ) in two different ways: as the
probability distribution of x for given θ, and as giving some sort of permissible range
of θ for an observed x. To this latter he gave the name of 'fiducial probability
distribution' (later to be known as a fiducial distribution) and in doing so began a
long train of confusion; for it is not a probability distribution to anyone who rejects
Bayes's approach, and indeed, may not be a distribution of anything. Fisher
nevertheless manipulated it as if it were, and thereafter maintained an attitude of
rather contemptuous surprise towards anyone who was perverse enough to fail in
understanding his argument....
The position on both sides has been restated ad nauseam, without much attempt
at reconciliation or, as I think, without an explicit recognition of the real point, which
is that a man's attitude towards inference, like his attitude towards religion, is
determined by his emotional make-up, not by reason or mathematics. (Kendall,
1963, p. 4)
Richard von Mises (1957) is baffled also:
Fisher introduces the term [likelihood] in order to denote something different from
probability. As he fails to give a definition for either word, i.e., he does not indicate
how the value of either is to be determined in a given case, we can only try to derive
the intended meaning by considering the context in which he uses these words.
I do not understand the many beautiful words used by Fisher and his followers in
support of the likelihood theory. The main argument, namely, that p [the probability
of the hypothesis] is not a variable but an "unknown constant," does not mean
anything to me. (von Mises, 1957, pp. 157-158)
It is tempting to leave it at that, but some attempt at capturing the flavor of
Fisher's position must be made. Clearly, if one has knowledge of a distribution
of probabilities of events, then that knowledge can be used to establish the
probability of an event that has not yet been observed, for example, the
probability that the next roll of two dice will produce a 7 (p = .167). What of
the situation where an event has been observed - the roll did produce a 7 - can
we say anything about the plausibility of an event with p = .167 having occurred?
This is a decidedly odd sort of question because the event has indeed occurred!
Before the draw from a lottery with 1,000,000 tickets is made, the probability
of my winning is .000001. It will be difficult to convince me after the draw, as
I clutch the winning ticket, that what has happened is impossible, or even very,
very unlikely to have occurred by chance, and if you continue to insist I shall
merely keep on showing you my ticket.
Fisher, in 1930, puts the situation in this way:
There are two different measures of rational belief appropriate to different cases.
Knowing the population we can express our incomplete knowledge of, or expectation
of, the sample in terms of probability; knowing the sample we can express our
incomplete knowledge of the population in terms of likelihood. (Fisher, 1930, p.
532)
Likelihood then is a numerical measure of rational belief different from
probability. Whether or not the logic of the situation is understood, all users of
statistical methods will recognize the reasoning as crucial to both estimation
and hypothesis testing. The method of maximum likelihood that Fisher
propounds (although it had been put forward by Daniel Bernoulli in 1777; see
Kendall, 1961) justifies the choice of population parameter. The method simply
says that the best estimate of the parameter is the value that maximizes the
probability of the observations, or that what has occurred is the most likely thing
that could have occurred. A number of writers, for example, Hogben (1957),
have stated that this assumption cannot be justified without an appeal to inverse
probability, so that Fisher did not succeed in detaching inference from Bayes'
Theorem. Nevertheless, the basic notion is that we can find the likelihood of a
particular population parameter, say the correlation coefficient R, by defining
it as a value that is proportional to the probability that from a population with
that value we have obtained a sample with an observed value r.
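The idea that the best estimate is the parameter value that makes the observations most probable can be sketched with a simpler case than the correlation coefficient. The example below is an assumption made for illustration, not Fisher's own: it takes a binomial proportion, with 6 successes observed in 10 trials, and searches a grid of candidate values for the one that maximizes the likelihood. The names are illustrative.

```python
import math

def likelihood(p, successes, trials):
    """Probability of the observed count, viewed as a function of the parameter p."""
    return math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)

successes, trials = 6, 10  # hypothetical observed sample

# Evaluate the likelihood over a fine grid of candidate parameter values.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda p: likelihood(p, successes, trials))

print(f"maximum likelihood estimate: {best:.3f}")
print(f"sample proportion:           {successes / trials:.3f}")
```

The likelihoods over the grid do not sum to one across parameter values, which is one way of seeing why Fisher insisted that likelihood is not a probability distribution for the parameter.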
There is no doubt that Fisher's argument will continue to be controversial
and that many attempts to resolve the ambiguities will be made. Dempster
(1964) is among those who have entered the fray, and Hacking's (1965)
rationale has resulted in one well-known statistician (Bartlett, 1966) proposing
that the resulting theory be renamed "the Fisher-Hacking theory of fiducial
inference."
BAYES OR p < .05?
Criticism of Fisherian methods arrived almost as soon as they began to be
adopted. In more recent years, criticism of null hypothesis significance testing
(NHST) appears to have become an 'area' in its own right. Berkson (1938)
probably led the way, when he noted that:
if the number of observations is extremely large - for instance on the order of
200,000 - the chi-square P will be small beyond any usual level of significance ...
For we may assume that it is practically certain that any series of real observations
does not follow a normal curve with absolute exactitude in all respects. (p. 526)
Berkson was expressing in essence what many, many other writers in the
following 60 years have observed: that when H0 is expressed as an exact null
hypothesis (zero difference or no relationship) then very small deviations from
this (dare one say it?) pedantic view will be declared significant!
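Berkson's observation is easily illustrated. The sketch below does not reproduce his chi-square example; it assumes, for simplicity, a two-sided test of an exact null hypothesis about a mean with known standard deviation, and a hypothetical true difference of one hundredth of a standard deviation. The function name is illustrative.

```python
import math

def two_sided_p(effect_in_sd, n):
    """Two-sided p value for a z test when the sample mean differs from the
    null value by `effect_in_sd` standard deviations and n cases are observed."""
    z = effect_in_sd * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

trivial_effect = 0.01  # hypothetical: one hundredth of a standard deviation
for n in (200, 20_000, 200_000):
    print(f"n = {n:>7}: p = {two_sided_p(trivial_effect, n):.6f}")
```

The same trivial difference moves from being nowhere near significance to being significant at any usual level merely because n has been increased to the order of magnitude that Berkson mentions.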
Some of the more recently-expressed doubts have been brought together by
Henkel and Morrison (1970), but the papers they collected together were most
certainly not the last word. Indeed, Hunter (1997) called for a ban on NHST,
and reported that no less a body than the American Psychological Association
has a "committee looking into whether the use of the significance test should
be discouraged" (p. 6)!
The main criticisms, endlessly repeated, are easily listed. NHST does not
offer any way of testing the alternative or research hypothesis; the null
hypothesis is usually false and when differences or relationships are trivial, large
samples will lead to its rejection; the method discourages replication and
encourages one-shot research; the inferential model depends on assumptions
about hypothetical populations and data that cannot be verified; and there are
more. Some of the criticisms are valid and need to be faced more carefully, even
more boldly than they are, while some seem to be 'straw men' set up to be blown
down. For example, the fact that we only reject H0 and do not test H1 is not
particularly satisfying, but the notion that inference is faulty because in
some way it rests on characteristics of populations not observed or not yet
observed is merely stating a fact of inference! Moreover, most of the criticism
would be diluted, if not eliminated, if greater attention were paid to parameters
of a test other than alpha level, that is to say, effect size, sample size,
and power.
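The interplay of alpha level, effect size, sample size, and power can be made concrete with a small calculation. The sketch below rests on assumptions chosen for simplicity: a two-sided one-sample z test with known standard deviation, an effect size expressed in standard deviation units, and illustrative values for the effect and for n. The function name is hypothetical.

```python
from statistics import NormalDist

def power_two_sided_z(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z test.

    effect_size is the true difference from the null value in SD units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # expected value of the test statistic
    upper = 1 - std_normal.cdf(z_crit - shift)   # rejection in the upper tail
    lower = std_normal.cdf(-z_crit - shift)      # rejection in the lower tail
    return upper + lower

for n in (10, 32, 100):
    print(f"effect = 0.5 SD, n = {n:>3}: power = {power_two_sided_z(0.5, n):.3f}")
```

When the effect size is set to zero the function returns the alpha level itself, a reminder that alpha is only one of the four quantities and that the others deserve equal attention in planning a study.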
Even Fisher's most supportive and ardent colleague, Frank Yates (1964)
said:
The most commonly occurring weakness in the application of Fisherian methods is,
I think, undue emphasis on tests of significance and failure to recognize that in many
types of experimental work estimates of the treatment effects, together with estimates
of the errors to which they are subject are the quantities of primary interest. To some
extent Fisher himself is to blame for this. Thus in The Design of Experiments he
wrote: "Every experiment may be said to exist only in order to give the facts a chance
of disproving the null hypothesis." (p. 320)
Yates goes on to say that the null hypothesis, as usually expressed, is
"certainly untrue" and that:
such experiments [variety and fertilizer trials] are in fact undertaken with the different
purpose of assessing the magnitude of the effects ... [Fisher] did not . . . sufficiently
emphasise the point in his writings. Scientists were thus encouraged to expect final
and definitive answers ... some of them, indeed, came to regard the achievement of
a significant result as an end in itself. (p. 320)
Although a number of modifications of, or alternatives to, NHST have been
suggested (see, for example, confidence intervals, discussed in chapter 13, p.
199), by far the most popular of the suggested replacements (rather than salvage
operations for the Fisherian model) is Bayesian analysis. The claim is usually
made that Bayesian methods are concerned with the alternative hypothesis, that
they encourage replication, that, in fact, they reflect more clearly the traditional
'scientific method.' Curiously enough, it is also often claimed that despite the
so-called subjectivity of Bayesian priors, a Bayesian analysis will arrive at the
same conclusion as a 'classical' analysis. This leaves the journeyman
psychological researcher with nothing but the tired protest that nothing has been offered
as a reward for changing from the familiar routines!
8
Sampling
and Estimation
RANDOMNESS AND RANDOM NUMBERS
In common parlance, to choose "at random" is to choose without bias, to make
the act of choosing one without purpose even though the eventual outcome may
be used for a decision. The emphasis here is on the act rather than the outcome.
It is certainly not true to say that ordinary folk accept that random choice means
absence of design in the outcome. Politicians, examining the polls, have been
known to remark that the result was "in the cards." Primitive notions of fate,
demons, guardian angels, and modern appeals to the "will of God" often lie
behind the drawing of lots and the tossing of coins to "decide" if Mary loves
John, or to take one course of action rather than another. It is absence of design
in the manner of choosing that is important. The point must not be belabored, but
everyday notions of chance are still construed, even by the most sophisticated,
in ways that are not too far removed from the random element in divination and
sortilege practiced by the oracles and priests who were consulted by our remote,
and not-so-remote, ancestors. And, as already noted, perhaps one of the reasons
for the delay in the development of the probability calculus arose from a
reluctance to attempt to "second-guess" the gods.
The concepts of randomness and probability are, therefore, inextricably
intertwined in statistics. The difficulties inherent in defining probability, which
were discussed earlier, are once more presented in an examination of randomness.
It is commonly thought that everyone knows what randomness is. The
great statistician Jerzy Neyman (1894-1981) states, "The method of random
sampling consists, as it is known, in taking at random elements from the
population which it is intended to study. The elements compose a sample which
is then studied" (Neyman, 1934, p. 567).
It is unlikely that this definition would be given a high grade by most teachers
of statistics. Later in this paper (and, this inadequate definition notwithstanding,
it is a paper of central and enormous importance), Neyman does note that random
sampling means that each element in the population must have an equal chance
of being chosen, but it is not uncommon for writers on the method to ignore
both this simple directive and instructions as to the means of its implementation.
Of course, the use of the words "equal chance" in the definition brings us back
to what we mean by chance and probability. For most purposes we fall back
on the notion of long-run relative frequency, rather than leaving these constructs
as undefined ideas that make intuitive sense. Practical tests of randomness rest
on the examination of the distribution of events in long series. It is also the case,
as Kendall and Babington Smith (1938) point out, that random sampling
procedures may follow a purposive process. For example, the number π is not
a random number, but its calculation generates a series of digits that may be
random. These authors are among the earliest to set out the bases of random
sampling from a straightforward practical viewpoint, and they were writing a
mere 60 years ago. The concept of a formal random sample is a modern one;
concerns about its place and importance in scientific inference and significance
testing paralleled the development of methods of experimental design and
technical approaches to the appraisal of data. Nevertheless, informal notions of
randomness that imply lack of partiality and "choice by chance" go back many
centuries.
Stigler (1977) researched the procedure known as "the trial of the Pyx," a
sampling inspection procedure that has been in existence at the Royal Mint in
London for almost eight centuries. Over a period of time, a coin would be taken
daily from the batch that had been minted and placed in a box called the Pyx.
At intervals, sometimes separated by as much as 3 or 4 years, the Pyx was
opened and the contents checked and assayed in order to ensure that the coinage
met the specifications laid down by the Crown. Stigler quotes Oresme on the
procedure followed in 1280:
When the Master of the Mint has brought the pence, coined, blanched and made
ready, to the place of trial, e.g. the Mint, he must put them all at once on the counter
which is covered with canvas. Then, when the pence have been well turned over
and thoroughly mixed by the hands of the Master of the Mint and the Changer, let
the Changer take a handful in the middle of the heap, moving round nine or ten times
in one direction or the other, until he has taken six pounds. He must then distribute
these two or three times into four heaps, so that they are well mixed. (Stigler, 1977,
p. 495)
The Master of the Mint was allowed a margin of error called the remedy and
had to make good any deficit that was discovered. Although mathematical
statistics played no part in these tests, and if they had they would have been
more precise,¹ the procedure itself mirrors modern practice.
The trial of the Pyx even in the Middle Ages consisted of a sample being drawn, a
null hypothesis (the standard) to be tested, a two-sided alternative, and a test statistic
and a critical region (the total weight of the coins and the remedy). The problem
even carried with itself a loss function which was easily interpretable in economic
terms. (Stigler, 1977, p. 499)
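Stigler's observation that the remedy was fixed per unit of weight while the coin weights were combined can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: coin weights are taken to be normally distributed, and the target weight, the coin-to-coin standard deviation, and the remedy per coin are hypothetical values, not historical ones. The function names are likewise invented for the example.

```python
import random

def pyx_trial(n_coins, target=1.0, coin_sd=0.01, remedy_per_coin=0.01, rng=None):
    """Return True if the combined weight of n_coins is within the total remedy.

    All numeric defaults are hypothetical; weights are assumed normal.
    """
    rng = rng or random.Random()
    total = sum(rng.gauss(target, coin_sd) for _ in range(n_coins))
    allowed = n_coins * remedy_per_coin  # remedy grows in proportion to n
    return abs(total - n_coins * target) <= allowed

def failure_rate(n_coins, trials=2000, seed=1):
    """Proportion of simulated trials in which the Master exceeds the remedy."""
    rng = random.Random(seed)
    failures = sum(not pyx_trial(n_coins, rng=rng) for _ in range(trials))
    return failures / trials

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, failure_rate(n))
```

Because the allowance grows in proportion to the number of coins while the standard deviation of the total grows only with its square root, the simulated failure rate falls quickly as more coins are combined, which is the point made in the footnote.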
Random selections may be made from real, existent universes, for example,
the population of Ontario, or from hypothetical universes, for example, individuals
who, over a period of time, have taken a particular drug as part of a
clinical trial. In the latter case, to talk of "selection" in any real sense of the word
is stretching credulity, but we do use the results based on the individuals actually
examined to make inferences about the potential population that may be given
the drug. In the same way, the samples of convenience that are used in
psychological research are hardly ever selected randomly in the formal sense.
Undergraduate student volunteers are not labeled as automatically constituting
random samples, but they are often assumed to be unbiased with respect to the
dependent variables of interest, an assumption that has produced much criticism.
This latter statement emphasizes the fact that a sampling method, whatever it
is, relates to the universe under study and the particular dependent variable or
variables of interest. A questionnaire asking about attitudes to health and fitness
given to members of the audience at, say, a symphony concert may well be
generalizable to the population of the city, but the same instrument given to the
annual convention of a weight watchers' club would not. These statements seem
to be so obvious and yet, as we shall see, overlooking possible sources of bias
either by accident or design has led to some expensive fiascos. Unfortunately,
the method of sampling can never be assuredly independent of the variable
under study. Kendall and Babington Smith (1938) note that "The assumption
of independence must therefore be made with more or less confidence on a
priori grounds. It is part of the hypothesis on which our ultimate expression of
opinion is based" (p. 152).
¹ Stigler notes that the remedy was specified on a per pound basis and that all the coin weights were
combined. This, together with the central limit theorem, almost guarantees that the Master would not
exceed the remedy.
Kendall and Babington Smith comment on the use of "random" digits in
random sampling, and it is worth examining these applications because most of
the present-day statistical cookbooks at least pay lip-service to the procedures
by including tables of random numbers and some instruction as to their use.
Individual units in the population are numbered in some convenient way, and
then numbers, taken from the tables, are matched with the individuals to select
the sample. This procedure may result in a sample of individuals numbered 1,
2, 3, 4, 5, 6, 7, 8, 9, or 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, groupings that may invite
comment because they follow immediately recognizable orders but that nevertheless
could be generated by a random selection. The fact that a random
selection produces a grouping that looks to be biased or non-representative led
to a great deal of debate in the 1920s and 1930s. The sequence 1, 4, 2, 7, 9 has
the appearance of being random but the sequence 1, 4, 2, 7, 9, 1, 4, 2, 7, 9 has
not. Finite sequences of random numbers are therefore only locally random.
Even the famous tables of one million random digits produced by the RAND
Corporation (1965) can only, strictly speaking, be regarded as locally random,
for it may be that the sequence of one million was about to repeat itself. Random
sequences may also be "patchy." Yule (1938b), for example, after examining
Tippett's tables, which had been constructed by taking numbers at random from
census reports, gained the impression that they were rather patchy and proceeded
to apply further tests that gave some support to his view. Tippett (1925)
used his tables (not then published) to examine experimentally his work on the
distribution of the range.
Simple tests for local randomness are outlined by Kendall and Babington
Smith, tests that Tippett's tables had passed, and although much more extensive
appraisals can be made today by using computers, these prescriptions illustrate
the sort of criteria that should be applied. Each digit should appear approximately
the same number of times; no digit should tend to be followed by any particular
other digit; there are certain expectations in blocks of digits with regard to the
occurrence of three, four, or five digits that are all the same; there are certain
expectations regarding the gaps in the sequence between digits that are the same.
Tests of this sort do not exhaust the many that can be applied. Whitney (1984)
has recently noted that "It has been said that more time has been spent generating
and testing random numbers than using them" (p. 129).
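The criteria just listed can be sketched in a few lines of code. What follows is a minimal illustration rather than a reconstruction of Kendall and Babington Smith's own procedures: it computes a chi-square statistic for digit frequencies, a chi-square statistic for non-overlapping pairs of digits (a simplification adopted here so that the pairs are independent), and the gaps between repeats of a digit. The function names are hypothetical, and the pseudo-random digits come from Python's own generator.

```python
import random
from collections import Counter

def frequency_chi2(digits):
    """Chi-square statistic for equal digit frequencies (9 degrees of freedom)."""
    expected = len(digits) / 10
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

def serial_chi2(digits):
    """Chi-square statistic over non-overlapping pairs of digits (100 cells)."""
    pairs = [(digits[i], digits[i + 1]) for i in range(0, len(digits) - 1, 2)]
    expected = len(pairs) / 100
    counts = Counter(pairs)
    return sum((counts.get((a, b), 0) - expected) ** 2 / expected
               for a in range(10) for b in range(10))

def gaps(digits, digit):
    """Lengths of the gaps between successive occurrences of one digit."""
    positions = [i for i, d in enumerate(digits) if d == digit]
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]

if __name__ == "__main__":
    rng = random.Random(0)
    pseudo_random = [rng.randrange(10) for _ in range(10000)]
    patterned = [1, 4, 2, 7, 9] * 200  # the repeating sequence from the text

    # With 9 degrees of freedom, values above about 16.92 would be judged
    # significant at the 5% level.
    print(frequency_chi2(pseudo_random), frequency_chi2(patterned))
    print(serial_chi2(pseudo_random), serial_chi2(patterned))
    print(set(gaps(patterned, 1)))  # every gap is the same length
```

The repeating sequence fails every check at once: half the digits never appear, only five of the hundred possible pairs occur, and the gaps between repeats never vary.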
COMBINING OBSERVATIONS
The notion of using a measure of the average as an adequate and convenient
summary or description of a number of data is an accepted part of everyday
discourse. We speak of average incomes and average prices, of average speeds
and average gas consumption, of average men and average women. We are
referring to some middling, nonextreme value that we take as a fair representation
of our observations. The easy use of the term does not reflect the
logical problems associated with the justification for the use of particular
measures of the average. The term itself, we find from the Oxford English
Dictionary, refers, among other things, to notions of sharing labor or risk, so
old forms of the word refer to work done by tenants for a feudal superior or to
shared risks among merchants for losses at sea. Modern conceptions of the word
include the notion of a sharing or evening out over a range of disparate values.
A large series of observations or measurements of the same phenomenon
produces a distribution of values. Given the assumption that a single true value
does, in fact, exist, the presence of different values in the series shows that there
are errors. The assumption that over-estimations are as likely as under-estimations
would provide support for the use of the middle value of the series, the
median, as representing the true value. The assumption that the value we observe
most frequently is likely the true value justifies the use of the mode, and the
"evening out" of the different values is seen in the use of the arithmetic mean.
It is this latter measure that is now the statistic of choice when observations are
combined, and there are a number of strands in our history that have contributed
to the justification for its use. The employment of the arithmetic mean and the
use of the law of error have sound mathematical underpinnings in the Principle
of Least Squares, which will be considered later. First, however, a somewhat
critical look at the use of the mean is in order.
The Arithmetic Mean
In the 1755 Philosophical Transactions, and in a revision in Miscellaneous
Tracts of 1757, Thomas Simpson argues the case for the arithmetic mean in An
Attempt to Show the Advantage arising by Taking the Mean of a Number of
Observations in Practical Astronomy. These are valuable contributions that
discuss, for the first time, measurement errors in the context of probability and
point the way toward the idea of a law of facility of error. Simpson (1755)
complains that "some persons, of considerable note, have been of opinion, and
even publickly maintained, that one single observation, taken with due care, was
as much to be relied upon as the Mean of a great number" (pp. 82-83).
In the revision, Simpson (1757) states as axioms that positive and negative
errors are equally probable and that there are assignable limits within which
errors can be taken to fall. He also diagrams the law of error as an isosceles
triangle and shows that the mean is nearer to the true value than a single random
observation. The claim of the arithmetic mean to be the best representation of
a large body of data is often justified by appeal to the principle of least squares
and the law of error. This is high theory, and in appealing to it there is a danger
of overlooking the logic of the use of the mean. Simply, as John Venn (1891),
who was mentioned in chapter 1, puts it:
Why do we resort to averages at all?
How can a single introduction of our own, and that a fictitious one, possibly take the
place of the many values which were actually given to us? And the answer surely
is, that it can not possibly do so; the one thing cannot take the place of the other for
purposes in general, but only for this or that specific purpose. (pp. 429-430)
This seemingly obvious statement is one that has frequently been ignored in
statistics in the social sciences. Venn points out the different kinds of averages
that can be used for different purposes and notes cases where the use of any sort
of average is misleading. Edgeworth (1887) had provided an attempt to examine
the mathematical justifications for the different averages and Venn and many
others refer to this treatment. Edgeworth's paper also examines the validity of
the least squares principle in this context. Venn illustrates his argument with
some straightforward examples. If two people reckoned the distance of Cambridge
from London to be 50 and 60 miles, in the absence of any information
that would lead us to suspect either of the measures, one would guess that 55
miles was the probable distance. However, if one person said that someone they
knew lived in Oxford and another that the individual lived in Cambridge, the
most probable location would not be at some place in between. In the latter case,
in the absence of any other information, one would have a chance at arriving at
the truth by choosing at random.
Edgeworth's papers on the best mean represent some of his most useful
work. A particular service is rendered by his distinction between real or
objective means and fictitious or subjective means. The former arise when we
use the arithmetic mean as the true value underlying a group of measurements
that are subject to error; the latter is a description of a set.
The mean of observations is a cause, as it were the source from which diverging
errors emanate. The mean of statistics is a description, a representative of the group,
that quantity which, if we must in practice put one quantity for many, minimises the
error unavoidably attending such practice.
Observations are different copies of one original; statistics are different originals
affording one 'generic portrait.' (Edgeworth, 1887, p. 139)
This formal distinction is clear. However, because the mathematics of the
analysis of errors and the manipulations of modern statistics rest on the same
principles, the logic of inference is sometimes clouded. It is Quetelet who
brought the mathematics of error estimation in physical measurement into the
assessment of the dispersion of human characteristics. Clearly, something of the
sort had been done before in the examination of mortality tables for insurance
purposes, but we see Quetelet making a direct statement that only partially
recognizes Edgeworth's later distinction:
Everything occurs then as though there existed a type of man, from which all other
men differed more or less. Nature has placed before our eyes living examples of
what theory shows us. Each people presents its mean, and the different variations
from this mean which may be calculated a priori. (Quetelet, 1835/1849, p. 96)
We noted in chapter 1 that it was Quetelet's view that the average value of
mental and moral, as well as physical, characteristics represented the ideal type,
l'homme moyen (the average man) for which Nature was constantly striving.
Quetelet's own pronouncements were somewhat inconsistent in that he sometimes
promoted the view of the "average being" as a universal biological type
and at other times suggested that the average differed across groups and
circumstances. Nevertheless, his methods were the forerunners of work that
attempts to establish "norms" in biology, anthropology, and the psychology of
individual differences. One of the requirements of a "good" psychometric test
is that it be accompanied by norms. These norms provide standards for comparisons
across individual test scores and, just as Quetelet's characterization of
the average as the ideal type aroused opposition and controversy, so the
establishment of national norms and sub-group norms and racial norms produces
heated debate today.
The Principle of Least Squares
In its best-known form, this famous principle states that the sum of squared
differences of observations from the mean of those observations is a minimum;
that is to say, it is smaller than the sum of squared differences from any other
reference point.
Legendre (1752-1833) announced the Principle of Least Squares in 1805,
but in Theoria Motus Corporum Coelestium, published in 1809, Gauss
(1777-1855), discussing the method, refers to his work on it in 1795 (when he
was a 17-year-old student preparing for his university studies). This claim to
priority upset Legendre and led to some bitter dispute. Today the method is most
frequently associated with Gauss, who is often identified as the greatest mathematician
of all time. That the method very quickly became a topic of much
commentary and discussion may be demonstrated by the fact that in the 70 or
so years following Legendre's publication no less than 193 authors produced
72 books, 23 parts of books, and 313 memoirs relating to it. Merriman (1877)
provides a list of these titles together with some notes. Faced with such sources,
to say nothing of the derivations, papers, commentaries, and monographs
published in the last 110 years, the following represents perhaps one of a dozen
ways of commenting on its origins.
Using the mean to combine a set of independent observations was a technique
that had been used in the 17th century. Gauss later examined the problem
of selecting from a number of possible ways of combining data the one
that produced the least possible uncertainty about the "true value." Gauss noted
that in the combination of astronomical observations the mathematical treatment
depended on the method of combination. He approached the problem from the
standpoint that the method should lead to the cancellation of random errors of
measurement and that, as there was no reason to prefer one method over another,
the arithmetic mean should be chosen. Having appreciated that the "best," in
the sense of "most probable," value could not be known unless the distribution
of errors was known, he turned to an examination of the distribution, which gave
the mean as the most probable value. Gauss's approach was based on practical
considerations, and because the procedures he examined did produce workable
solutions in astronomical and geodetic observations, the method was vindicated.
In fact, the principle of least squares, as Gauss himself noted, can be considered
independently of the calculus of probabilities.
If we have n observations $X_1, X_2, X_3, \ldots, X_n$ from a population with a mean
$\mu$, what is the least squares estimate of $\mu$? It is the value of $\mu$ that minimizes
$\sum (X_i - \mu)^2$.
Now: $\sum (X_i - \mu)^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$, where $\bar{X}$ is the mean of the n observations.
Clearly the right-hand side of the last expression is at a minimum when
$\bar{X} = \mu$, which demonstrates the principle.
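The principle is easy to check numerically. The sum of squared deviations about any reference point equals the sum of squared deviations about the sample mean plus n times the squared distance between the sample mean and that reference point, so it cannot be smaller than the sum about the mean. The sketch below verifies this for a handful of made-up observations; the data and the function names are hypothetical and serve only as an illustration.

```python
def sum_of_squares(xs, ref):
    """Sum of squared deviations of the observations from a reference point."""
    return sum((x - ref) ** 2 for x in xs)

def decomposition(xs, mu):
    """Return both sides of: sum (X - mu)^2 = sum (X - mean)^2 + n (mean - mu)^2."""
    n = len(xs)
    mean = sum(xs) / n
    lhs = sum_of_squares(xs, mu)
    rhs = sum_of_squares(xs, mean) + n * (mean - mu) ** 2
    return lhs, rhs

observations = [50.0, 60.0, 53.0, 57.0, 55.0]  # hypothetical measurements

if __name__ == "__main__":
    for mu in (50.0, 54.0, 55.0, 56.0, 60.0):
        lhs, rhs = decomposition(observations, mu)
        print(mu, lhs, rhs)
```

Among the candidate reference points tried, the sum of squares is smallest at 55, which is the mean of these five values.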
This easily obtained result provides a rationale for estimating the population
mean from the sample mean that is intuitively sensible. The law of error enters
the picture when we consider the arithmetic mean as the most probable result.
In this case we find that the law is in fact given by the normal distribution, often
referred to as the Laplace-Gaussian distribution. It has been stated on more than
one occasion that Laplace assumed the normal law in arriving at the mean to
provide what he described as the most advantageous combination of observations.
Laplace certainly considered the case when positive and negative errors
are equally likely, and these cases rest on the error law being what we now call
"normal" in form. The threads of the argument are not always easy to disentangle,
but one of the better accounts, for those who are prepared to grapple with
a little mathematics, was given by Glaisher as long ago as 1872. The crucial
point is that of the rationale for the two fundamental constructs of statistics,
the mean, $\bar{X}$, and the variance, $\sum (X - \bar{X})^2 / n$. The essential fact is easily seen.
Given that a distribution of observations is normal in form, and given that we
know the mean and the variance of the distribution, then the distribution is
completely and uniquely specified. All its properties can be ascertained given
this information.
In the context of statistics in the social sciences, both the normal law and the
least squares principle are best understood in the context of the linear model.
The model encompasses these constructs and brings together in a formal sense
the mathematics of the combination of observations developed for use in error
estimation and mathematical statistics as exemplified by analysis of variance
and regression analysis.
Representationand Bias
The earliest examples of the use of samples in social statistics we have seen in
the work of Graunt, Petty, Halley, and the early actuaries. These samples were
neither random nor representative, and mistaken inferences were plentiful. In
any event, it was not until much later, when attempts were made to collect
information on populations, inferential exercises repeated, and the results
compared, that critical examinations of the techniques employed could be made.
Stephan (1948) reports that Sir Frederick Morton Eden estimated the population
of Great Britain in 1800 to be about 9,000,000. This estimate, which was based
on the number of births and the average number of inhabitants in each house (a
number that was obtained by sampling), was confirmed by the first British
census in 1801. Earlier attempts to estimate populations had been made in
France, and Laplace in 1802 made an attempt to do so that followed a scheme
he had devised and published in 1786, a scheme that included a probability
measure of the precision of the estimate. Specifically, Laplace averred that the
odds were 1,161 to 1 that the error would not reach half a million. Westergaard
(1932) provides some more details of these exercises. Elsewhere in Europe and
in the United States the 19th century saw various censuses conducted, as well
as attempts to estimate the size of the population from samples. In the United
States the Constitution provided for a census every 10 years in order to
determine Congressional representation, but a Bill introduced into the British
Parliament in 1753 to establish an annual census was defeated in the House of
Lords.
It appears that the probability mathematicians and the rising group of
political arithmeticians never joined forces in the 19th century. The latter
favored the institution of complete censuses and, generally speaking, were not
mathematicians, and the former were scientists who had many other problems
to test their mettle. Almost 100 years passed before scientific sampling procedures
were properly investigated.
Stephan (1948) lists four areas where modern sampling methods could have
been used to advantage: agricultural crop and livestock estimates, economic
statistics, social surveys and health surveys, and public opinion polls. The last
will be considered here in a little more detail because it is in this area that the
accuracy of forecasts is so often and so quickly assessed and breakdowns in
sampling procedures detected with the benefit of hindsight.
The Raleigh Star in Raleigh, North Carolina, conducted "straw votes" as
early as 1824, covering political meetings and trying to discover the "sense of
the people." By the turn of the century, many newspapers in the United States
were regularly conducting opinion polls, a common method being merely to
invite members of the public to clip out a ballot in the paper and to mail it in.
The same basic procedure was followed by all the publications. Then the large-circulation
magazines, notably Literary Digest, began to mail out ballots to very
large numbers of people, sometimes as many as 11,000,000. In 1916 this
publication correctly predicted the election of Wilson to the presidency and from
then until 1936 enjoyed consistent and much admired success. Its predictions
were very accurate. For example, in 1932 its estimate of the popular vote for
each candidate in the presidential election came to within 1.4% of the actual
outcome. In 1936 came disaster. For years Literary Digest had conducted polls
on all kinds of issues, mailing out millions of ballots, at considerable expense,
to telephone subscribers and automobile owners. In 1936 the magazine predicted
that Alfred M. Landon would win the presidency on the basis of the return
of over 2,300,000 replies from over 10,000,000 mailed ballots. The record
shows that Franklin D. Roosevelt won the presidency with one of the largest
majorities in American presidential history. The reasons for this disastrous
mistake are now easy to see. Prior to 1936, preference for the two major political
parties in the United States was not related to level of income. In that year it
seems that it was. The telephone subscribers and car owners (and in 1936 these
were the rather more affluent) who had received the Literary Digest ballots
were, in the main, Republicans. In 1937 the magazine ceased publication. Crum
(1931) and Robinson (1932, 1937) gave commentaries on some of these early
polls.
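The lesson of 1936, that a very large sample drawn from a biased frame can mislead where a small random sample does not, can be illustrated with a toy simulation. Every number in the sketch below is hypothetical: the share of affluent voters, the levels of support in each group, and the sample sizes are chosen only to make the effect visible and are not estimates of the 1936 electorate. The function names are likewise invented for the example.

```python
import random

# Hypothetical electorate, for illustration only.
AFFLUENT_SHARE = 0.25    # proportion of voters in the affluent group
SUPPORT_AFFLUENT = 0.40  # support for candidate A among the affluent
SUPPORT_OTHERS = 0.65    # support for candidate A among everyone else
TRUE_SUPPORT = (AFFLUENT_SHARE * SUPPORT_AFFLUENT
                + (1 - AFFLUENT_SHARE) * SUPPORT_OTHERS)

def draw_voter(rng, affluent_only=False):
    """Return True if a sampled voter supports candidate A."""
    affluent = True if affluent_only else rng.random() < AFFLUENT_SHARE
    support = SUPPORT_AFFLUENT if affluent else SUPPORT_OTHERS
    return rng.random() < support

def poll(n, affluent_only, seed):
    """Estimated support for candidate A from a sample of n voters."""
    rng = random.Random(seed)
    return sum(draw_voter(rng, affluent_only) for _ in range(n)) / n

if __name__ == "__main__":
    print("true support:", TRUE_SUPPORT)
    print("huge biased sample:", poll(200_000, affluent_only=True, seed=1))
    print("small random sample:", poll(3_000, affluent_only=False, seed=2))
```

Under these assumptions the large sample converges, very precisely, on the wrong answer, because it estimates support among the affluent rather than in the electorate as a whole.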
Fortune magazine fared no better in 1948, underestimating the vote for the
Democrats and Harry Truman by close to 12% and, of course, failing to pick
the winner. In 1936, with a much, much smaller sample than that of Literary
Digest (less than 5,000), it had forecast Roosevelt's vote to within 1%. The
explanation for its failure in 1948 rests with the swing of both decided and
undecided voters between the September poll and the November election and
failure to correct for a geographic sampling bias. Parten (1966) has some
commentary on these and other polls. One of the most successful polling
organizations, the American Institute of Public Opinion, headed by George
Gallup, began its work in 1933. But the Gallup poll also predicted a win for
Dewey over Truman in 1948, and many people have seen the Life photograph
of a victorious president holding the Chicago Daily Tribune with its famous
Type I error headline, "Dewey defeats Truman."
The result produced one of the first claims that the polls influenced the
outcome, inducing complacency in the Republicans and a small turnout of their
supporters, leading to the defeat of the Republican candidate. Today there is
much controversy over the conducting and the use of polls. They will survive
because in general they are correct much more often than not. Their success is
due to the development of refined sampling techniques.
SAMPLING IN THEORY AND IN PRACTICE
Chang (1976) gives a quite thorough review of inferential processes and
sampling theory, and it is worth sketching in some of its development in the
context of survey sampling. However, from the standpoint of statistics and
sampling in psychology, there is no doubt but that the rationale of sampling
procedures for hypothesis testing rather than parameter estimation is of greater
import. The two are related.
The early political arithmeticians held to the view that statistical ratios - for
example, males to females, average number of children per family, and so on -
were approximately constant, and, as a result, proceeded to draw inferences
from figures collected in a single town or parish to whole regions and even
countries. The early 19th century saw the introduction of the law of error by
Gauss and Laplace and an awareness that variability was an important consideration
in the assessment of data. Populations are now defined by two parameters,
the mean and the variance. One of the earliest attempts to put a measure of
precision onto a sampling exercise was that of Laplace in 1802. The sample he
used was not random, although he appears to have assumed that it was.
Communes distributed across France were chosen to balance out climatic
differences. Those having mayors known for their "zeal and intelligence" were
also selected so that the data would be the most precise. Laplace also assumed
that birth rate was homogeneous across the French population, exactly the sort
of unwarranted assumption that was made by the early political arithmeticians.
Nevertheless, Laplace estimated the population total from his figures and,
appealing to the central limit theorem (which he had discussed in 1783),
approximated the distribution of estimation errors to the normal curve.
Survey sampling for all kinds of social investigations owes a great deal to
Sir Arthur Lyon Bowley (1869-1957). In his time he was recognized as a
pioneer in the definition of sampling techniques, and his methods and assumptions
were the subject of much debate. In the event, some of his approaches
were found to be defective, but his work focussed attention on the problem. In
1926 he summarized the theory of sampling, and a short paper of 1936 outlines
the application of sampling to economic and social problems. He served on
numerous official committees that investigated the economic state of the nation,
social effects of unemployment and so on, and worked directly on many
surveys. Maunder (1972) has written a memoir that points up Bowley's
contributions, contributions that were somewhat overshadowed by the work of
his contemporaries Pearson, Fisher, and Neyman. He was a calm and courteous
man, enormously concerned with social issues, and he occupied, at the London
School of Economics, the first Chair devoted to statistics in the social sciences.
Bowley (1936) defines the sampling problem simply:
We are here concerned . . . with the investigation of the numerical structure of an
actual and limited universe, or "population" which is the better word for our purpose.
Our problems are quite definitely to infer the population from the sample.
The problem is strictly analogous to that of estimating the proportion of the various
colours of balls in a limited urn on the basis of one or more trial draws. (pp. 474-475)
In the early years of the 20th century, Bowley began to examine both the practice
and theory of survey sampling. His work helped to highlight the utility of
probability sampling of one form or another. Systematic selection was adopted
and advocated by A. N. Kiaer (1838-1919), Director of the Norwegian Bureau
of Statistics, in his examinations of census data, but the majority of influential
statisticians, represented by the International Statistical Institute, rejected sampling,
pressing for complete enumeration. It took almost 30 years for the utility
and benefits of the methods to be appreciated. Seng (1951) and Kruskal and
Mosteller (1979) give accounts of this most interesting period in statistical
history. The latter authors give a translation and paraphrase of the remarks of
Georg von Mayer, Professor at the University of Munich, on Kiaer's work on
the representative method, which was presented at a meeting of the Institute in
Berne in 1895:
I regard as most dangerous the point of view found in his work. I understand that
representative samples can have some value, but it is a value restricted to terrain
already illuminated by full coverage. One cannot replace by calculation the real
observation of facts. A sample provides statistics for the units actually observed, but
not true statistics for the entire terrain.
It is especially dangerous to propose representative sampling in the midst of an
assembly of statisticians. Perhaps for legislative or administrative goals sampling
may have uses - but one must never forget that it cannot replace a complete survey.
It is necessary to add that there is among us these days a current in the minds of
mathematicians that would, in many ways, have us calculate rather than observe.
We must remain firm and say: no calculations when observations can be made. (von
Mayer, quoted by Kruskal & Mosteller, 1979, pp. 174-175)
Oddly enough, Kiaer's work is not mathematical in the sense that modern
methods of parameter estimation are mathematical. At the time those methods
were not fully delineated nor understood. Kiaer aimed, by a variety of
techniques, to produce a miniature of the population, although he noted as early as
1899 the necessity for the investigation of both the practical and theoretical
aspects of his methods. At a meeting in 1901 (a report of which was published
in 1903), Kiaer returned to the theme and it was in a discussion of his
contribution that L. von Bortkiewicz suggested that the "calculus of
probabilities" could be used to test the efficacy of sampling. By establishing how
much of a difference between sample and population could be obtained
accidentally and checking whether or not an observed difference lay outside those
limits, the representativeness of the sample could be decided on. Bortkiewicz
did not, apparently, formulate all the necessary tests, and others had employed
this method, but he seems to have been the first to draw the attention of
practicing statisticians to the possibilities.
In 1903, Kiaer must have thought that the sampling argument was won, for
a subcommittee of the International Statistical Institute proposed a resolution at
the Berlin meeting:
The Committee, considering that the correct application of the representative
method, in a certain number of cases, can furnish exact and detailed observations
from which the results can be generalized, within certain limits, recommends its use,
provided that in the publication of the results the conditions under which the selection
of the observation units is made are completely specified. The question will be kept
on the agenda, so that a report may be presented in the next session on the application
of the method in practice and on the value of the results arrived at. (quoted by Seng,
1951, p. 230)
What is more, a discussant at the meeting, the French statistician Lucien
March, returned to the ideas that had been put forward by Bortkiewicz and
outlined some of the basics of probability sampling (see Kruskal & Mosteller,
1979, for a short summary of this presentation). The way ahead seemed clear.
In fact the question was, for all intents and purposes, shelved for more than
20 years, and it was not until 1925, at the Rome session of the Institute, that the
advantages of the sampling method were fully recognized. This was in no small
way due to the theoretical work of Bowley. Bowley had suggested in his
Presidential address to the Economic Science and Statistical Section of the
British Association as early as 1906 that a systematic approach to the problem
of sampling would bear fruit:
In general, two lines of analysis are possible: we may find an empirical formula (with
Professor Karl Pearson) which fits this class of observations [Bowley is referring to
data that may not be normally distributed], and by evaluating the constants determine
an appropriate curve of frequency, and hence allot the chances of possible differences
between our observation and the unknown true value; or we may accept Professor
Edgeworth's analysis of the causes which would produce his generalised law of great
numbers, and determine a priori or by experiment whether this universal law may
be expected or is to be found in the case in question. (Bowley, 1906, pp. 549-550)
Edgeworth's method is based on the Central Limit Theorem, and Bowley
explains its utility clearly and simply:
If quantities are distributed according to almost any curve of frequency, . . . the
average of successive groups of . . . these conform to a normal curve (the more and
more closely as n is increased) whose standard deviation diminishes in inverse ratio
to the number in each sample . . . If we can apply this method . . . , we are able to give
not only a numerical average, but a reasoned estimate for the real physical quantity
of which the average is a local or temporary instance. (Bowley, 1906, p. 550)
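The behaviour Bowley describes can be illustrated with a small simulation. The sketch below is not drawn from Bowley; it assumes an exponential parent distribution (chosen only because it is markedly skewed), and the function name and parameter values are illustrative. In the modern statement of the theorem the standard deviation of the sample mean falls as the square root of the number in each sample, and that is the quantity the sketch compares.

```python
import math
import random
import statistics


def sd_of_sample_means(n, draws=4000, seed=1):
    """Empirical standard deviation of the mean of n skewed observations."""
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        # Exponential(1) has mean 1 and standard deviation 1 and is far from normal.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pstdev(means)


if __name__ == "__main__":
    for n in (4, 16, 64):
        observed = sd_of_sample_means(n)
        predicted = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1
        print(f"n={n:3d}  observed SD={observed:.3f}  predicted SD={predicted:.3f}")
```

Quadrupling the sample size should roughly halve the spread of the means, whatever the shape of the parent distribution, provided its variance is finite.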
The procedure demands random sampling:
The chances are the same for all the items of the groups to be sampled, and the way
they are taken is absolutely independent of their magnitude.
It is frequently impossible to cover a whole area as the census does, . . . but it is not
necessary. We can obtain as good results as we please by sampling, and very often
quite small samples are enough; the only difficulty is to ensure that every person or
thing has the same chance of inclusion in the investigation. (Bowley, 1906, pp.
551-553)
THE THEORY OF ESTIMATION
Over the next 20 years, Bowley and his associates completed a number of
surveys, and his theoretical researches produced The Measurement of the
Precision Attained in Sampling in 1926. This paper formed part of the report of
the International Statistical Institute, which recommended and drew attention
to the methods of random selection and purposive selection: "A number of
groups of units are selected which together yield nearly the same characteristics
as the totality" (p. 2). The report does not directly address the method of
stratified sampling, even though the technique had been in general use. This
procedure received close attention from Neyman in his paper of 1934. Bowley
had attempted to present a theory of purposive sampling in his 1926 report.
A distinctive feature of this method, according to Bowley, was that it was a case of
cluster sampling. It was assumed that the quantity under investigation was correlated
with a number of characters, called controls, and that the regression of the cluster
means of the quantity on those of each control was linear. Clusters were to be selected
in such a way that average of each control computed from the chosen clusters should
(approximately) equal its population mean. It was hoped that, due to the assumed
correlations between controls and the quantity under investigation, the above method
of selection would result in a representative sample with respect to the quantity under
investigation. (Chang, 1976, pp. 305-306)
Unfortunately, a practical test of the method (Gini, 1928) proved unsatisfactory
and Neyman's analysis concluded that it was not a consistent nor an efficient
procedure.
As Neyman pointed out, the problem of sampling is the problem of
estimation. The first forays into the establishment of a theory had been made by Fisher
(1921a, 1922b, 1925b), but the manner of sampling had received little attention
from him. The method of maximum likelihood, which rested entirely on the
properties of the distribution of observations, gave the most efficient estimate.
Any appeal to the properties of the a priori distribution - the Bayesian approach
- was rejected by Fisher. Neyman attempted to clarify the situation:
We are interested in characteristics of a certain population, say, $\pi$, . . . it has been
usually assumed that the accurate solution of such a problem requires the knowledge
of probabilities a priori attached to different admissible hypotheses concerning the
values of the collective characters [the parameters] of the population $\pi$. (Neyman,
1934, p. 561)
He then turns to Bowley's work, noting that when the population $\pi$ is known,
then questions about the sort of samples that it could produce can be answered
from "the safe ground of classical theory of probability" (p. 561). The second
question involves the determination, when we know the sample, of the
probabilities a posteriori to be ascribed to hypotheses concerning the populations.
Bowley's conclusions are based:
on some quite arbitrary hypotheses concerning the probabilities a priori, and
Professor Bowley accompanies his results with the following remark: "It is to be
emphasized that the inference thus formulated is based on assumptions that are
difficult to verify and which are not applicable in all cases." (Neyman, 1934, p. 562)
Neyman then suggests that Fisher's approach (that involving the notion of
fiducial probability, although Neyman does not use the term) "removes the
difficulties involved in the lack of knowledge of the a priori probability law"
(p. 562). He further suggests that these approaches have been misunderstood,
due, he thinks, to Fisher's condensed form of explanation and difficult method
of attacking the problem:
The form of the solution consists in determining certain intervals which I propose
to call confidence intervals . . . , in which we may assume are contained the values
of the estimated characters of the population, the probability of the error in a
statement of this sort being equal to or less than $1 - \varepsilon$, where $\varepsilon$ is any number
$0 < \varepsilon < 1$, chosen in advance. The number $\varepsilon$ I call the confidence coefficient.
(Neyman, 1934, p. 562)
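Neyman's definition is a statement about long-run error rates, and that can be checked by simulation. The following is a minimal sketch under assumptions that are not Neyman's: a normal population with a known standard deviation, an interval for the mean of the form "sample mean plus or minus a multiple of the standard error," and illustrative values for the population mean, standard deviation, and sample size.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(confidence=0.95, n=25, trials=5000, mu=100.0, sigma=15.0, seed=2):
    """Proportion of intervals, over repeated samples, that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided normal multiplier, e.g. about 1.96 when confidence = 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = fmean(sample)
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for level in (0.80, 0.95, 0.99):
        print(f"nominal {level:.2f}  observed coverage {coverage(level):.3f}")
```

The observed proportion should sit close to the coefficient chosen in advance; the proportion of intervals that miss the true value is the error probability to which the quotation refers.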
Neyman's comments on Fisher's ability to explain his view produced, in
the discussion of his paper, the first (mild) reaction from Fisher. Subsequent
reactions to Neyman's work and that of his collaborator, Egon Pearson, became
increasingly vitriolic. The report reads:
Dr Fisher thought Dr Neyman must be mistaken in thinking the term fiducial
probability [Neyman had used the term "confidence coefficient"] had led to any
misunderstanding; he had not come upon any signs of it in the literature. When Dr
Neyman said "it really cannot be distinguished from the ordinary concept of
probability," Dr Fisher agreed with him . . . He qualified it from the first with the
word fiducial . . . Dr Neyman qualified it with the word confidence. The meaning
was evidently the same, and he did not wish to deny that confidence could be used
adjectivally. They were all too familiar with it, as Professor Bowley had reminded
them, in the phrase "confidence trick." (discussion on Dr Neyman's paper, 1934,
p. 617)
From the standpoint of the familiar statistical procedures found in our texts,
this paper is important for its treatment of confidence intervals and its emphasis
of the importance of random sampling. It extended estimation from so-called
point estimation, the use of a sample value to infer a population value, to interval
estimation, which assesses the probability of a range of values. Neyman
demonstrates the use of the Markov² method for deriving the best linear
unbiased estimators. It also contains other important ideas, in particular, a
discussion of the methods of stratified sampling and appropriate statistical
models for it. Neyman's paper marks a new era in both the method and theory
of sampling, although, at the time, it was its treatment of the problem of
estimation that received the most attention. In a sense it complemented and
supplemented the work of Ronald Fisher that was going on at Rothamsted, but
it became evident that Fisher did not quite see it that way.
² Andrei Andreyevich Markov (or Markoff, 1856-1922) is best known for his studies of the
probabilities of linked chains of events. Markov chains have been used in a variety of social and
biological studies in the last 30 or 40 years. But Markov made many contributions to probability theory.
If we have a nonnegative random variable $X$, then regardless of its distribution, for any positive
number $c$ (i.e., $c > 0$), the probability, $P(X > c\mu)$, that the random variable $X$ is greater than $c$
times its expected value $\mu_X = \mu$ does not exceed $1/c$. That is, $P(X > c\mu) \le 1/c$. This is known as the
Markov Inequality. Markov was a student of Pafnuti L. Tchebycheff (sometimes spelled Chebychev or
Chebichev, 1821-1894), who formulated the Tchebycheff Inequality, which states that
$P(X < \mu - d\sigma \text{ or } X > \mu + d\sigma) \le 1/d^2$, where $X$ is a random variable with expected value $\mu$ and
variance $\sigma^2$, and $d > 0$. This result was independently arrived at by the French mathematician
I. J. Bienaymé (1796-1876). These inequalities are important in the development of the descriptions of
the properties of probability distributions.
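The two inequalities given in the footnote on Markov can be checked numerically. The sketch below is an illustration only: it assumes exponentially distributed data (nonnegative, as the Markov bound requires), and the function name and the multipliers 2, 3, and 5 are arbitrary choices. Because the mean and standard deviation are taken from the simulated values themselves, both bounds hold exactly for the empirical distribution.

```python
import random
import statistics


def tail_checks(draws=20000, seed=3):
    """Compare empirical tail proportions with the Markov and Tchebycheff bounds."""
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(draws)]  # nonnegative values
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    rows = []
    for k in (2, 3, 5):
        # Markov: P(X > k * mu) cannot exceed 1 / k.
        markov_tail = sum(x > k * mu for x in xs) / draws
        # Tchebycheff: P(|X - mu| > k * sigma) cannot exceed 1 / k**2.
        tcheb_tail = sum(abs(x - mu) > k * sigma for x in xs) / draws
        rows.append((k, markov_tail, 1 / k, tcheb_tail, 1 / k**2))
    return rows


if __name__ == "__main__":
    for k, m_tail, m_bound, t_tail, t_bound in tail_checks():
        print(f"k={k}  Markov {m_tail:.4f} <= {m_bound:.4f}   "
              f"Tchebycheff {t_tail:.4f} <= {t_bound:.4f}")
```

Both bounds are usually far from tight, which is the price of their holding for every distribution that meets their conditions.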
THE BATTLE FOR RANDOMIZATION
There is no doubt that the requirement that samples should be randomly drawn
was thought of by the survey-makers as a protection against selection bias. And
there is also no doubt that when sample size is large it affords such protection,
but not, it must be stressed, a guarantee.
In agricultural research, it had long been recognized that reduction of
experimental error was of critical importance. At Rothamsted two methods were
available: repeating the experiments over many years, and multiplying the
number of plots on a field. Mercer and Hall (1911) discuss the problem in
considerable detail and give suggestions for arranging the plots so that they may
be "scattered." This was the approach that was abandoned, although not
immediately, when Fisher started his important work at the Station. Eventually, for
Fisher and his coworkers the argument for randomization had a quite different
motive from that of trying to obtain a representative sample, one that is crucial
for an appreciation of the use of statistics in psychology. Fisher, although a
brilliant mathematician, was a practical statistician, and his approach to
statistics can only be understood through his work on the design of experiments and
the analysis of the resultant data. The core of Fisher's argument rests on the
contention that the value of an experiment depends on the valid estimation of
error, an argument that everyone would agree with. But how was the estimate
to be made?
In nearly all systematic arrangements of replicated plots care is taken to put the unlike
plots as close together as possible, and the like plots consequently as far apart as
possible, thus introducing a flagrant violation of the conditions upon which a valid
estimate is possible.
One way of making sure that a valid estimate of error will be obtained is to arrange
the plots deliberately at random.
The estimate of error is valid, because, if we imagine a large number of different
results obtained by different random arrangements, the ratio of the real to the
estimated error, calculated afresh for each of these arrangements, will be actually
distributed in the theoretical distribution by which the significance of the result is
tested. Whereas if a group of arrangements is chosen such that the real errors in this
group are on the whole less than those appropriate to random arrangements, it has
now been demonstrated that the errors, as estimated, will, in such a group, be higher
than is usual in random arrangements, and that, in consequence, within such a group,
the test of significance is vitiated. (Fisher, 1926b, pp. 506-507)
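Fisher's prescription, "to arrange the plots deliberately at random," is easily sketched in code. What follows assumes a randomized block layout in which every treatment appears once in every block and its position within the block is left to chance; the treatment labels, the number of blocks, and the function name are hypothetical and are not taken from Fisher's paper.

```python
import random


def randomized_blocks(treatments, n_blocks, seed=None):
    """Return one random ordering of the treatments for each block of plots."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # chance alone decides which plot gets which treatment
        layout.append(block)
    return layout


if __name__ == "__main__":
    for number, block in enumerate(randomized_blocks("ABCDE", 4, seed=5), start=1):
        print(f"block {number}: {' '.join(block)}")
```

A systematic design would replace the shuffle with a fixed rule, which is exactly the step that, on Fisher's argument, invalidates the estimate of error.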
The contribution of Fisher that is overwhelmingly important is the
development of the t and z tests and the general technique of analysis of variance. The
essence of these procedures is that they provide estimates of error in the
observations and the application of tests of statistical significance. These
methods were not immediately recognized as being useful for larger-scale sample
surveys, and it was partly the work of Neyman (mentioned earlier) and others
in the mid-1930s that, ironically, introduced them to this area.
Arguments about randomized versus systematic designs began in the
mid-1920s. Mostly they revolved around the issue of what to do when there was an
unwanted assignment of treatments to experimental units, that is, when the
assignment had a pattern that the researcher either knew or suspected might
confound the treatments. Fisher argued very strongly against the use of
systematic designs, on the basis of theory, but his argument was not wholly consistent.
Some had suggested that if a random design produced a pattern then it should
be discarded and another random assignment drawn. Of course, the subjectivity
introduced by this sort of procedure is precisely that of the deliberate balancing
of the design. And how many draws might one be allowed?
Most experimenters on carrying out a random assignment of plots will be shocked
to find out how far from equally the plots distribute themselves . . . if the experimenter
rejects the arrangement arrived at by chance as altogether "too bad," or in other ways
"cooks" the arrangement to suit his preconceived ideas, he will either (and most
probably) increase the standard error as estimated from the yields; or, if his luck or
his judgment is bad, he will increase the real errors while diminishing his estimate
of them. (Fisher, 1926b, pp. 509-510)
But even Fisher never quite escaped the difficulty. Savage (1962) talked
with him:
"What would you do," I had asked, "if, drawing a Latin Square at random for an
experiment, you happened to draw a Knut Vik square?" Sir Ronald said he thought
he would draw again and that, ideally, a theory explicitly excluding regular squares
should be developed. (Savage, 1962, p. 88)
Students and teachers cursing their way through statistical theory and practice
should take some comfort from the inconsistencies expressed by the master.
The debate reached its height in argument between "Student" and Fisher.
"Student" consistently advocated systematic arrangements. In a letter to Egon
Pearson (Pearson, 1939a) written shortly before his death in 1937, "Student"
comments on work by Tedin (1931) that had examined the outcomes when
systematic, as opposed to random, Latin squares were used in experimental
designs. The "Knight's move" Latin square he prefers above all others: "It is
interesting as an illustration of what actually happens when we depart from
artificial randomisation: I would Knight's move every time!" (quoted by E. S.
Pearson, 1939a, p. 248).³
Over the previous year a series of papers, letters, and letters of rebuttal had
come forth from "Student" and Fisher. "Student" was adamant to the end, and
Fisher reiterated his claim that valid error estimates cannot be computed in
arranged designs and that in such cases the test of significance is made
ineffective. Picard (1980) describes and discusses the argument and also examines the
contributions of Pearson and Yates, who had succeeded Fisher at Rothamsted.
Others entered the debate. Jeffreys (1939) is puzzled:
Reading "Student's" paper [of 1937] and Fisher's Design of Experiments I find
myself in almost complete agreement with both; and I should therefore have expected
them to agree with each other.
But it seems to me that "Student" is wrong in regarding Fisher as an advocate of
extreme randomness, and possibly Fisher has not sufficiently emphasized the amount
of system in his methods. (Jeffreys, 1939, p. 5)
Jeffreys makes the point that omitting to take account of relevant information
makes avoidable errors:
The best procedure is to design the work so as to determine it [the error] as accurately
as possible and not to leave it to chance whether it can be determined at all. . . . The
hypothesis is a considered proposition . . . The argument is inductive and not
deductive; it is not dealt with by considering an estimable error that has nothing to
do with it. (Jeffreys, 1939, p. 5)
As ever, Jeffreys' argument is a paragon of logic, and it notes that Fisher's
advice to balance or eliminate the larger systematic effects as accurately as
possible and randomize the rest "sums up the situation very well" (p. 7). This
is the prescription that the design of experiments follows today.
E. S. Pearson (1938b) attempted to expand on and to clarify "Student's"
stand, but he clearly understood the view of the opposition. Nevertheless, he
concluded that balanced layouts could give some slight advantage.
³ An illustration of "Two Knight's moves" would be
D E A B C
B C D E A
E A B C D
C D E A B
A B C D E
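The structure of the "Knight's move" square can be verified mechanically. The sketch below reads the square from the illustration as five strings, with the letter C restored in the last row, and confirms two things: every letter occurs once in each row and column, and each letter moves a constant two columns to the right, wrapping around, from one row to the next. The function names are illustrative.

```python
def is_latin_square(rows):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(rows)
    symbols = set(rows[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in rows)
    cols_ok = all(set(col) == symbols for col in zip(*rows))
    return rows_ok and cols_ok


def row_to_row_shifts(rows, symbol):
    """Set of column shifts (mod n) made by a symbol between successive rows."""
    n = len(rows)
    cols = [row.index(symbol) for row in rows]
    return {(later - earlier) % n for earlier, later in zip(cols, cols[1:])}


KNIGHTS_MOVE = ["DEABC", "BCDEA", "EABCD", "CDEAB", "ABCDE"]

if __name__ == "__main__":
    print(is_latin_square(KNIGHTS_MOVE))
    for letter in "ABCDE":
        print(letter, row_to_row_shifts(KNIGHTS_MOVE, letter))
```

The constant shift of two columns per row is the regularity that a randomly drawn Latin square would almost never show, and it is what the debate between "Student" and Fisher turned on.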
Yates (1939), in a lengthy paper, also goes over the whole of "Student's" views
on the matter, but his conclusion supports the essence of Fisher's views:
The conclusion is reached that in cases where Latin square designs can be used, and
in many cases where randomized blocks have to be employed, the gain in accuracy
with systematic arrangements is not likely to be sufficiently great to outweigh the
disadvantages to which systematic designs are subject.
On the other hand, systematic arrangements may in certain cases give decidedly
greater accuracy than randomized blocks, but it appears that in such cases the use of
the modern devices of confounding, quasi-factorial designs, or split-plot Latin
squares which are much more satisfactory statistically, are likely to give a similar
gain in accuracy. (Yates, 1939, p. 464)
This brings us to the approaches of the present day.
The realization that sampling was important in psychological research, and
that its techniques had been much neglected, was presented to the discipline by
McNemar (1940a). In an extensive discussion, he points out situations that are
still with us:
One wonders, for instance, how many psychometric test scores for policeman,
firemen, truck drivers et al. have been interpreted by the clinician in terms of college
sophomore norms.
In psychological research we are more frequently interested in making an inference
regarding the likeness or difference of two differentially defined universes, such as
two racial groups, or an experimental vs. a control group. The writer ventures the
guess that at least 90% of the research in psychology involves such comparisons. It
is not only necessary to consider the problem of sampling in the case of experimental
and control groups, but also convenient from the viewpoint of both good
experimentation and sound statistics to do so. (McNemar, 1940a, p. 335)
This paper, which, regrettably, is not among those most widely cited in the
psychological literature, should be required reading for all those embarking on
research in any aspect of the social sciences. Its closing remarks contain a
prediction that has been most certainly fulfilled and whose content is dealt with
shortly:
The applicability in psychology of certain of Professor R. A. Fisher's designs should
be examined. Eventually, the analysis of variance will come into use in psychological
research. (McNemar, 1940a, p. 363)
For the moment, no more needs to be said.
9
Sampling Distributions
Large sets of elementary events are commonly called populations or universes
in statistics, but the set theory term sample space is perhaps more descriptive.
The term population distribution refers to the distribution of the values of the
possible observations in the sample space. Although the characteristics or
parameters of the population (e.g., the mean, $\mu$, or the standard deviation, $\sigma$)
are of both practical and theoretical interest, these values are rarely, if ever,
known precisely. Estimates of the values are obtained from corresponding
sample values, the statistics. Clearly, for a sample of a given size drawn
randomly from a sample space, a distribution of values of a particular summary
statistic exists. This simple statement defines a sampling distribution. In
statistical practice it is the properties of these distributions that guide our inferences
about properties of populations of actual or potential observations. In chapter
6 the binomial, the Poisson, and the normal distributions were discussed. Now
that sampling has been examined in some detail, three other distributions and
the statistical tests associated with them are reviewed.
THE CHI SQUARE DISTRIBUTION
The development of the $\chi^2$ (chi-square) test of "goodness-of-fit" represents one
of the most important breakthroughs in the history of statistics, certainly as
important as the development of the mathematical foundations of regression.
The fact that both creations are attributable to the work of one man,¹ Karl
Pearson, is impressive attestation to his role in the discipline. There are a number
of routes by which the test can be approached, but the path that has been followed
thus far is continued here. This path leads directly to the work of Pearson and
Fisher, who did not make use, and, it seems, were in general unaware, of the
earlier work on goodness-of-fit by mathematicians in Europe. Before looking
at the development of the test of goodness-of-fit the structure of the chi-square
distribution itself is worth examining. Figure 9.1 shows two chi-square
distributions.
¹ MacKenzie (1981) gives a brief account of Arthur Black (1851-1893), a tragic figure who, on his
death, left a lengthy, and now lost, manuscript, Algebra of Animal Evolution, which was sent to Karl
Pearson. "Pearson started to read it, but realized immediately that it discussed topics very similar to
those he was working on, and decided not to read it himself but to send it to Francis Galton for his advice"
(p. 99). Of great interest is that buried among Black's notebooks, which have survived, is a derivation
of the chi-square approximation to the multinomial distribution.
Fig. 9.1 Chi-Square for 4 and 10 Degrees of Freedom
Given a normally distributed population of scores $Y$ with a mean $\mu$ and
a variance $\sigma^2$, suppose that samples of size $n = 1$ are drawn from the distribution
and each score is converted to its corresponding standard score $z$. Then
$z = (Y - \mu)/\sigma$, and $\chi^2_1 = z^2 = (Y - \mu)^2/\sigma^2$ defines the chi-square distribution with
one degree of freedom. If samples of $n = 2$ are drawn, then $\chi^2_2$ is given by
$(Y_1 - \mu)^2/\sigma^2 + (Y_2 - \mu)^2/\sigma^2$. In fact if $n$ independent measures are taken
randomly from a distribution with mean $\mu$ and variance $\sigma^2$, $\chi^2_n$ is defined as the
sum of the squared standardized scores:
$$\chi^2_n = \sum_{i=1}^{n} \frac{(Y_i - \mu)^2}{\sigma^2}$$
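The definition of chi-square as a sum of squared standardized scores lends itself to a direct simulation. The sketch below assumes a normal population with an arbitrary mean and standard deviation (the values 50 and 10 are illustrative, as is the function name) and draws repeated samples for 4 and 10 degrees of freedom. A chi-square variable with n degrees of freedom has mean n and variance 2n, and the simulated values can be compared with those figures.

```python
import random
import statistics


def chi_square_draws(df, draws=20000, mu=50.0, sigma=10.0, seed=4):
    """Simulate chi-square values as sums of df squared standard scores."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        total = 0.0
        for _ in range(df):
            z = (rng.gauss(mu, sigma) - mu) / sigma  # standard score of one observation
            total += z * z
        values.append(total)
    return values


if __name__ == "__main__":
    for df in (4, 10):
        values = chi_square_draws(df)
        print(f"df={df:2d}  mean={statistics.fmean(values):.2f} (theory {df})  "
              f"variance={statistics.pvariance(values):.2f} (theory {2 * df})")
```

The mean and standard deviation of the population drop out of the result, which is why a single family of curves indexed only by degrees of freedom serves for every normal population.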
This is the essence of the distribution² that Pearson used in his formulation of
the test of goodness-of-fit. Why was such a test so necessary?
Games of chance and observational errors in astronomy and surveying were
subject to the random processes that led scientists in the 18th and early 19th
centuries to the examination of error distributions. The quest was for a sound
mathematical basis for the exercise of estimating true values. Simpson
introduced a triangular distribution and Daniel Bernoulli in 1777 suggested a
semicircular one. In the absence of empirical data, these distributions,
established on a priori grounds, were somewhat arbitrarily regarded as having no
more and no less claim to accuracy and utility. But the 19th century saw the
normal law established. It had powerful credentials because of the fame of its
two main developers, Gauss and Laplace. Starting from the assumption that the
arithmetic mean represented the true value, Gauss showed that the error
distribution was normal. Laplace, starting from the view that every individual
observation arises from a very great number of independently acting random
factors (the essence of the central limit theorem), came to the same result.
Gauss's proof of the method of least squares further established the importance
of the normal distribution, and when, in 1801, he used the initial data collected
from observations on a new planet, Ceres,³ to accurately predict where it would
be observed later in the year, these procedures, as well as Gauss's reputation,
were firmly established in astronomy.
In astronomy as well as in biology and social science, the Laplace-Gaussian
distribution was indeed law; it was indeed regarded as normal. This prescription
led to Quetelet's use of it as a "double-edged sword" (see chapter 1) and led to
many astronomers using it as a reason to reject observations that were
considered to be doubtful, for example Peirce (1852). Quetelet's procedure for
establishing the "fit" of the normal curve was the same as that of the early
astronomers. The tabulation, and later the graphing, of observed and expected
frequencies led to their being compared by nothing more than visual inspection
(see, e.g., Airy, 1861).
² A history of the mathematics of the $\chi^2$ distribution would include the development of the gamma
function by the French mathematician I. J. Bienaymé (1796-1876), who, in the 1850s, found a statistic
that is equivalent to the Pearson $\chi^2$ in the context of least squares theory. Pearson was apparently not
aware of his work; nor were F. R. Helmert and E. Abbe, who, in the 1860s and 1870s, also arrived at the
$\chi^2$ distribution for the sum of squares of independent normal variates. Long after the test had become
commonly used, von Mises (1919) linked Bienaymé's work to the Pearson $\chi^2$. Details of this aspect of
the test and distribution's history are given by Lancaster (1966), Sheynin (1966), and Chang (1973).
³ Ceres was the first discovered "planetoid" in the asteroid belt. Gauss also determined the
orbit of Pallas, another planetoid.
There was some dissent. Egon Pearson (1965) notes:
As a reaction to this view among astronomers I remember how Sir Arthur Eddington
in his Cambridge lectures about 1920 on the Combination of Observations used to
quote the remark that "to say that errors must obey the normal law means taking
away the right of the free-born Englishman to make any error he darn well pleases!"
(p. 6)
Karl Pearson's first statistical paper (1894) was on the problem of
interpreting a bimodal distribution as two normal distributions, the problem that had
arisen as a result of Weldon's discovery that the distribution of the relative
frontal breadths of his sample of Naples crabs was a double-humped curve. This
paper introduced the method of moments as a means of fitting a theoretical curve
to a set of observations. As Egon Pearson (1965) states it:
The question "does a Normal curve fit this distribution and what does this mean if it
does not?" was clearly prominent in their discussions. There were three obvious
alternatives:
(a) The discrepancy between theory and observation is no more than might be
expected to arise in random sampling.
(b) The data are heterogeneous, composed of two or more Normal distributions.
(c) The data are homogeneous, but there is real asymmetry in the distribution of
the variable measured.
The conclusion (c) may have been hard to accept, such was the prestige surrounding
the Normal law. (p. 9)
It appears from Yule's lecture notes⁴ that Karl Pearson probably was
employing a procedure that used the ratio of an absolute deviation from expectation
to its standard error to examine the record of 7,000 (actually 7,006) tosses of 12
dice made by Mr Hull, a clerk at University College. This was the record (see
chapter 1) that Weldon said Karl Pearson had rejected as "intrinsically
incredible." Yule's notes also contain an empirical measure of goodness of fit that
Egon Pearson says may be set down roughly as $R = \sum |O - T| / \sum T$, where
$|O - T|$ is the absolute value of the difference between the observed and
theoretical frequency and $\sum T$ the total theoretical frequency, though it should be
mentioned that the actual notes do not contain the formula in this form. This
expression, mean absolute error, was in use during the latter years of the 19th
century, and Bowley used it in the first edition of his textbook in 1902.
Karl Pearson's second statistical paper (1895) on asymmetrical frequency
curves occupied the attention of the biologists, but the question of the biological
meaning of skewed distributions was not one that, at the time, was in the
forefront of Pearson's thoughts. Interestingly enough, Pearson did not use the
mean absolute error as a test of fit in any of his work. His preoccupation was
with the development of a theory of correlation, and it was in this context that
he solved the goodness-of-fit problem. The 1895 paper and two supplements
that followed in 1901 and 1916b introduced a comprehensive system of
frequency curves that pointed a way to sampling distributions that are central to
the use of statistical tests, but it was a way that Pearson himself did not fully
develop.
⁴ These notes are reproduced in Biometrika's Miscellanea (Yule, 1938a).
Pearson's (1900a) seminal paper begins, "The object of this paper is to
investigate a criterion of the probability on any theory of an observed system of
errors, and to apply it to the determination of goodness of fit in the case of
frequency curves" (p. 157).
Pearson takes a system of deviations $x_1, x_2, \ldots, x_n$ from the means of $n$
variables with standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$ and with correlations
$r_{12}, r_{13}, r_{23}, \ldots, r_{n-1,n}$ and derives $\chi^2$ as "the equation to a generalized 'ellipsoid,' all
over the surface of which the frequency of the system of errors or deviations
$x_1, x_2, \ldots, x_n$ is constant" (p. 157).
It was Pearson's derivation of the multivariate normal distribution that
formed the basis of the $\chi^2$ test. Large $x$'s represent large discrepancies between
theory and observation and in turn would give large values of $\chi^2$. But $\chi^2$ can
be made to become a test statistic by examining the probability of a system of
errors occurring with a frequency as great or greater than the observed system.
Pearson had already obtained an expression for the multivariate normal surface,
and he here gives the probability of $n$ errors when the ellipsoid, mentioned
earlier, is squeezed to become a sphere, which is the geometric representation
of zero correlation in the multivariate space, and shows how one arrives at the
probability for a given value of $\chi^2$. When we compare Pearson's mathematics
with Hamilton Dickson's examination of Galton's elliptic contours, seen in the
scatterplot of two correlated variables while Galton was waiting for a train, we
see how far mathematical statistics had advanced in just 15 years.
Pearson considers an (n + 1)-fold grouping with observed frequencies,
$m'_1, m'_2, m'_3, \ldots, m'_n, m'_{n+1}$, and theoretical frequencies known a
priori, $m_1, m_2, m_3, \ldots, m_n, m_{n+1}$. $\sum m = \sum m' = N$ is the total
frequency, and $e = m' - m$ is the error. The total error $\sum e$ (i.e.,
$e_1 + e_2 + e_3 + \cdots + e_{n+1}$) is zero. The degrees of freedom, as they are
now known, follow: "Hence only n of the n + 1 errors are variables; the n + 1th is
determined when the first n are known, and in using formula (ii) [Pearson's basic
$\chi^2$ formula] we treat only of n variables" (Pearson, 1900a, pp. 160-161).
Starting with the standard deviation for the random variation of error and
the correlation between random errors, Pearson uses a complex trigonometric
transformation to arrive at a result "of very great simplicity":
$$\chi^2 = \sum \frac{e^2}{m}$$
Chi-square (the statistic) is the weighted sum of squared deviations.
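As a minimal sketch of that weighted sum, the statistic can be computed directly from observed and theoretical frequencies. The function name `chi_square` and the die-rolling frequencies are illustrative assumptions, not data from Pearson's paper.

```python
# Hypothetical illustration: chi-square as a weighted sum of squared
# deviations, sum(e**2 / m) with e = m' - m (observed minus theoretical).

def chi_square(observed, expected):
    """Return sum((m' - m)**2 / m) over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up data: 60 rolls of a die, theoretical frequency 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

# The individual errors sum to zero because both sets total N = 60.
errors = [o - e for o, e in zip(observed, expected)]
chi2 = chi_square(observed, expected)
print(sum(errors), chi2)
```

Because the errors must sum to zero, only five of the six are free to vary, which is the constraint Pearson describes in the passage quoted above.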
Pearson (1904b, 1911) extended the use of the test to contingency tables and
to the two sample case and in 1916(a) presented a somewhat simpler derivation
than the one given in 1900a, a derivation that acknowledges the work of Soper
(1913). The first 20 years of the 20th century brought increasing recognition that
the test was of the greatest importance, starting with Edgeworth, who wrote to
Pearson, "I have to thank you for your splendid method of testing my
mathematical curves of frequency. That $\chi^2$ of yours is one of the most
beautiful of the instruments that you have added to the Calculus" (quoted by
Kendall, 1968, p. 262).
And even Fisher, who by that time had become a fierce critic: "The test of
goodness of fit was devised by Pearson, to whose labours principally we now
owe it, that the test may readily be applied to a great variety of questions of
frequency distribution" (Fisher, 1922b, p. 597).
In the next chapter the argument that arose between Pearson and Yule on the
assumption of continuous variation underlying the categories in contingency
tables is discussed. When Fisher's modifications and corrections to Pearson's
theory were accepted, it was Yule who helped to spread the word on the
interpretation of the new idea of degrees of freedom.
The goodness-of-fit test is readily applied when the expected frequencies
based on some hypothesis are known. For example, hypothesizing that the
expected distribution is normal with a particular mean and standard deviation
enables the expected frequency of any value to be quickly calculated. Today
$\chi^2$ is perhaps more often applied to contingency tables, where the expected
values are computed from the observed frequencies.
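The contingency-table case can be sketched as follows, assuming the usual rule that each expected cell frequency is (row total x column total) / N. The function names and the 2 x 2 table are hypothetical.

```python
# Sketch: expected cell frequencies for a contingency table, estimated from
# the observed margins. The data below are made up for illustration.

def expected_from_margins(table):
    """Return the table of expected counts implied by the observed margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_table(table):
    """Sum (observed - expected)**2 / expected over every cell."""
    expected = expected_from_margins(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[30, 20],
            [20, 30]]
expected = expected_from_margins(observed)
chi2 = chi_square_table(observed)
print(expected, chi2)
```

Here the expected values are not given a priori but are reconstructed from the sample itself, which is exactly the situation that gave rise to the dispute described next.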
This now routine procedure forms the basis of one of the most bitter disputes
in statistics in the 1920s. In 1916 Pearson examined the question of partial
contingency. The fixed n in a goodness-of-fit test imposes a constraint on the
frequencies in the categories; only k - 1 categories are free to vary. Pearson
realized that in the case of contingency tables additional linear constraints were
placed on the group frequencies, but he argued that these constraints did not
allow for a reduction in the number of variables in the case where the theoretical
distribution was estimated from the observed frequencies. Other questions had
also been raised.
Raymond Pearl (1879-1940), an American researcher who was at the
Biometric Laboratory in the mid-1910s, pointed out some problems in the
application of $\chi^2$ in 1912, noting that some hypothetical data he presents
clearly show an excellent "fit" between observed and expected frequency but that
the value of $\chi^2$ was infinite! Pearson, of course, replied, but Pearl was
unmoved.
I have earlier pointed out other objections to the $\chi^2$ test ... I have never
thought it necessary to make any rejoinder to Pearson's characteristically bitter
reply to my criticism, nor do I yet. The $\chi^2$ test leads to this absurdity.
(Pearl, 1917, p. 145)
Pearl repeats the argument that essentially notes that in cases where there are
small expected frequencies in the cells of the table, the value of $\chi^2$ can be
grossly inflated.
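Pearl's point can be illustrated with a small sketch. The frequencies are invented for the purpose: the observed and expected counts agree closely in every class, yet one class with a very small expected frequency contributes almost all of the statistic.

```python
# Hypothetical data: a close overall "fit", but one cell has a tiny expected
# frequency, so a deviation of under two cases dominates chi-square.

def chi_square_terms(observed, expected):
    """Return each cell's contribution (o - e)**2 / e."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]

observed = [48, 50, 2]
expected = [49.9, 50.0, 0.1]

terms = chi_square_terms(observed, expected)
total = sum(terms)
share_of_small_cell = terms[2] / total
print(terms, total, share_of_small_cell)
```

As the expected frequency in such a cell shrinks toward zero, its contribution grows without bound, which is the "infinite" value Pearl complained of.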
Karl Pearson (1917), in a reproof that illustrates his disdain for those he
believed had betrayed the Biometric school, responds to Pearl: "Pearl . . .
provides a striking illustration of how the capable biologist needs a long
continued training in the logic of mathematics before he ventures into the field
of probability" (p. 429).
Pearl had in fact raised quite legitimate questions about the application of
the $\chi^2$ test, but in the context of Mendelian theory, to which Pearson was
steadfastly opposed. A close reading of Pearl's papers perhaps reveals that he
had not followed all Pearson's mathematics, but the questions he had raised were
germane. Pearson's response is the apotheosis of mathematical arrogance that,
on occasion, frightens biologists and social scientists today:
Shortly Dr Pearl's method is entirely fallacious, as any trained mathematician would
have informed Dr Pearl had he sought advice before publication. It is most
regrettable that such extensions of biometric theory should be lightly published,
without any due sense of responsibility, not solely in biological but in
psychological journals. It can only bring biometry into contempt as a science if,
professing a mathematical foundation, it yet shows in its manifestations most
inadequate mathematical reasoning. (Pearson, 1917, p. 432)
Social scientists beware!
In 1916 Ronald Fisher, then a schoolmaster, raised his standard and made
his first foray into what was to become a war. The following correspondence
is to be found in E. S. Pearson (1968):
Dear Professor Pearson,
There is an article by Miss Kirstine Smith in the current issue of Biometrika which,
I think, ought not to pass without comment. I enclose a short note upon it.
Miss Kirstine Smith proposes to use the minimum value of $\chi^2$ as a criterion to
determine the best form of the frequency curve; . . . It should be observed that
$\chi^2$ can only be determined when the material is grouped into arrays, and that
its value depends upon the manner in which it is grouped.
There is ... something exceedingly arbitrary in a criterion which depends entirely
upon the manner in which the data happens to be grouped. (pp. 454-455)
Pearson replied:
Dear Mr Fisher,
I am afraid that I don't agree with your criticism of Froken K. Smith (she is a pupil
of Thiele's and one of the most brilliant of the younger Danish statisticians). . . .
your argument that $\chi^2$ varies with the grouping is of course well known ...
What we have to determine, however, is with given grouping which method gives the
lowest $\chi^2$. (p. 455)
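The dependence on grouping that Fisher and Pearson are arguing about is easy to demonstrate. The sketch below uses invented frequencies and a hypothetical `regroup` helper that merges adjacent classes; it is not taken from Smith's or Fisher's papers.

```python
# Sketch: the same (made-up) data give different chi-square values
# depending on how the classes are grouped.

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def regroup(values, size):
    """Merge each run of `size` adjacent classes into one class."""
    return [sum(values[i:i + size]) for i in range(0, len(values), size)]

observed = [8, 12, 9, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]

fine = chi_square(observed, expected)                            # six classes
coarse = chi_square(regroup(observed, 2), regroup(expected, 2))  # three classes
print(fine, coarse)
```

With six classes the deviations are visible; merged in pairs they cancel exactly, so the coarse grouping reports a perfect fit.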
Pearson asks for a defense of Fisher's argument before considering its
publication. Fisher's expanded criticism received short shrift. After thanking
Fisher for a copy of his paper on Mendelian inheritance (Fisher, 1918), Pearson
hopes to find time for it,⁵ but points out that he is "not a believer in cumulative
Mendelian factors as being the solution of the heredity puzzle" (p. 456). He then
rejects Fisher's paper.
Also I fear that I do not agree with your criticism of Dr Kirstine Smith's paper and
under present pressure of circumstances must keep the little space I have in
Biometrika free from controversy which can only waste what power I have for
publishing original work. (p. 456)
Egon Pearson thinks that we can accept his father's reasons for these rejections;
the pressure of war work, the suspension of many of his projects, the fact that
he was over 60 years old with much work unfinished, and his memory of the
strain that had been placed on Weldon in the row with Bateson. But if he really
was trying to shun controversy by refusing to publish controversial views, then
he most certainly did not succeed. Egon Pearson's defense is entirely
understandable but far too charitable. Pearson had censored Fisher's work before
and appears to have been trying to establish his authority over the younger man
and over statistics. Even his offer to Fisher of an appointment at the Galton
Laboratory might be viewed in this light. Fisher Box (1978) notes that these
experiences influenced Fisher in his refusal of the offer, in the summer of 1919,
recognizing that, "nothing would be taught or published at the Galton laboratory
without the approval of Pearson" (p. 82).
⁵ Here Pearson is dissembling. Fisher's paper was originally submitted to the Royal
Society and, although it was not rejected outright, the Chairman of the Sectional
Committee for Zoology "had been in communication with the communicator of the
paper, who proposed to withdraw it." Karl Pearson was one of the two referees.
Norton and Pearson (1976) describe the event and publish the referees' reports.
The last straw for Fisher came in 1920 when he sent a paper on the probable
error of the correlation coefficient to Biometrika:
Dear Mr Fisher,
Only in passing through Town today did I find your communication of August 3rd.
I am very sorry for the delay in answering ...
As there has been a delay of three weeks already, and as I fear if I could give full
attention to your paper, which I cannot at the present time, I should be unlikely to
publish it in its present form ... I would prefer you published elsewhere ... I am
regretfully compelled to exclude all that I think erroneous on my own judgment,
because I cannot afford controversy. (quoted by E. S. Pearson, 1968, p. 453)
Fisher never again submitted a paper to Biometrika, and in 1922(a) tackled
the $\chi^2$ problem:
This short paper with all its juvenile inadequacies, yet did something to break the
ice. Any reader who feels exasperated by its tentative and piecemeal character should
remember that it had to find its way to publication past critics who, in the first
place, could not believe that Pearson's work stood in need of correction, and who, if
this had to be admitted, were sure that they themselves had corrected it. (Fisher's
comment in Bennett, 1971, p. 336)
Fisher notes that he is not criticizing the general adequacy of the $\chi^2$ test but
that he intends to show that:
the value of n' with which the table should be entered is not now equal to the number
of cells but to one more than the number of degrees of freedom in the distribution.
Thus for a contingency table of r rows and c columns we should take
$n' = (c - 1)(r - 1) + 1$ instead of $n' = cr$. This modification often makes a very
great difference to the probability (P) that a given value of $\chi^2$ should have
been obtained by chance. (Fisher, 1922a, p. 88)
It should be noted that Pearson entered the tables using $n' = \nu + 1$, where
n' is the number of variables (i.e., categories) and $\nu$ is what we now call
degrees of freedom. The modern tables are entered using $\nu = (c - 1)(r - 1)$.
The use of n' to denote sample size and n to denote degrees of freedom, even though
in many writings n was also used for sample size, sometimes leads to frustrating
reading in these early papers.
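To see how much difference the choice of entry makes, the sketch below computes the upper-tail probability of $\chi^2$ for a 2 x 2 table under both conventions: Fisher's $\nu = (c - 1)(r - 1) = 1$ and the value implied by entering the table with $n' = cr = 4$, that is, $\nu = 3$. The helper `chi2_sf` is a hypothetical name; it relies only on the standard library and on the standard recurrence for integer degrees of freedom. The value $\chi^2 = 4.0$ is an arbitrary illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x).

    Valid for integer df >= 1. Starts from the closed forms for df = 1
    (complementary error function) and df = 2 (exponential) and steps up
    by two degrees of freedom at a time.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

rows, cols = 2, 2
chi2 = 4.0                                   # illustrative value only

df_fisher = (rows - 1) * (cols - 1)          # 1
df_pearson = rows * cols - 1                 # n' = cr, so nu = n' - 1 = 3

p_fisher = chi2_sf(chi2, df_fisher)
p_pearson = chi2_sf(chi2, df_pearson)
print(df_fisher, p_fisher, df_pearson, p_pearson)
```

The same statistic that falls just beyond the conventional 5% point with one degree of freedom looks entirely unremarkable when the table is entered with three.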
It is clear that Pearson did not recognise that in all cases linear restrictions imposed
upon the frequencies of the sampled population, by our methods of reconstructing
that population, have exactly the same effect upon the distribution of $\chi^2$ as
have restrictions placed upon the cell contents of the sample. (Fisher, 1922a, p. 92)
Pearson was angered and contemptuously replied in the pages of Biometrika.
He reiterates the fundamentals of his 1900 paper and then says:
The process of substituting sample constants for sampled population constants does
not mean that we select out of possible samples of size n, those which have precisely
the same values of the constants as the individual sample under discussion. ... In
using the constants of the given sample to replace the constants of the sampled
population, we in no wise restrict the original hypothesis of free random samples
tied down only by their definite size. We certainly do not by using sample constants
reduce in any way the random sampling degrees of freedom.
The above re-description of what seem to me very elementary considerations
would be unnecessary had not a recent writer in the Journal of the Royal Statistical
Society appeared to have wholly ignored them ... the writer has done no service to
the science of statistics by giving it broad-cast circulation in the pages of the Journal
of the Royal Statistical Society. (K. Pearson, 1922, p. 187)
And on and on, never referring to Fisher by name but only as "my critic" or "the
writer in the Journal of the Royal Statistical Society," until the final assault when
he accuses Fisher of a disregard for the nature of the probable error:
I trust my critic will pardon me for comparing him with Don Quixote tilting at the
windmill; he must either destroy himself, or the whole theory of probable errors, for
they are invariably based on using sample values for those of the sampled population
unknown to us. For example here is an argument for Don Quixote of the simplest
nature ... (K. Pearson, 1922, p. 191)
The editors of the Journal of the Royal Statistical Society turned tail and ran,
refusing to publish Fisher's rejoinder. Fisher vigorously protested, but to no
avail, and he resigned from the Society. There were other questions about
Pearson's paper that he dealt with in Bowley's journal Economica in 1923 and
in the Journal of the Royal Statistical Society (Fisher, 1924a), but the end came
in 1926 when, using data tables that had been published in Biometrika, Fisher
calculated the actual average value of $\chi^2$ which he had proved earlier should
theoretically be unity and which Pearson still maintained should be 3. In every case
the average was close to unity, in no case near to 3. . . . There was no reply. (Fisher
Box, 1978, p. 88, commenting on Fisher, 1926a)
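Fisher's empirical check can be imitated by simulation. The sketch below is not a reconstruction of his 1926 calculation, which used published tables; it simply generates many 2 x 2 tables from two independent attributes and averages $\chi^2$, using the standard shortcut formula for a fourfold table. The sample size, number of tables, and seed are arbitrary assumptions.

```python
import random

def chi_square_2x2(a, b, c, d):
    """Chi-square for a fourfold table [[a, b], [c, d]], margins estimated."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return None                      # statistic undefined for an empty margin
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)

def mean_chi_square(n_tables=4000, n_obs=100, seed=1926):
    """Average chi-square over tables drawn with truly independent attributes."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_tables):
        counts = [0, 0, 0, 0]
        for _ in range(n_obs):
            row = rng.random() < 0.5     # first attribute
            col = rng.random() < 0.5     # second attribute, independent of the first
            counts[2 * row + col] += 1
        value = chi_square_2x2(*counts)
        if value is not None:
            values.append(value)
    return sum(values) / len(values)

average = mean_chi_square()
print(average)
```

Since the mean of a $\chi^2$ variable equals its degrees of freedom, an average near 1 rather than 3 is what one degree of freedom predicts for a fourfold table.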
THE t DISTRIBUTION
W. S. Gosset was a remarkable man, not the least because he managed to
maintain reasonably cordial relations with both Pearson and Fisher, and at the
same time. Nor did he avoid disagreeing with them on various statistical issues.
He was born in 1876 at Canterbury in England, and from 1895 to 1899 was
at New College, Oxford, where he took a degree in chemistry and mathematics.
In 1899 Gosset became an employee of Arthur Guinness, Son and Company
Ltd., the famous manufacturers of stout. He was one of the first of the scientists,
trained either at Oxford or Cambridge, that the firm had begun hiring (E. S.
Pearson, 1939a). His correspondence with Pearson, Fisher, and others shows
him to have been a witty and generous man with a tendency to downplay his
role in the development of statistics. Compared with the giants of his day he
published very little, but his contribution is of critical importance. As Fisher
puts it in his Statistical Methods for Research Workers:
The study of the exact distributions of statistics commences in 1908 with "Student's"
paper The Probable Error of the Mean. Once the true nature of the problem was
indicated, a large number of sampling problems were within reach of mathematical
solution. (Fisher, 1925/1970, p. 23)
The brewery had a policy on publishing by its employees that obliged Gosset
to publish his work under the nom de plume "Student." In essence, the problem
that "Student" tackled was the development of a statistical test that could be
applied to small samples. The nature of the process of brewing, with its
variability in temperature and ingredients, means that it is not possible to take
large samples over a long run. In a letter to Fisher in 1915, in which he thanks
Fisher for the Biometrika paper that begins the mathematical solution to the
small sample problem, "Student" says:
The agricultural (and indeed almost any) Experiments naturally required a solution
of the mean/S.D. problem, and the Experimental Brewery which concerns such
things as the connection between analysis of malt or hops, and the behaviour of the
beer, and which takes a day to each unit of the experiment, thus limiting the numbers,
demanded an answer to such questions as, "If with a small number of cases I get a
value r, what is the probability that there is really a positive correlation of greater
value than (say) .25?" (quoted by E. S. Pearson, 1968, p. 447)
Egon Pearson (1939a) notes that in his first few years at Guinness, Gosset
was making use of Airy's Theory of Errors (1861), Lupton's Notes on
Observations (1898), and Merriman's The Method of Least Squares (1884). In 1904
he presented a report to his firm that stated clearly the utility of the application
of statistics to the work of the brewery and pointed up the particular difficulties
that might be encountered. The report concludes:
We have been met with the difficulty that none of our books mentions the odds, which
are conveniently accepted as being sufficient to establish any conclusion, and it might
be of assistance to us to consult some mathematical physicist on the matter. (quoted
by Pearson, 1939a, p. 215)
A meeting was in fact arranged with Pearson, and this took place in the
summer of 1905. Not all of Gosset's problems were solved, but a supplement
to his report and a second report in late August 1905 produced many changes
in the statistics used in the brewery. The standard deviation replaced the mean
error, and Pearson's correlation coefficient became an almost routine procedure
in examining relationships among the many factors involved with brewing. But
one feature of the work concerned Gosset: "Correlation coefficients are usually
calculated from large numbers of cases, in fact I have found only one paper in
Biometrika of which the cases are as few in number as those at which I have
been working lately" (quoted by Pearson, 1939a, p. 217).
Gosset expressed doubt about the reliability of the probable error formula
for the correlation coefficient when it was applied to small samples. He went to
London in September 1906 to spend a year at the Biometric Laboratory. His
first paper, published in 1907, derives Poisson's limit of the binomial
distribution and applies it to the error in sampling when yeast cells are counted in
a haemacytometer. But his most important work during that year was the
preparation of his two papers on the probable error of the mean and of the
correlation coefficient, both of which were published in 1908.
The usual method of determining that the mean of the population lies within a given
distance of the mean of the sample, is to assume a normal distribution about the mean
of the sample with a standard deviation equal to $s/\sqrt{n}$, where s is the standard
deviation of the sample, and to use the tables of the probability integral.
But, as we decrease the value of the number of experiments, the value of the
standard deviation found from the sample of experiments becomes itself subject to
increasing error, until judgements reached in this way become altogether misleading.
("Student," 1908a, pp. 1-2)
"Student" sets out what the paper intends to do:
I. The equation is determined of the curve which represents the frequency
distribution of standard deviations of samples drawn from a normal population.
II. There is shown to be no kind of correlation between the mean and the standard
deviation of such a sample.
III. The equation is determined of the curve representing the frequency distribution
of a quantity z, which is obtained by dividing the distance between the mean of the
sample and the mean of the population by the standard deviation of the sample.
IV. The curve found in I. is discussed.
V. The curve found in III. is discussed.
VI. The two curves are compared with some actual distributions.
VII. Tables of the curves found in III. are given for samples of different size.
VIII and IX. The tables are explained and some instances are given of their use.
X. Conclusions.
("Student," 1908a, p. 2)
"Student" did not provide a proof for the distribution of z. Indeed, he first
examined this distribution, and that of s, by actually drawing samples (of size 4)
from measurements, made on 3,000 criminals, taken from data used in a paper
by Macdonell (1901). The frequency distributions for s and z were thus directly
obtained, and the mathematical work came later. There has been comment over
the years on the fact that "Student's" mathematical approach was incomplete,
but this should not detract from his achievements. Welch (1958) maintains that:
The final verdict of mathematical statisticians will, I believe, be that they have lasting
value. They have the rare quality of showing us how an exceptional man was able
to make mathematical progress without paying too much regard to the rules. He
fortified what he knew with some tentative guessing, but this was backed by
subsequent careful testing of his results. (pp. 785-786)
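"Student's" empirical approach of drawing many samples of size 4 can be imitated with a short simulation. The sketch below substitutes a simulated standard normal population for the Macdonell measurements, so it is an analogy rather than a replication. It computes z as the distance between sample and population means divided by the sample standard deviation (with divisor n, as in the 1908 paper), converts it to the modern t by multiplying by the square root of n - 1, and counts how often the normal-theory cutoff of 1.96 is exceeded. Function names, the seed, and the number of samples are assumptions.

```python
import math
import random

def student_z(sample, mu):
    """(sample mean - population mean) / s, with s computed using divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return (mean - mu) / s

def tail_fraction(n=4, n_samples=20000, cutoff=1.96, seed=1908):
    """Share of samples whose |t| exceeds the normal-theory cutoff."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t = student_z(sample, 0.0) * math.sqrt(n - 1)
        if abs(t) > cutoff:
            exceed += 1
    return exceed / n_samples

fraction = tail_fraction()
print(fraction)
```

If the normal tables were adequate for samples of 4, about 5% of samples would exceed the cutoff; the simulated share is far larger, which is the misleading judgement "Student" warned of.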
In fact "Student" had given to future generations of scientists, in particular
social and biological scientists, a new and powerful distribution. The z test,
which was to become the t test,⁶ led the way for all kinds of significance tests,
and indeed influenced Fisher as he developed that most useful of tools, the
analysis of variance. It is also the case that the 1908 paper on the probable error
of the mean ("Student," 1908a) clearly distinguished between what we now call
sample statistics and population parameters, a distinction that is absolutely
critical in modern-day statistical reasoning.
The overwhelming influence of the biometricians of Gower Street can
perhaps partly account for the fact that "Student's" work was ignored. In 1939
Fisher said that "Student's" work was received with "weighty apathy," and, as
late as 1922, Gosset, writing to Fisher, and sending a copy of his tables, said,
"You are the only man that's ever likely to use them!" (quoted by Fisher Box,
1981, p. 63). Pearson was, to say the least, suspicious of work using small
samples. It was the assumption of normality of the sampling distribution that led
to problems, but the biometricians never used small samples, and "only naughty
brewers take n so small that the difference is not of the order of the probable
error!" (Pearson writing to Gosset, September 1912, quoted by Pearson, 1939a,
p. 218).
Some of "Student's" ideas had been anticipated by Edgeworth as early as
1883 and, as Welch (1958) notes, one might speculate as to what Gosset's
reaction would have been had he been aware of this work.
⁶ Eisenhart (1970) concludes that the shift from z to t was due to Fisher and that
Gosset chose t for the new statistic. In their correspondence Gosset used t for his
own calculations and x for those of Fisher.
Gosset's paper on the probable error of the correlation coefficient ("Student,"
1908b) dealt with the distribution of the r values obtained when sampling from
a population in which the two variables were uncorrelated, that is, R = 0.⁷ This
endeavor was again based on empirical sampling distributions constructed from
the Macdonell data and a mathematical curve fitted afterwards. With
characteristic flair, he says that he attempted to fit a Pearson curve to the "no
correlation" distribution and came up with a Type II curve. "Working from
$y = y_0(1 - x^2)^0$ for samples of 4, I guessed the formula
$y = y_0(1 - x^2)^{(n-4)/2}$ and proceeded to calculate the moments" ("Student,"
1908b, p. 306).
He concludes his paper by hoping that his work "may serve as illustrations
for the successful solver of the problem" (p. 310). And indeed it did, for Fisher's
paper of 1915 showed that "Student" had guessed correctly. Fisher had been in
correspondence with Gosset in 1912, sending him a proof that appealed to
n-dimensional space of the frequency distribution of z. Gosset wanted Pearson
to publish it.
Dear Pearson,
I am enclosing a letter which gives a proof of my formulae for the frequency
distribution of z (= x/s), ... Would you mind looking at it for me; I don't feel at home
in more than three dimensions even if I could understand it otherwise.
It seemed to me that if it's all right perhaps you might like to put the proof in a note.
It's so nice and mathematical that it might appeal to some people. (quoted by E. S.
Pearson, 1968, p. 446)
In fact the proof was not published then, but again approaching the
mathematics through a geometrical representation, Fisher derived the sampling
distribution of the correlation coefficient, and this, together with the derivation of
the z distribution, was published in the 1915 paper. The sampling distribution
of r was, of course, of interest to Pearson and his colleagues; after all, r was
Pearson's statistic. Moreover, the distribution follows a remarkable system of
curves, with a variety of shapes that depart greatly from the normal, depending
on n and the value of the true unknown correlation coefficient R, or, as it is now
generally known, ρ. Pearson was anxious to translate theory into numbers, and
the computation of the distribution of r was commenced and published as the
"co-operative study" of Soper, Young, Cave, Lee, and K. Pearson in 1917.
Although Pearson had said that he would send Fisher the proofs of the paper
(the letter is quoted in E. S. Pearson, 1965), there is, apparently, no record that
he in fact received them. E. S. Pearson (1965) suggests that Fisher did not know
"until late in the day" of the criticism of his particular approach in a section of
the study, "On the Determination of the 'Most Likely' Value of the Correlation
in the Sampled Population." Egon Pearson argues that his father's criticism,
for presumably the elder Pearson had taken a major role in the writing of the
"co-operative study," was a misunderstanding based on Fisher's failure to
adequately define what he meant by "likelihood" in 1912 and the failure to make
clear that it was not based on the Bayesian principle of inverse probability.
Fisher Box (1978) says, "their criticism was as unexpected as it was unjust, and
it gave an impression of something less than scrupulous regard for a new and
therefore vulnerable reputation" (p. 79).
⁷ R was Gosset's symbol for the population correlation coefficient. ρ (the Greek
letter rho) appears to have been first used for this value by Soper (1913). The
important point is that a different symbol was used for sample statistic and
population parameter.
Fisher's response, published in Metron in 1921(a), was the paper, mentioned
earlier, that Pearson summarily rejected because he could not afford
controversy. In this paper Fisher makes use of the r = tanh z transformation, a
transformation that had been introduced, a little tentatively, in the 1915 paper.
Its immense utility in transforming the complex system of distributions of r to
a simple function of z, which is almost normally distributed, made the laborious
work of the co-operative study redundant. In a paper published in 1919, Fisher
examines the data on resemblance in twins that had been studied by Edward L.
Thorndike in 1905. This is the earliest example of the application of Fisherian
statistics to psychological data, for the traits examined were both mental and
physical. Fisher looked at the question of whether there were any differences
between the resemblances of twins in different traits and here uses the z
transformation:
When the resemblances have been expressed in terms of the new variable, a
correlation table may be constructed by picking out every pair of resemblances
between the same twins in different traits. The values are now centered
symmetrically about a mean at 1.28, and the correlation is found to be -.016 ± .048,
negative but quite insignificant. The result entirely corroborates THORNDIKE'S
conclusions as to the specialization of resemblance. (Fisher, 1919, p. 493)
These manipulations and the development of the same general approach to
the distribution of the intraclass correlation coefficient in the 1921 paper are
important for the fundamentals of analysis of variance.
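The r = tanh z transformation discussed above is straightforward to apply in practice. The sketch below assumes the standard large-sample result, not stated in the text, that z is approximately normal with standard error $1/\sqrt{n - 3}$; the function names and the values r = 0.5, n = 28 are illustrative only.

```python
import math

def fisher_z(r):
    """Transform a correlation r to z, the inverse of r = tanh(z)."""
    return math.atanh(r)

def r_from_z(z):
    """Back-transform z to the correlation scale."""
    return math.tanh(z)

def approx_interval(r, n, crit=1.96):
    """Approximate interval for the population correlation.

    Assumes z is roughly normal with standard error 1 / sqrt(n - 3),
    the usual textbook approximation.
    """
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)
    return r_from_z(z - crit * se), r_from_z(z + crit * se)

z = fisher_z(0.5)
low, high = approx_interval(0.5, 28)
print(z, low, high)
```

The interval is symmetric on the z scale but asymmetric once transformed back to r, reflecting the skewed sampling distribution of r that the co-operative study had laboured to tabulate.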
From the point of view of the present-day student of statistics, Fisher's
(1925a) paper on the applications of the t distribution is undoubtedly the most
comprehensible. Here we see, in familiar notation and most clearly stated, the
utility of the t distribution, a proof of its "exactitude ... for normal samples"
(p. 92), and the formulae for testing the significance of a difference between
means and the significance of regression coefficients.
Finally the probability integral with which we are concerned is of value in calculating
the probability integral of a wider class of distributions which is related to "Student's"
distribution in the same manner as that of $\chi^2$ is related to the normal distribution.
This wider class of distributions appears (i) in the study of intraclass correlations
(ii) in the comparison of estimates of the variance, or of the standard deviation from
normal samples (iii) in testing the goodness of fit of regression lines (iv) in testing
the significance of a multiple correlation, or (v) of a correlation ratio. (Fisher, 1925a,
pp. 102-103)
These monumental achievements were realized in less than 10 years after
Gosset's mixture of mathematical conjecture, intuition, and the practical
necessities of his work led the way to the t distribution.
From 1912, Gosset and Fisher had been in correspondence (although there
were some lengthy gaps), but they did not actually meet until September 1922,
when Gosset visited Fisher at Harpenden. Fisher Box (1981) describes their
relationship and reproduces excerpts from their letters. At the end of World War
I, in 1918, Gosset did not even know how Fisher was employed, and when he
learned that Fisher had been a schoolmaster for the duration of the war and was
looking for a job wrote, "I hear that Russell [the head of the Rothamsted
Experimental station] intends to get a statistician soon, when he gets the money
I think, and it might be worth while to keep your ears open to news from
Harpenden" (quoted by Fisher Box, 1981, p. 62).
In 1922, work began on the computation of a new set of t tables using values
of $t = z\sqrt{n - 1}$ rather than z, and the tables were entered with the appropriate
degrees of freedom rather than n. Fisher and Gosset both worked on the new
tables, and after delays, and fits and starts, and the checking of errors, the tables
were published by "Student" in Metron in 1925. Figure 9.2 compares two t
distributions to the normal distribution. Fisher's "Applications" paper,
mentioned earlier, appeared in the same volume, but it had, in fact, been written
quite early in 1924. At that time Fisher was completing his 1925 book and needed
tables. Gosset wanted to offer the tables to Pearson and expressed doubts about
the copyright, because the first tables had been published in Biometrika. Fisher
went ahead and computed all the tables himself, a task he completed later in the
year (Fisher Box, 1981).
Fig. 9.2 The Normal Distribution and the t Distribution for df = 10 and df = 4
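The comparison in Figure 9.2 can be approximated numerically. The sketch below evaluates the standard density formulas for the normal distribution and for t with 10 and 4 degrees of freedom at a few points; the function names are assumptions, and the printed values are only a rough stand-in for the plotted curves.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2.0) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2.0)
    )
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

for x in (0.0, 1.0, 2.0, 3.0):
    print(
        f"x={x:.1f}  normal={normal_pdf(x):.4f}  "
        f"t(df=10)={t_pdf(x, 10):.4f}  t(df=4)={t_pdf(x, 4):.4f}"
    )
```

The t curves are lower at the center and higher in the tails than the normal curve, and the departure is greater for the smaller number of degrees of freedom.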
Fisher sent the "Applications" paper to Gosset in July 1924. He says that
the note is:
larger than I had intended, and to make it at all complete should be larger still, but I
shall not have time to make it so, as I am sailing for Canada on the 25th, and will not
be back till September. (quoted by Fisher Box, 1981, p. 66)
The visit to Canada was made to present a paper (Fisher, 1924b) at the
International Congress of Mathematics, meeting that year in Toronto. This paper
discusses the interrelationships of $\chi^2$, z, and t. Fisher was only 35 years old,
and yet the foundations of his enormous influence on statistics were now securely
laid.
THE F DISTRIBUTION
The Toronto paper did not, in fact, appear until almost 4 years after the meeting,
by which time the first edition of Statistical Methods for Research Workers
(1925) had been published. The first use of an analysis of variance technique
was reported earlier (Fisher & Mackenzie, 1923), but this paper was not
encountered by many outside the area of agricultural research. It is possible
that if the mathematics of the Toronto paper had been included in Fisher's book
then much of the difficulty that its first readers had with it would have been
reduced or eliminated, a point that Fisher acknowledged.
After a general introduction on error curves and goodness-of-fit, Fisher
examines the $\chi^2$ statistic and briefly discusses his (correct) approach to degrees
of freedom. He then points out that if a number of quantities $x_1, \ldots, x_n$ are
distributed independently in the normal distribution with unit standard
deviation, then $\chi^2 = \sum x^2$ is distributed as "the Pearsonian measure of
goodness of fit." In fact, Fisher uses $S(x^2)$ to denote the latter expression, but
here, and in what follows on the commentary on this paper, the more familiar
modern notation is employed. Fisher refers to "Student's" work on the error curve
of the standard deviation of a small sample drawn from a normal distribution and
shows its relation to $\chi^2$:
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
where n - 1 is the number of degrees of freedom (one less than the number in
the sample) and $s^2$ is the best estimate from the sample of the true variance
$\sigma^2$.
For the general z distribution, Fisher first points out that $s_1^2$ and $s_2^2$ are
misleading estimates of $\sigma_1^2$ and $\sigma_2^2$ when sample sizes, drawn from normal
distributions, are small:
The only exact treatment is to eliminate the unknown quantities $\sigma_1$ and $\sigma_2$ from the
distribution by replacing the distribution of s by that of log s, and so deriving the
distribution of log $s_1/s_2$. Whereas the sampling errors in $s_1$ are proportional to $\sigma_1$
the sampling errors of log $s_1$ depend only upon the size of the sample from which
$s_1$ was calculated. (Fisher, 1924b, p. 808)
then z will be distributed about log $\sigma_1/\sigma_2$ as mode, in a distribution which depends
wholly upon the integers $n_1$ and $n_2$. Knowing this distribution we can tell at once if
an observed value of z is or is not consistent with any hypothetical value of the ratio
$\sigma_1/\sigma_2$. (Fisher, 1924b, p. 808)
The cases for infinite and unit degrees of freedom are then considered. In
the latter case the "Student" distributions are generated.
In discussing the accuracy to be ascribed to the mean of a small sample, "Student"
took the revolutionary step of allowing for the random sampling variation of his
estimate of the standard error. If the standard error were known with accuracy the
deviation of an observed value from expectation (say zero), divided by the standard
error would be distributed normally with unit standard deviation; but if for the
accurate standard deviation we substitute an estimate based on n - 1 degrees of
freedom we have
consequently the distribution of t is given by putting $n_1$ [degrees of freedom] = 1
and substituting $z = \frac{1}{2} \log t^2$. (Fisher, 1924b, pp. 808-809)
For the case of an infinite number of degrees of freedom, the t distribution
becomes the normal distribution.
In the final sections of the paper the r = tanh z transformation is applied to
the intraclass correlation, an analysis of variance summary table is shown, and
$z = \log_e (s_1/s_2)$ is the value that may be used to test the significance of the intraclass
correlation. These basic methods are also shown to lead to tests of significance
for multiple correlation and $\eta$ (eta), the correlation ratio.
The transition from z to F was not Fisher's work. In 1934, George W.
Snedecor, in the first of a number of texts designed to make the technique of
analysis of variance intelligible to a wider audience, defined F as the ratio of
the larger mean square to the smaller mean square taken from the summary
table, using the formula $z = \frac{1}{2} \log_e F$. Figure 9.3 shows two F distributions.
Snedecor was Professor of Mathematics and Director of the Statistical
Laboratory at Iowa State College in the United States. This institution was largely
responsible for promoting the value of the modern statistical techniques in North
America.
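The relation between Fisher's z and Snedecor's F can be checked numerically. What follows is only a minimal sketch: the two mean squares are hypothetical values chosen for illustration, and the function names are not taken from Fisher or Snedecor.

```python
import math


def variance_ratio(mean_square_a: float, mean_square_b: float) -> float:
    """Return F as the larger mean square divided by the smaller one."""
    larger = max(mean_square_a, mean_square_b)
    smaller = min(mean_square_a, mean_square_b)
    if smaller <= 0:
        raise ValueError("mean squares must be positive")
    return larger / smaller


def fisher_z(f_ratio: float) -> float:
    """Convert a variance ratio F to Fisher's z, using z = (1/2) * ln(F)."""
    if f_ratio <= 0:
        raise ValueError("F must be positive")
    return 0.5 * math.log(f_ratio)


# Hypothetical mean squares, chosen only for illustration.
f_value = variance_ratio(12.0, 3.0)  # larger / smaller
z_value = fisher_z(f_value)          # natural log of the ratio of the two s values
```

Because F is a ratio of variances and z is the logarithm of a ratio of standard deviations, the factor of one half is simply the logarithm of a square root.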
Fisher himself avoided the use of the symbol F because it was not used by
P. C. Mahalanobis, an Indian statistician who had visited Fisher at Rothamsted
in 1926, and who had tabulated the values of the variance ratio in 1932, and thus
established priority. Snedecor did not know of the work of Mahalanobis, a
clear-thinking and enthusiastic worker who became the first Director of the
Indian Statistical Institute in 1932. Today it is still occasionally the case that the
ratio is referred to as "Snedecor's F" in honor of both Snedecor and Fisher.
Fig 9.3 The F Distributions for 1,5 and 8,8 Degrees of Freedom
THE CENTRAL LIMIT THEOREM
The transformation of numerical data to statistics and the assessment of the
probability of the outcome being a chance occurrence, or the assessment of a
probable range of values within which the outcome will fall, is a reasonable
general description of the statistical inferential strategy. The mathematical
foundations of these procedures rest on a remarkable theorem, or rather a set of
theorems, that are grouped together as the Central Limit Theorem. Much of the
work that led to this theorem has already been mentioned but it is appropriate
to summarize it here.
In the most general terms, the problem is to discover the probability
distribution of the cumulative effect of many independently acting, and very
small, random effects. The central limit theorem brings together the
mathematical propositions that show that the required distribution is the normal
distribution. In other words, a summary of a subset of random observations
$X_1, X_2, \ldots, X_n$, say their sum $\sum X = X_1 + X_2 + \cdots + X_n$ or their mean,
$(\sum X)/n$, has a sampling distribution that approaches the shape of the normal
distribution. Figure 9.4 is an illustration of a sampling distribution of means.
This chapter has described distributions of other statistical summaries, but it
will have been noted that, in one way or another, those distributions have links
with the normal distribution.
Fig. 9.4 A Sampling Distribution of Means (n = 6). 500 draws
from the numbers 1 through 10.
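A sampling distribution of the kind shown in Figure 9.4 can be approximated by simulation. The sketch below is illustrative only: it assumes that each of the 500 samples consists of 6 numbers drawn with replacement from the integers 1 through 10, and the seed and function name are arbitrary choices, not details taken from the figure.

```python
import random
import statistics


def sampling_distribution_of_means(n_samples=500, sample_size=6,
                                   low=1, high=10, seed=0):
    """Draw repeated samples and return the list of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        # Assumption: sampling with replacement from the integers low..high.
        sample = [rng.randint(low, high) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


means = sampling_distribution_of_means()
grand_mean = statistics.fmean(means)   # should lie near the population mean, 5.5
spread = statistics.stdev(means)       # should lie near 2.87 / sqrt(6)
```

Although the parent population is flat (every number from 1 to 10 is equally likely), the means pile up around 5.5 and thin out symmetrically on either side, which is the behavior the theorem describes.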
The history of the development of the theorem has been brought together in
a useful and eminently readable little book by Adams (1974). The story begins
with the definition of probability as long-run relative frequency, the ratio of the
number of ways an event can occur to the total number of possible events, given
that the events are independent and equally likely. The most important
contribution of James Bernoulli's Ars Conjectandi is the first limit theorem of
probability theory. The logic of Bernoulli's approach has been the subject of
some debate, and we know that he himself wrestled with it. Hacking (1971)
states:
No one writes dispassionately about Bernoulli. He has been fathered with the first
subjective concept of probability, and with a completely objective concept of
probability as relative frequency determined by trials on a chance set-up. He has
been thought to favour an inductive theory of probability akin to Carnap's. Yet he
is said to anticipate Neyman's confidence interval technique of statistical inference,
which is quite opposed to inductive probabilities. In fact Bernoulli was, like so many
of us, attracted to many of these seemingly incompatible ideas, and he was unsure
where to rest his case. He left his book unpublished. (pp. 209-210)
But Bernoulli's mathematics are not in question. When the probability p of
an event E is unknown and a sequence of n trials is observed and the proportion
of occurrences of E is $E_n$, then Bernoulli maintained that an "instinct of nature"
causes us to use $E_n$ as an estimate of p. Bernoulli's Law of Large Numbers
shows that for any small assigned amount $\varepsilon$, the probability that
$|p - E_n| < \varepsilon$ increases to 1 as n increases indefinitely:
It is concluded that in 25,550 trials it is more than one thousand times more likely
that the r/t [the ratio of what Bernoulli calls "fertile" events to the total, that is, the
event of interest here called E] will fall between 31/50 and 29/50 than that r/t will
fall outside these limits. (Bernoulli, 1713/1966, p. 65)
The above results hold for known p. If p is 3/5, then we can be morally certain that
... the deviation ... will be less than 1/50. But Bernoulli's problem is the inverse
of this. When p is unknown, can his analysis tell when we can be morally certain
that some estimate of p is right? That is the problem of statistical inference. (Hacking,
1971, p. 222)
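Bernoulli's claim for known p can be checked against the exact binomial distribution. The following is a rough sketch rather than a reconstruction of his own argument: it assumes p = 3/5, treats the limits 29/50 and 31/50 as inclusive, and sums the probabilities in the two tails directly so that the very small outside probability is not lost to rounding. The function names are illustrative.

```python
import math


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of k successes in n trials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def probability_outside(n: int, p: float, low: int, high: int) -> float:
    """P(X < low or X > high) for X distributed as Binomial(n, p)."""
    return sum(
        math.exp(binomial_log_pmf(k, n, p))
        for k in range(n + 1)
        if k < low or k > high
    )


trials = 25550
p_fertile = 3 / 5                      # assumed known, as in Hacking's comment
low_count = 29 * trials // 50          # r/t = 29/50
high_count = 31 * trials // 50         # r/t = 31/50

p_outside = probability_outside(trials, p_fertile, low_count, high_count)
odds_inside = (1.0 - p_outside) / p_outside
```

Comparing `odds_inside` with 1,000 shows how Bernoulli's figure relates to the exact odds; the calculation says nothing about the inverse problem, which is Hacking's point.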
The determination of p was of course the problem tackled by De Moivre.
He used the integral of $e^{-x^2}$ to approximate it, but as Adams (1974) and others have
noted, there is no direct evidence that implies that De Moivre thought of what
is now called the normal distribution as a probability distribution as
such. Simpson introduced the notion of a probability distribution of
observations, and others, notably Joseph Lagrange (1736-1813) and Daniel Bernoulli
(1700-1782), who was James's nephew, elaborated laws of error. The late 18th
century saw the culmination of this work in the development of the normal law
of frequency of error and its application to astronomical observations by Gauss.
It was Laplace's memoir of 1810 that introduced the central limit theorem, but
the nub of his discovery was described in nonmathematical language in his
famous Essai published as the introduction to the third edition of Theorie
Analytique des Probabilites in 1820:
The general problem consists in determining the probabilities that the values of one
or several linear functions of the errors of a very great number of observations are
contained within any limits. The law of the possibility of the errors of observations
introduces into the expressions of these probabilities a constant, whose value seems
to require the knowledge of this law, which is almost always unknown. Happily this
constant can be determined from the observations.
There often exists in the observations many sources of errors: ... The analysis
which I have used leads easily, whatever the number of the sources of error may be,
to the system of factors which gives the most advantageous results, or those in which
the same error is less probable than in any other system.
I ought to make here an important remark. The small uncertainty that the
observations, when they are not numerous, leave in regard to the values of the
constants ... renders a little uncertain the probabilities determined by analysis. But
it almost always suffices to know if the probability, that the errors of the results
obtained are comprised within narrow limits, approaches closely to unity; and when
it is not, it suffices to know up to what point the observations should be multiplied,
in order to obtain a probability such that no reasonable doubt remains ... The analytic
formulae of probabilities satisfy perfectly this requirement; ... They are likewise
indispensable in solving a great number of problems in the natural and moral
sciences. (Laplace, 1820, pp. 192-195 in the translation by Truscott & Emory, 1951)
These are elegant and clear remarks by a genius who succeeded in making the
labors of many years intelligible to a wide readership.
Adams (1974) gives a brief account of the final formal development of the
abstract Central Limit Theorem. The Russian mathematician Alexander
Lyapunoff (1857-1918), a pupil of Tchebycheff, provided a rigorous
mathematical proof of the theorem. His attention was drawn to the problem
when he was preparing lectures for a course in probability theory, and his
approach was a triumph. His methods and insights led, in the 1920s, to many
valuable contributions and even more powerful theorems in probability
mathematics.
10
Comparisons, Correlations
and Predictions
COMPARING MEASUREMENTS
Since the time of De Moivre, the variables that have been examined by workers
in the field of probability have expressed measurements as multiples of a variety
of basic units that reflect the dispersion of the range of possible scores. Today
the chosen units are units of standard deviation, and the scores obtained are
called standard scores or z scores. Karl Pearson used the term standard
deviation and gave it the symbol $\sigma$ (the lower case Greek letter sigma) in 1894,
but the unit was known (although not in its present-day form) to De Moivre. It
corresponds to that point on the abscissa of a normal distribution such that an
ordinate erected from it would cut the curve at its point of inflection, or, in simple
terms, the point where the curvature of the function changes from concave to
convex. Sir George Airy (1801-1892) named $\sigma\sqrt{2}$ the modulus (although this
term had been used, in passing, for $\sqrt{n}$ by De Moivre as early as 1733) and
described a variety of other possible measures, including the probable error, in
1875.¹ This latter was the unit chosen by Galton (whose work is discussed later),
although he objected strongly to the name:
It is astonishing that mathematicians, who are the most precise and perspicacious of
men, have not long since revolted against this cumbrous, slip-shod, and misleading
phrase. Moreover the term Probable Error is absurd when applied to the subjects
now in hand, such as Stature, Eye-colour, Artistic Faculty, or Disease. I shall
therefore usually speak of Prob. Deviation. (Galton, 1889, p. 58)
¹ The probable error is defined as one half of the quantity that encompasses the middle 50% of a
normal distribution of measurements. It is equivalent to what is sometimes called the semi-interquartile
range (that portion of the distribution between the first quartile or the twenty-fifth percentile, and the
third quartile or the seventy-fifth percentile, divided by two). The probable error is 0.67449 times the
standard deviation. Nowadays, everyone follows Pearson (1894), who wrote, "I have always found it
more convenient to work with the standard deviation than with the probable error or the modulus, in
terms of which the error function is usually tabulated" (p. 88).
This objection reflects Galton's determination, and that of his followers, to
avoid the use of the concept of error in describing the variation of human
characteristics. It also foreshadows the well-nigh complete replacement of
probable error with standard deviation, and law of frequency of error with
normal distribution, developments that reflect philosophical dispositions rather
than mathematical advance. Figure 10.1 shows the relationship between
standard scores and probable error.
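The link between the probable error, the modulus, and the standard deviation can be verified with a few lines of code. This is a small sketch using Python's standard library; the function names are illustrative, and the only facts assumed are those stated in the text (the probable error spans the middle 50% of a normal distribution, and Airy's modulus is the standard deviation multiplied by the square root of 2).

```python
import math
from statistics import NormalDist


def probable_error(standard_deviation: float) -> float:
    """Half-width of the central 50% of a normal distribution."""
    # The 75th percentile of the standard normal curve is about 0.67449.
    quartile_z = NormalDist().inv_cdf(0.75)
    return quartile_z * standard_deviation


def modulus(standard_deviation: float) -> float:
    """Airy's modulus: the standard deviation times the square root of 2."""
    return standard_deviation * math.sqrt(2.0)


pe = probable_error(1.0)
# Proportion of a standard normal distribution lying within one probable error
# of the mean; by definition this should be one half.
coverage = NormalDist().cdf(pe) - NormalDist().cdf(-pe)
```

A measurement is therefore as likely as not to fall within one probable error of the mean, which is the sense of "probable" that Galton found so misleading.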
Perhaps a word or two of elaboration and example will illustrate the utility
of measurements made in units of variability. It may seem trite to make the
statement that individual measurements and quantitative descriptions are made
for the purpose of making comparisons. Someone who pays $1,000 for a suit
has bought an expensive suit, as well as having paid a great deal more than the
individual who has picked up a cheap outfit for only $50. The labels "expensive"
and "cheap" are applied because the suit-buying population carries around with
it some notion of the average price of suits and some notion of the range of
prices of suits. This obvious fact would be made even more obvious by the
reaction to an announcement that someone had just purchased a new car for
$1,000. What is a lot of money for a suit suddenly becomes almost trifling for
a brand new car. Again this judgment depends on a knowledge, which may not
be at all precise, of the average price, and the price range, of automobiles. One
can own an expensive suit and a cheap car and have paid the same absolute
amount for both, although it must be admitted that such a juxtaposition of
purchases is unlikely! The point is that these examples illustrate the fundamental
objective of standard scores, the comparison of measurements.
FIG. 10.1 The Normal Distribution - Standard Scores and the Probable Error
If the mean price of cars is $9,000 and the standard deviation is $2,500 then
our $1,000 car has an equivalent z score of ($1,000 - $9,000)/$2,500 or -3.20.
If suit prices have a mean of $350 and a standard deviation of $150, then the
$1,000 suit has an equivalent z score of ($1,000 - $350)/$150 or +4.33.
We have a very cheap car and a very, very expensive suit. It might be added
that, at the time of writing, these figures were entirely hypothetical.
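The arithmetic of the car and suit example is easily reproduced. The sketch below simply restates the calculation in code; the function name is illustrative, and the prices are the hypothetical figures used in the text.

```python
def z_score(value: float, mean: float, standard_deviation: float) -> float:
    """Express a measurement as a deviation from the mean in SD units."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / standard_deviation


# Hypothetical prices from the example in the text.
car_z = z_score(1000, mean=9000, standard_deviation=2500)
suit_z = z_score(1000, mean=350, standard_deviation=150)

print(f"car:  {car_z:+.2f}")   # car:  -3.20
print(f"suit: {suit_z:+.2f}")  # suit: +4.33
```

The same $1,000 yields scores of opposite sign because each price is judged against the mean and spread of its own population.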
These simple manipulations are well-known to users of statistics, and, of
course, when they are applied in conjunction with the probability distributions
of the measurements, they enable us to obtain the probability of occurrence of
particular scores or particular ranges of scores. They are intuitively sensible.
A more challenging problem is that of the comparison of sets of pairs of scores
and of determining a quantitative description of the relationship between them.
It is a problem that was solved during the second half of the 19th century.
GALTON'S DISCOVERY OF "REGRESSION"
Sir Francis Galton (his work was recognized by the award of a knighthood in
1909) has been described as a Victorian genius. If we follow Edison's
oft-quoted definition² of this condition then there can be no quarrel with the
designation, but Galton was a man of great flair as well as great energy and his
ideas and innovations were many and varied. In a long life he produced over
300 publications, including 17 books. But, in one of those odd happenings in
the history of human invention, it was Galton's discovery and subsequent
misinterpretation of a statistical artifact that marks the beginning of the
technique of correlation as we now know it.
1860 was the year of the famous debate on Darwin's work between Thomas
Huxley (1825-1895) and Bishop Samuel Wilberforce (1805-1873). This
encounter, which held the attention of the nation, took place at the meeting of the
British Association (for the Advancement of Science) held that year at Oxford.
Galton attended the meeting, and although we do not know what role he had at
it, we do know that he later became an ardent supporter of his cousin's theories.
The Origin of Species, he said, "made a marked epoch on my development, as
it did in that of human thought generally" (Galton, 1908, p. 287).
² In 1932, Thomas Alva Edison said that "Genius is one per cent inspiration and ninety-nine per cent
perspiration."
Cowan (1977) notes that, although Galton had made similar assertions
before, the impact of The Origin must have been retrospective, describing his
initial reaction to the book as, "pedestrian in the extreme." She also asserts that
"Galton never really understood the argument for evolution by natural selection,
nor was he interested in the problem of the creation of new species" (Cowan,
1977, p. 165).
In fact, it is apparent that Galton was quite selective in using his cousin's
work to support his own view of the mechanisms of heredity.
The Oxford debate was not just a debate about evolution. MacKenzie (1981)
describes Darwin's book as the "basic work of Victorian scientific naturalism"
(p. 54), the notion of the world, and the human species and its works, as part of
rational scientific nature, needing no recourse to the supernatural to explain the
mysteries of existence. Naturalism has its origins in the rise of science in the
17th and 18th centuries, and its opponents expressed their concern, because of
its attack, implicit and overt, on traditional authority. As one 19th century writer
notes, for example:
Wider speculations as to morality inevitably occur as soon as the vision of God
becomes faint; when the Almighty retires behind second causes, instead of being felt
as an immediate presence, and his existence becomes the subject of logical proof.
(Stephen, 1876, Vol II, p. 2).
The return to nature espoused by writers such as Jean Jacques Rousseau
(1712-1778) and his followers would have rid the world of kings and priests
and aristocrats whose authority rested on tradition and instinct rather than
reason, and thus, they insisted, have brought about a simple "natural" state of
society. MacKenzie (1981), citing Turner (1978), adds a very practical note to
this philosophy. The battle was not just about intellectual abstractions but about
who should have authority and control and who should enjoy the material
advantages that flow from the possession of that authority. Scientific naturalism
was the weapon of the middle class in its struggle for power and authority based
on intellect and merit and professional elitism, and not on patronage or nobility
or religious affiliation.
These ideas, and the new biology, certainly turned Galton away from
religion, as well as providing him with an abiding interest in heredity. Forrest
(1974) suggests that Galton's fascination for, and work on, heredity coincided
with the realization that his own marriage would be infertile. This also may have
been a factor in the mental breakdown that he suffered in 1866. "Another
possible precipitating factor was the loss of his religious faith which left him
with no compensatory philosophy until his programme for the eugenic
improvement of mankind became a future article of faith" (Forrest, 1974, p. 85).
During the years 1866 to 1869 Galton was in generally poor health, but he
collected the material for, and wrote, one of his most famous books, Hereditary
Genius, which was published in 1869. In this book and in English Men of
Science, published in 1874, Galton expounds and expands upon his view that
ability, talent, intellectual power, and accompanying eminence are innately
rather than environmentally determined. The corollary of this view was, of
course, that agencies of social control should be established that would
encourage the "best stock" to have children and discourage those whom Galton
described as having the smallest quantities of "civic worth" from breeding. It
has been mentioned that Galton was a collector of measurements to a degree
that was almost compulsive, and it is certain that he was not happy with the fact
that he had had to use qualitative rather than quantitative data in support of his
arguments in the two books just cited.
MacKenzie (1981) maintains that:
the needs of eugenics in large part determined the content of Galton's statistical
theory. ... If the immediate problems of eugenics research were to be solved, a new
theory of statistics, different from that of the previously dominant error theorists had
to be constructed. (MacKenzie, 1981, p. 52)
Galton embarked on a comparative study of the size and weight of sweet pea
seeds over two generations,³ but, as he later remarked, "It was anthropological
evidence that I desired, caring only for the seeds as means of throwing light on
heredity in man. I tried in vain for a long and weary time to obtain it in sufficient
abundance" (Galton, 1885a, p. 247).
Galton began by weighing, and measuring the diameters of, thousands of
sweet pea seeds. He computed the mean and the probable error of the weights
of these seeds and made up packets of 10 seeds, each of the seeds being exactly
the same weight. The smallest packet contained seeds weighing the mean minus
three times the probable error, the next the mean minus twice the probable error,
and so on up to packets containing the largest seeds weighing the mean plus
three times the probable error. Sets of the seven packets were sent to friends
across the length and breadth of Britain with detailed instructions on how they
were to be planted and nurtured. There were two crop failures, but the produce
of seven harvests provided Galton with the data for a Royal Institution lecture,
"Typical Laws of Heredity," given in 1877. Complete data for Galton's
experiment are not available, but he observed what he stated to be a simple law that
connected parent and offspring seeds. The offspring of each of the parental
weight categories had weights that were what we would now call normally
distributed, and the probable error (we would now calculate the standard
deviation) was the same. However, the mean weight of each group of offspring
was not as extreme as the parental weight. Large parent seeds produced larger
than average seeds, but mean offspring weight was not as large as parental
weight. At the other extreme small parental seeds produced, on average, smaller
offspring seeds, but the mean of the offspring was found not to be as small as
that of the parents. This phenomenon Galton termed reversion.
³ In Memories of My Life (1908), Galton says that he determined on experimenting with sweet peas
in 1885 and that the suggestion had come to him from Sir Joseph Hooker (the botanist, 1817-1911) and
Darwin. But Darwin had died in 1882 and the experiments must have been suggested and were begun
in 1875. Assuming that this is not just a typographical error, perhaps Galton was recalling the date of
his important paper on regression in hereditary stature.
Seed diameters, Galton noted, are directly proportional to their weight, and
show the same effect:
By family variability is meant the departure of the children of the same or similarly
descended families, from the ideal mean type of all of them. Reversion is the
tendency of that ideal mean filial type to depart from the parent type, "reverting"
towards what may be roughly and perhaps fairly described as the average ancestral
type. If family variability had been the only process in simple descent that affected
the characteristics of a sample, the dispersion of the race from its mean ideal type
would indefinitely increase with the number of generations; but reversion checks
this increase, and brings it to a standstill. (Galton, 1877, p. 291)
In the 1877 paper, Galton gives a measure of reversion, which he symbolized r,
and arrives at a number of the basic properties of what we now call regression.
Some years passed before Galton returned to the topic, but during those years
he devised ways of obtaining the anthropometric data he wanted. He offered
prizes for the most detailed accounts of family histories of physical and mental
characteristics, character and temperament, occupations and illnesses, height
and appearance, and so on, and in 1884 he opened, at his own expense, an
anthropometric laboratory at the International Health Exhibition.
For a small sum of money, members of the public were admitted to the
laboratory. In return the visitors received a record of their various physical
dimensions, measures of strength, sensory acuities, breathing capacity, color
discrimination, and judgments of length. In all, 9,337 persons were measured, of
whom 4,726 were adult males and 1,657 adult females. At the end of the
exhibition, Galton obtained a site for the laboratory at the South Kensington
Museum, where data continued to be collected for close to 8 more years. These
data formed part of a number of papers. They were, of course, not entirely free
from errors due to apparatus failure, the circumstances of the data recording,
and other factors that are familiar enough to experimentalists of the present day.
Some artifacts might have been introduced in other ways. Galton (1884)
commented, "Hardly any trouble occurred with the visitors, though on some
few occasions rough persons entered the laboratory who were apparently not
altogether sober" (p. 206).
In 1885, Galton's Presidential Address to the Anthropological Section of
the British Association (1885b), meeting that year in Aberdeen, Scotland,
discussed the phenomenon of what he now termed regression toward mediocrity
in human hereditary stature. An extended paper (1885a) in the Journal of the
Anthropological Institute gives illustrations, which are reproduced here.
FIG. 10.2 Plate from Galton's 1885a paper (Journal of the
Anthropological Institute).
FIG. 10.3 Plate from Galton's 1885a paper (Journal of the
Anthropological Institute).
First it may be noted that Galton used a measure of parental heights that he
termed the height of the "mid-parent." He multiplied the mother's height by
1.08 and took the mean of the resulting value and the father's height to produce
the mid-parent value.⁴ He found that a deviation from mediocrity of one unit
of height in the parents was accompanied by a deviation, on average, of about
only two-thirds of a unit in the children (Fig. 10.2). This outcome paralleled
what he had observed in the sweet pea data.
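Galton's mid-parent calculation, and the two-thirds rule he drew from it, can be set out in a few lines. The sketch below is illustrative only: the heights are invented, the function names are not Galton's, and the regression value of 2/3 is the approximate figure reported in the text rather than an estimate from data.

```python
def mid_parent_height(father: float, mother: float,
                      female_factor: float = 1.08) -> float:
    """Mean of the father's height and the mother's height scaled by 1.08."""
    return (father + female_factor * mother) / 2.0


def expected_child_deviation(mid_parent_deviation: float,
                             regression: float = 2.0 / 3.0) -> float:
    """Average deviation of the children, given the mid-parent's deviation."""
    return regression * mid_parent_deviation


# Hypothetical couple, heights in inches.
mid_parent = mid_parent_height(father=70.0, mother=65.0)

# A mid-parent 3 units above "mediocrity" has children who average
# only about 2 units above it.
child_dev = expected_child_deviation(3.0)
```

The children are still taller than average, but by less than their parents, which is all that "regression toward mediocrity" asserts.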
When the frequencies of the (adult) children's measurements were entered
into a matrix against mid-parent heights, the data being "smoothed" by
computing the means of four adjacent cells, Galton noticed that values of the same
frequency fell on a line that constituted an ellipse. Indeed, the data produced a
series of ellipses all centered on the mean of the measurements. Straight lines
drawn from this center to points on the ellipse that were maximally distant (the
points of contact of the horizontal and vertical tangents - the lines YN and XM
in Fig. 10.3) produce the regression lines, ON and OM, and the slopes of these
lines give the regression values of 2/3 and 1/3.
The elliptic contours, which Galton said he noticed when he was pondering
on his data while waiting for a train, are nothing more than the contour lines
that are produced from the horizontal sections of the frequency surface
generated by two normal distributions (Fig. 10.4).
The time had now come for some serious mathematics.
All the formulae for Conic Sections having long since gone out of my head, I went
on my return to London to the Royal Institution to read them up. Professor, now Sir
James, Dewar, came in, and probably noticing signs of despair on my face, asked
me what I was about; then said, "Why do you bother over this? My brother-in-law,
J. Hamilton Dickson of Peterhouse loves problems and wants new ones. Send it to
him." I did so, under the form of a problem in mechanics, and he most cordially
helped me by working it out, as proposed, on the basis of the usually accepted and
generally justifiable Gaussian Law of Error. (Galton, 1908, pp. 302-303)
I may be permitted to say that I never felt such a glow of loyalty and respect towards
the sovereignty and magnificent sway of mathematical analysis as when his answer
reached me, confirming, by purely mathematical reasoning, my various and
laborious statistical conclusions with far more minuteness than I had dared to hope,
for the original data ran somewhat roughly, and I had to smooth them with tender
caution. (Galton, 1885b, p. 509)
FIG. 10.4 Frequency Surfaces and Ellipses
(From Yule & Kendall, 14th Ed., 1950)
⁴ Galton (1885a) maintains that this factor "differs a very little from the factors employed by other
anthropologists, who, moreover, differ a trifle between themselves; anyhow it suits my data better than
1.07 or 1.09" (p. 247). Galton also maintained (and checked in his data) "that marriage selection takes
little or no account of shortness or tallness ... we may therefore regard the married folk as couples picked
out of the general population at haphazard" (1885a, pp. 250-251) - a statement that is not only
implausible to anyone who has casually observed married couples but is also not borne out by reasonably
careful investigation.
Now Galton was certainly not a mathematical ignoramus, but the fact that
one of statistics' founding fathers sought help for the analysis of his momentous
discovery may be of some small comfort to students of the social sciences who
sometimes find mathematics such a trial. Another breakthrough was to come,
and again, it was to culminate in a mathematical analysis, this time by Karl
Pearson, and the development of the familiar formula for the correlation
coefficient.
In 1886, a paper on "Family Likeness in Stature," published in the
Proceedings of the Royal Society, presents Hamilton Dickson's contribution, as well as
data collected by Galton from family records. Of passing interest is his use of
the symbol w for "the ratio of regression" (short-lived, as it turns out) as he
details correlations between pairs of relatives.
Galton's work had led him to a method of describing the relationship
between parents and offspring and between other relatives on a particular
characteristic by using the regression slope. Now he applied himself to the task
of quantifying the relationship between different characteristics, the sort of data
collected at the anthropometric laboratory. It dawned on him in a flash of insight
that if each characteristic was measured on a scale based on its own variability
(in other words, in what we now call standard scores), then the regression
coefficient could be applied to these data. It was noted in chapter 1 that the
location of this illumination was perhaps not the place that Galton recalled in
his memoirs (written when he was in his eighties) so that the commemorative
tablet that Pearson said the discovery deserved will have to be sited carefully.
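Galton's insight can be illustrated numerically: once both characteristics are expressed in standard scores, the regression slope is the same in either direction. The sketch below uses a small invented data set and least-squares slopes, which is the modern calculation and not Galton's own graphical procedure; all names are illustrative.

```python
import statistics


def standard_scores(values):
    """Rescale values to deviations from the mean in SD units."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def regression_slope(x, y):
    """Least-squares slope for predicting y from x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    return sum_xy / sum_xx


# Invented measurements on two characteristics with different variability.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [4.0, 2.0, 8.0, 6.0, 10.0]

raw_y_on_x = regression_slope(x, y)   # depends on the units of x and y
raw_x_on_y = regression_slope(y, x)   # a different value in raw units

zx, zy = standard_scores(x), standard_scores(y)
std_y_on_x = regression_slope(zx, zy)
std_x_on_y = regression_slope(zy, zx)
```

In raw units the two slopes differ, but on the standardized scales they coincide in a single index of co-relation.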
Before examining some of the consequences of Galton's inspiration, the
phenomenon of regression is worth a further look. Arithmetically it is real
enough, but, as Forrest (1974) puts it:
It is not that the offspring have been forced towards mediocrity by the pressure of
their mediocre remote ancestry, but a consequence of a less than perfect correlation
between the parents and their offspring. By restricting his analysis to the offspring
of a selected parentage and attempting to understand their deviations from the mean
Galton fails to account for the deviation of all offspring. ... Galton's conclusion is
that regression is perpetual and that the only way in which evolutionary change can
occur is through the occurrence of sports. (p. 206)⁵
⁵ In this context the term "sport" refers to an animal or a plant that differs strikingly from its species
type. In modern parlance we would speak of "mutations." It is ironic that the Mendelians, led by William
Bateson (1861-1926), used Galton's work in support of their argument that evolution was discontinuous
and saltatory, and that the biometricians, led by Pearson and Weldon, who held fast to the notion that
continuous variation was the fountainhead of evolution, took inspiration from the same source.
Wallis and Roberts (1956) present a delightful account of the fallacy and
give examples of it in connection with company profits, family incomes,
mid-term and final grades, and sales and political campaigns. As they say:
Take any set of data, arrange them in groups according to some characteristic, and
then for each group compute the average of some second characteristic. Then the
variability of the second characteristic will usually appear to be less than that of the
first characteristic. (p. 263)
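The recipe that Wallis and Roberts describe can be tried on simulated data. The sketch below is an illustration under stated assumptions: the two characteristics are generated as standard normal variables with a correlation of 0.5, the sample size of 4,000 and the four equal groups are arbitrary choices, and none of the numbers come from their book.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so that the run is repeatable
r = 0.5                 # assumed correlation between the two characteristics
n = 4000

first = [rng.gauss(0, 1) for _ in range(n)]
# The second characteristic shares part of its variation with the first.
second = [r * x + (1 - r * r) ** 0.5 * rng.gauss(0, 1) for x in first]

# Arrange the cases in four equal groups according to the first characteristic.
pairs = sorted(zip(first, second))
size = n // 4
groups = [pairs[i * size:(i + 1) * size] for i in range(4)]

mean_first = [statistics.fmean(p[0] for p in g) for g in groups]
mean_second = [statistics.fmean(p[1] for p in g) for g in groups]

spread_first = max(mean_first) - min(mean_first)
spread_second = max(mean_second) - min(mean_second)

print(spread_first, spread_second)
```

Although both characteristics have the same variability overall, the group averages of the second are bunched closer together than those of the first. Nothing has been "forced towards mediocrity"; the shrinkage follows from the less than perfect correlation.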
In statistics the term regression now means prediction, a point of confusion
for many students unfamiliar with its history. Yule and Kendall (1950) observe:
The term "regression" is not a particularly happy one from the etymological point
of view, but it is so firmly embedded in statistical literature that we make no attempt
to replace it by an expression which would more suitably express its essential
properties. (p. 213)
The word is now part of the statistical arsenal, and it serves to remind those
of us who are involved in its application of an important episode in the history
of the discipline.
GALTON'S MEASURE OF CO-RELATION
On December 20th, 1888, Galton's paper, Co-relations and Their Measurement,
Chiefly from Anthropometric Data, was read before the Royal Society of
London. It begins:
"Co-relation or correlation of structure" is a phrase much used in biology, and not
least in that branch of it which refers to heredity, and the idea is even more frequently
present than the phrase; but I am not aware of any previous attempt to define it clearly,
to trace its mode of action in detail, or to show how to measure its degree. (Galton,
1888a, p. 135)
He goes on to state that the co-relation between two variable organs must be
due, in part, to common causes, that if variation was wholly due to common
causes then co-relation would be perfect, and if variation "were in no respect
due to common causes, the co-relation would be nil" (p. 135). His aim then is
to show how this co-relation may be expressed as a simple number, and he uses
as an illustration the relationship between the left cubit (the distance between
the elbow of the bent left arm and the tip of the middle finger) and stature,
although he presents tables showing the relationships between a variety of other
physical measurements. His data are drawn from measurements made on 350
adult males at the anthropometric laboratory, and then, as now, there were
missing data. "The exact number of 350 is not preserved throughout, as injury
to some limb or other reduced the available number by 1, 2, or 3 in different
cases" (Galton, 1888a, p. 137).
After tabulating the data in order of magnitude, Galton noted the values at
the first, second, and third quartiles. One half the difference between the values
at the first and third quartiles gives him Q, which,
he notes, is the probable error of any single measure in the series, and the value
at the second quartile is the median. For stature he obtained a median of 67.2
inches and a Q of 1.75, and for the left cubit, a median of 18.05 inches and a Q
of 0.56. It should be noted that although Galton calculated the median, he refers
to it as being practically the mean value, "because the series run with fair
symmetry" (p. 137). Galton clearly recognized that these manipulations did not
demand that the original units of measurement be the same:
It will be understood that the Q value is a universal unit applicable to the most varied
measurements, such as breathing capacity, strength, memory, keenness of eyesight,
and enables them to be compared together on equal terms notwithstanding their
intrinsic diversity. (Galton, 1888a, p. 137)
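The use of Q as a universal unit can be sketched in a few lines. This is an illustration only: Q is taken as half the spread between the first and third quartiles, the helper names are hypothetical, and the quartiles come from Python's `statistics.quantiles`, whose interpolation rule is not necessarily the one Galton applied to his ordered tables.

```python
import statistics


def median_and_q(values):
    """Return the median and Q (half the spread between the outer quartiles)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return median, (q3 - q1) / 2


def in_q_units(values):
    """Express each value as a deviation from the median in units of Q."""
    median, q = median_and_q(values)
    return [(v - median) / q for v in values]


# Hypothetical measurements on any scale (inches, centimeters, pounds, ...).
measurements = list(range(1, 12))
median, q = median_and_q(measurements)
print(median, q)
print(in_q_units(measurements))
```

Because every deviation is divided by the Q of its own series, the rescaled values no longer depend on the original unit of measurement, which is the point of Galton's remark.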
Perhaps in an unconscious anticipation of the inevitable universality of the
metric system, he also records his data on physical dimensions in centimeters.
Figure 10.5 reproduces the data (Table III in Galton's paper) from which the
closeness of the co-relation between stature and cubit was calculated.
A graph was plotted of stature, measured in deviations from $M_s$ in units of
$Q_s$, against the mean of the corresponding left cubits, again measured as
deviations, this time from $M_c$ in units of $Q_c$ (column A against column B in the
table). Between the same axes, left cubit was plotted as a deviation from $M_c$
measured in units of $Q_c$ against the mean of corresponding statures measured
as deviations from $M_s$ in units of $Q_s$ (columns C and D in the table). A line is
then drawn to represent "the general run" of the plotted points:
It is here seen to be a straight line, and it was similarly found to be straight in every
other figure drawn from the different pairs of co-related variables that I have as yet
tried. But the inclination of the line to the vertical differs considerably in different
cases. In the present one the inclination is such that a deviation of 1 on the part of
the subject [the ordinate values], whether it be stature or cubit, is accompanied by a
mean deviation on the part of the relative [the values of the abscissa], whether it be
cubit or stature, of 0.8. This decimal fraction is consequently the measure of the
closeness of the co-relation. (Galton, 1888a, p. 140)
Galton also calculates the predicted values from the regression line. He takes
what he terms the "smoothed" (i.e., read from the regression line) value for a
given deviation measure in units of $Q_c$ or $Q_s$, multiplies it by $Q_c$ or $Q_s$, and adds
the result to the mean $M_c$ or $M_s$. For example, +1.30(0.56) + 18.05 = 18.8. In
modern terms, he computes $z'(s) + \bar{X} = X'$. It is illuminating to recompute and
to replot Galton's data and to follow his line of statistical reasoning.
FIG. 10.5 Galton's Data 1888. Proceedings of the Royal Society.
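The arithmetic of the worked example can be checked with a short sketch. The function name is hypothetical; the three numbers are the ones quoted in the text (a smoothed deviation of +1.30, a Q of 0.56 for the left cubit, and a median of 18.05 inches).

```python
def predict_from_q_units(smoothed_deviation, q, median):
    """Convert a deviation in Q units back to the original scale."""
    return smoothed_deviation * q + median


predicted_cubit = predict_from_q_units(1.30, 0.56, 18.05)
print(round(predicted_cubit, 1))  # 18.8 inches, as in the text
```

The same function serves for stature if the Q and median for stature (1.75 and 67.2 inches) are supplied instead.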
Finally, Galton returns to his original symbol r, to represent degree of
co-relation, the symbol which we use today, and redefines $f = \sqrt{1 - r^2}$ as "the Q
value of the distribution of any system of x values, as $x_1$, $x_2$, $x_3$, &c., round the
mean of all of them, which we may call X" (p. 144), which mirrors our
modern-day calculation of the standard error of estimate, and which Galton
had obtained in 1877.
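As a rough numerical illustration, assume the value r = 0.8 that Galton reported for stature and cubit and the Q of 0.56 inches for the left cubit. The variable names are hypothetical, and multiplying f by Q to return to inches is the modern standard-error-of-estimate reading rather than a step quoted from Galton.

```python
import math

r = 0.8          # closeness of co-relation reported for stature and cubit
q_cubit = 0.56   # Q of the left cubit, in inches

f = math.sqrt(1 - r ** 2)          # residual spread in units of Q
residual_q_inches = q_cubit * f    # the same spread expressed in inches

print(round(f, 3), round(residual_q_inches, 3))
```

With r = 0.8 the cubits of men of a given stature are scattered about their mean with roughly six tenths of the Q of cubits in general.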
This short paper is not a complete account of the measurement of correlation.
Galton shows that he has not yet arrived at the notion of negative correlation
nor of multiple correlation, but his concluding sentences show just how far he
did go:
Let y = the deviation of the subject, whichever of the two variables may be taken in
that capacity; and let $x_1$, $x_2$, $x_3$, &c., be the corresponding deviations of the relative,
and let the mean of these be X. Then we find: (1) that y = rX for all values of y; (2)
that r is the same whichever of the two variables is taken for the subject; (3) that r is
always less than 1; (4) that r measures the closeness of co-relation. (Galton, 1888a,
p. 145)
Chronologically, Galton's final contribution to regression and heredity is his
book Natural Inheritance, published in 1889. This book was completed several
months before the 1888 paper on correlation and contains none of its important
findings. It was, however, an influential book that repeats a great deal of
Galton's earlier work on the statistics of heredity, the sweet pea experiments,
the data on stature, the records of family faculties, and so on. It was enthusiastically
received by Walter Weldon (1860-1906), who was then a Fellow of St
John's College, Cambridge, and University Lecturer in Invertebrate Morphology,
and it pointed him toward quantitative solutions to problems in species
variation that had been occupying his attention. This book linked Galton with
Weldon, and Weldon with Pearson, and then Pearson with Galton, a concatenation
that began the biometric movement.
THE COEFFICIENT OF CORRELATION
In June 1890, Weldon was elected a Fellow of the Royal Society, and later that
year became Jodrell Professor of Zoology at University College, London. In
March of the same year the Royal Society had received the first of his biometric
papers that describes the distribution of variations in a number of measurements
made on shrimps. A Marine Biological Laboratory had been constructed at
Plymouth 2 years earlier, and since that time Weldon had spent part of the year
there collecting measurements of the physical dimensions of these creatures and
their organs. The Royal Society paper had been sent to Galton for review, and
with his help the statistical analyses had been reworked. This marked the
beginning of Weldon's friendship with Galton and a concentration on biometric
work that lasted for the rest of his life. In fact, the paper does not present the
results of a correlational analysis, although apparently one had been carried out.
I have attempted to apply to the organs measured the test of correlation given by Mr.
Galton ... and the result seems to show that the degree of correlation between two
organs is constant in all the races examined; Mr. Galton has, in a letter to myself,
predicted this result. A result of this kind is, however, so important to the general
theory of heredity, that I prefer to postpone a discussion of it until a larger body of
evidence has been collected. (Weldon, 1890, p. 453)
The year 1892 saw these analyses published. Weldon begins his paper by
summarizing Galton's methods. He then describes the measurements that he
made, presents extensive tables of his calculations, the degree of correlation
between the pairs of organs, and the probable error of the distributions
($Q\sqrt{1 - r^2}$). The actual calculation of the degree of correlation departs somewhat
from Galton's method, although, because Galton was in close touch with
Weldon, we can assume that it had his blessing. It is quite straightforward and
is here quoted in full:
(1.) ... let all those individuals be chosen in which a certain organ, A, differs from
its average size by a fixed amount, Y; then, in these individuals, let the deviations of
a second organ, B, from its average be measured. The various individuals will exhibit
deviations of B equal to $x_1$, $x_2$, $x_3$, ..., whose mean may be called $x_m$. The ratio $x_m/Y$
will be constant for all values of Y.
In the same way, suppose those individuals are chosen in which the organ B has
a constant deviation, X; then, in these individuals, $y_m$, the mean deviation of the organ
A, will have the same ratio to X, whatever may be the value of X.
(2.) The ratios $x_m/Y$ and $y_m/X$ are connected by an interesting relation. Let $Q_a$
represent the probable error of distribution of the organ A about its average, and $Q_b$
that of the organ B; then
$$\frac{x_m}{Y} \cdot \frac{Q_a}{Q_b} = \frac{y_m}{X} \cdot \frac{Q_b}{Q_a} = \text{a constant.}$$
So that by taking a fixed deviation of either organ, expressed in terms of its
probable error, and by expressing the mean associated deviation of the second organ
in terms of its probable error, a ratio may be determined, whose value becomes 1
when a change in either organ involves an equal change in the other, and 0 when the
two organs are quite independent. This constant, therefore, measures the "degree of
correlation" between the two organs. (Weldon, 1892, p. 3)
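Weldon's procedure amounts to expressing both deviations in units of their own probable errors and taking the ratio. The sketch below uses a hypothetical function name and made-up values for the deviations and probable errors; it is meant only to show that the two routes he describes arrive at the same constant.

```python
def weldon_constant(mean_dev_second, fixed_dev_first, q_first, q_second):
    """Ratio of the mean associated deviation to the fixed deviation,
    each expressed in units of its own probable error."""
    return (mean_dev_second / q_second) / (fixed_dev_first / q_first)


# Hypothetical probable errors of organs A and B.
q_a, q_b = 1.0, 0.5

# Organ A fixed at a deviation Y = 2.0; mean deviation of B is x_m = 0.6.
from_a = weldon_constant(0.6, 2.0, q_first=q_a, q_second=q_b)

# Organ B fixed at a deviation X = 1.0; mean deviation of A is y_m = 1.2.
from_b = weldon_constant(1.2, 1.0, q_first=q_b, q_second=q_a)

print(from_a, from_b)
```

Both calls give a value close to 0.6 for these made-up figures, which is the single constant that Weldon proposed as the degree of correlation.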
In 1893, more extensive calculations were reported on data collected from
two large (each of 1,000 adult females) samples of crabs, one from the Bay of
Naples and the other from Plymouth Sound. In this work, Weldon (1893)
computes the mean, the mean error, and the modulus. His greater mathematical
sophistication in this work is evident. He states:
The probable error is given below, instead of the mean error, because it is the constant
which has the smallest numerical value of any in general use. This property renders
the probable error more convenient than either the mean error, the modulus, or the
error of mean squares, in the determination of the degree of correlation which will
be described below. (Weldon, 1893, pp. 322-323)
Weldon found that in the Naples specimens the distribution of the "frontal
breadth" produced what he terms an "asymmetrical result." This finding he
hoped might arise from the presence in the sample of two races of individuals.
He notes that Karl Pearson had tested this supposition and found that it was
likely. Pearson (1906) states in his obituary of Weldon that it was this problem
that led to his (Pearson's) first paper in the Mathematical Contributions to the
Theory of Evolution series, received by the Royal Society in 1893.
Weldon defined r and attempted to name it for Galton:
a measure of the degree to which abnormality in one organ is accompanied by
abnormality in a second. It becomes 1 when a change in one organ involves an
equal change in the other, and 0 when the two organs are quite independent. The
importance of this constant in all attempts to deal with the problems of animal
variation was first pointed out by Mr. Galton ... the constant ... may fitly be known
as "Galton's function." (Weldon, 1893, p. 325)
The statistics of heredity and a mutual interest in the plans for the reform of
the University of London (which are described by Pearson in Weldon's obituary)
drew Pearson and Weldon together, and they were close friends and
colleagues until Weldon's untimely death. Weldon's primary concern was to
make his discipline, particularly as it related to evolution, a more rigorous
science by introducing statistical methods. He realized that his own mathematical
abilities were limited, and he tried, unsuccessfully, to interest Cambridge
mathematical colleagues in his endeavor. His appointment to the University
College Chair brought him into contact with Pearson, but, in the meantime, he
attempted to remedy his deficiencies by an extensive study of mathematical
probability. Pearson (1906) writes:
Of this the writer feels sure, that his earliest contributions to biometry were the direct
results of Weldon's suggestions and would never have been carried out without his
inspiration and enthusiasm. Both were drawn independently by Galton's Natural
Inheritance to these problems. (p. 20)
Pearson's motivation was quite different from that of Weldon. MacKenzie
(1981) has provided us with a fascinating account of Pearson's background,
philosophy, and political outlook. He was allied with the Fabian socialists;⁶
he held strong views on women's rights; he was convinced of the necessity of
adopting rational scientific approaches to a range of social issues; and his
advocacy of the interests of the professional middle-class sustained his promotion
of eugenics.
His originality, his real transformation rather than re-ordering of knowledge, is to be
found in his work in statistical biology, where he took Galton's insights and made
out of them a new science. It was the work of his maturity - he started it only in his
mid-thirties - and in it can be found the flowering of most of the major concerns of
his youth. (MacKenzie, 1981, pp. 87-88)
It can be clearly seen that Pearson was not merely providing a mathematical apparatus
for others to use ... Pearson's point was essentially a political one: the viability, and
indeed superiority to capitalism, of a socialist state with eugenically-planned reproduction.
The quantitative statistical form of his argument provided him with convincing
rhetorical resources. (MacKenzie, 1981, p. 91)
MacKenzie is at pains to point out that Pearson did not consciously set out
to found a professional middle-class ideology. His analysis confines itself to
the view that here was a match of beliefs and social interests that fostered
Pearson's unique contribution, and that this sort of sociological approach may
be used to assess the work of such exceptional individuals. Pearson did not seek
to become the leader of a movement; indeed, the compromises necessary for
such an aspiration would have been anathema to what he saw as the role of the
scientist and the intellectual.
However one assesses the operation of the forces that molded Pearson's
work, it is clear that, from a purely technical standpoint, the introduction, at this
juncture, of an able, professional mathematician to the field of statistical
methods brought about a rapid advance and a greatly elevated sophistication.⁷
The third paper in the Mathematical Contributions series was read before the
Royal Society in November 1895. It is an extensive paper which deals with,
among other things, the general theory of correlation. It contains a number of
historical misinterpretations and inaccuracies that Pearson (1920) later
⁶ The Fabian Society (the Webbs and George Bernard Shaw were leading members) took its name
from Fabius, the Roman general who adopted a strategy of defense and harassment in Rome's war
with Hannibal and avoided direct confrontations. The Fabians advocated gradual advance and reform
of society, rather than revolution.
⁷ Pearson placed Third Wrangler in the Mathematical Tripos at Cambridge in 1879. The "Wranglers"
were the mathematics students at Cambridge who obtained First Class Honors - the ones who
most successfully "wrangle" with math problems. This method of classification was abandoned in the
early years of this century.
attempted to rectify, but for today's users of statistical methods it is of crucial
importance, for it presents the familiar deviation score formula for the coefficient
of correlation.
It might be mentioned here that the latter term had been introduced for r by
F. Y. Edgeworth (1845-1926) in an impossibly difficult-to-follow paper published
in 1892. Edgeworth was Drummond Professor of Political Economy at
Oxford from 1891 until his retirement in 1922, and it is of some interest to note
that he had tried to attract Pearson to mathematical economics, but without
success. Pearson (1920) says that Edgeworth was also recruited to correlation
by Galton's Natural Inheritance, and he remained in close touch with the
biometricians over many years.
Pearson's important paper (published in the Philosophical Transactions of
the Royal Society in 1896) warrants close examination. He begins with an
introduction that states the advantages and limitations of the statistical approach,
pointing out that it cannot give us precise information about relationships
between individuals and that nothing but means and averages and probabilities
with regard to large classes can be dealt with.
On the other hand, the mathematical theory will be of assistance to the medical man
by answering, inter alia, in its discussion of regression the problem as to the average
effect upon the offspring of given degrees of morbid variation in the parents. It may
enable the physician, in many cases, to state a belief based on a high degree of
probability, if it offers no ground for dogma in individual cases. (Pearson, 1896, p.
255)
Pearson goes on to define the mean, median, and mode, the normal probability
distribution, correlation, and regression, as well as various terms employed
in selection and heredity. Next comes a historical section, which is
examined later in this chapter. Section 4 of Pearson's paper examines the
"special case of two correlated organs." He derives what he terms the "well-known
Galtonian form of the frequency for two correlated variables," and says
that r is the "GALTON function or coefficient of correlation" (p. 264).
However, he is not satisfied that the methods used by Galton and Weldon give
practically the best method of determining r, and he goes on to show by what
we would now call the maximum likelihood method that $S(xy)/(n\sigma_1\sigma_2)$ is the
best value (today we replace S by $\Sigma$ for summation). This expression is familiar
to every beginning student in statistics. As Pearson (1896) puts it, "This value
presents no practical difficulty in calculation, and therefore we shall adopt it"
(p. 265). It is now well-known that we have done precisely that.⁸
⁸ Pearson (1896, p. 265) notes "that S(xy) corresponds to the product-moment of dynamics, as $S(x^2)$
to the moment of inertia." This is why r is often referred to as the "product-moment coefficient of
correlation."
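The deviation score formula is short enough to write out directly. The following is a minimal sketch with made-up data and a hypothetical function name; x and y are deviations from their respective means, and the standard deviations use n (not n - 1) in the divisor, as the formula $S(xy)/(n\sigma_1\sigma_2)$ requires.

```python
import math


def product_moment_r(xs, ys):
    """Pearson's r computed as S(xy) / (n * sigma_1 * sigma_2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]   # deviation scores for the first variable
    dy = [y - mean_y for y in ys]   # deviation scores for the second variable
    s_xy = sum(a * b for a, b in zip(dx, dy))
    sigma_1 = math.sqrt(sum(a * a for a in dx) / n)
    sigma_2 = math.sqrt(sum(b * b for b in dy) / n)
    return s_xy / (n * sigma_1 * sigma_2)


# Made-up scores for four cases.
print(product_moment_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

For these four pairs the sum of products of deviations is 4 and each sum of squared deviations is 5, so the coefficient works out to 0.8 apart from floating-point rounding.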
There follows the derivation of the standard deviation of the coefficient of
correlation, $(1 - r^2)/\sqrt{n(1 - r^2)}$, which Pearson translates to the probable error,
$0.674506(1 - r^2)/\sqrt{n(1 - r^2)}$. These statistics are then used to rework Weldon's
shrimp and crab data, and the results show that Weldon was mistaken in
assuming constancy of correlation in local races of the same species. Pearson's
exhaustive re-examination of Galton's data on family stature also shows that
some of the earlier conclusions were in error. The details of these analyses are
now of narrow historical interest only, but Pearson's general approach is a model
for all experimenters and users of statistics. He emphasizes the importance of
sample size in reducing the probable error, mentions the importance of precision
in measurement, cautions against general conclusions from biased samples, and
treats his findings with admirable scientific caution.
Pearson introduces $V = (\sigma/m)100$, the coefficient of variation, as a way of
comparing variation, and shows that "the significance of the mutual regressions
of ... two organs are as the squares of their coefficients of variation" (p. 277).
It should also be noted that, in this paper, Pearson pushes much closer to the
solution of problems associated with multiple correlation.
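The coefficient of variation is easily computed, and a short sketch shows why it is useful for comparing variation. The function name and the data are hypothetical; the standard deviation is taken with n in the divisor, and the conversion factor of 2.54 simply re-expresses inches as centimeters.

```python
import statistics


def coefficient_of_variation(values):
    """V = 100 * sigma / m: the standard deviation as a percentage of the mean."""
    return 100 * statistics.pstdev(values) / statistics.fmean(values)


# Made-up lengths in inches, and the same lengths in centimeters.
inches = [2, 4, 4, 4, 5, 5, 7, 9]
centimeters = [2.54 * v for v in inches]

print(coefficient_of_variation(inches))
print(coefficient_of_variation(centimeters))
```

The made-up lengths have a mean of 5 and a standard deviation of 2, so V is 40 on either scale apart from rounding. Because the unit cancels in the ratio, organs measured in different units or of very different sizes can be compared.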
This almost completes the account of a historical perspective of the development
of the Pearson coefficient of correlation as it is widely known and used.
However, the record demands that Pearson's account be looked at in more detail
and the interpretation of measures of association as they were seen by others,
most notably George Udny Yule (1871-1951), who made valuable contributions
to the topic, be examined.
CORRELATION - CONTROVERSIES AND CHARACTER
Pearson's (1896) paper includes a section on the history of the mathematical
foundations of correlation. He says that the fundamental theorems were "exhaustively
discussed" by Bravais in 1846. Indeed, he attributes to Bravais the
invention of the GALTON function while admitting that "a single symbol is not
used for it" (p. 261). He also states that $S(xy)/(n\sigma_1\sigma_2)$ "is the value given by
Bravais, but he does not show that it is the best" (p. 265). In examining the
general theorem of a multiple correlation surface, Pearson refers to it as
Edgeworth's Theorem. Twenty-five years later he repudiates these statements
and attempts to set his record straight. He avers that:
They have been accepted by later writers, notably Mr Yule in his manual of statistics,
who writes (p. 188): "Bravais introduced the product-sum, but not a single symbol
for a coefficient of correlation. Sir Francis Galton developed the practical method,
determining his coefficient (Galton's function as it was termed at first) graphically.
Edgeworth developed the theoretical side further and Pearson introduced the
product-sum formula."
Now I regret to say that nearly the whole of the above statements are hopelessly
incorrect. (Pearson, 1920, p. 28)
Now clearly, it is just not the case "that nearly the whole of the above
statements are hopelessly incorrect." In fact, what Pearson is trying to do is to
emphasize the importance of his own and Galton's contribution and to shift the
blame for the maintenance of historical inaccuracies to Yule, with whom he
had come to have some serious disagreement. The nature of this disagreement
is of interest and can be examined from at least three perspectives. The first is
that of eugenics, the second is that of the personalities of the antagonists, and
the third is that of the fundamental utility of, and the assumptions underlying,
measures of association. However, before these matters are examined, some
brief comment on the contributions of earlier scholars to the mathematics of
correlation is in order.
The final years of the 18th century and the first 20 years of the 19th were the
period in which the theoretical foundations of the mathematics of errors of
observation were laid. Laplace (1749-1827) and Gauss (1777-1855) are the
best-known of the mathematicians who derived the law of frequency of error
and described its application, but a number of other writers also made significant
contributions (see Walker, 1929, for an account of these developments). These
scholars all examined the question of the probability of the joint occurrence of
two errors, but "None of them conceived of this as a matter which could have
application outside the fields of astronomy, physics, and geodesy or gambling"
(Walker, 1929, p. 94).
In fact, put simply, these workers were interested solely in the mathematics
associated with the probability of the simultaneous occurrence of two errors in,
say, the measurement of the position of a point in a plane or in three dimensions.
They were clearly not looking for a measure of a possible relationship between
the errors and certainly not considering the notion of organic relationships
among directly measured variables. Indeed, astronomers and surveyors sought
to make their basic measurements independent. Having said this, it is apparent
that the mathematical formulations that were produced by these earlier scientists
are strikingly similar to those deduced by Galton and Hamilton Dickson.
Auguste Bravais (1811-1863), who had careers as a naval officer, an
astronomer and physicist, perhaps came closest to anticipating the correlation
coefficient; indeed, he even uses the term correlation in his paper of 1846.
Bravais derived the formula for the frequency surface of the bivariate normal
distribution and showed that it was a series of concentric ellipses, as did Galton
and Hamilton Dickson 40 years later. Pearson's acknowledgment of the work
of Bravais led to the correlation coefficient sometimes being called the Bravais-
Pearson coefficient.
When Pearson came, in 1920, to revise his estimation of Bravais' role, he
describes his own early investigations of correlation and mentions his lectures
on the topic to research students at University College. He says:
I was far too excited to stop to investigate properly what other people had done. I
wanted to reach new results and apply them. Accordingly I did not examine carefully
either Bravais or Edgeworth, and when I came to put my lecture notes on correlation
into written form, probably asked someone who attended the lectures to examine the
papers and say what was in them. Only when I now come back to the papers of
Bravais and Edgeworth do I realise not only that I did grave injustice to others, but
made most misleading statements which have been spread broadcast by the text-book
writers. (Pearson, 1920, p. 29)
The "Theory of Normal Correlation" was one of the topics dealt with by
Pearson when he started his lectures on the "Theory of Statistics" at University
College in the 1894-1895 session, "giving two hours a week to a small but
enthusiastic class of two students - Miss Alice Lee, Demonstrator in Physics at
the Bedford College, and myself" (Yule, 1897a, p. 457).
One of these enthusiastic students, Yule, published his famous textbook, An
Introduction to the Theory of Statistics, in 1911, a book that by 1920 was in its
fifth edition. Later in the 1920 paper, Pearson again deals curtly with Yule. He
is discussing the utility of a theory of correlation that is not dependent on the
assumptions of the bivariate normal distribution and says, "As early as 1897 Mr
G. U. Yule, then my assistant, made an attempt in this direction" (Pearson, 1920,
p. 45).
In fact, Yule (1897b), in a paper in the Journal of the Royal Statistical Society,
had derived least squares solutions to the correlation of two, three, and four
variables. This method is a compelling demonstration of the appropriateness of
Pearson's formula for r under the least squares criterion. But Pearson is not
impressed, or at least in 1920 he is not impressed:
Are we not making a fetish of the method of least squares as others made a fetish of
the normal distribution? ... It is by no means clear therefore that Mr Yule's
generalisation indicates the real line of future advance. (Pearson, 1920, p. 45)
This, to say the least, cold approach to Yule's work (and there are many
other examples of Pearson's overt invective) was certainly not evident over the
10 years that span the turn of the 19th to the 20th centuries. During what Yule
described as "the old days" he spent several holidays with Pearson, and even
when their personal relationship had soured, Yule states that in nonintellectual
matters Pearson remained courteous and friendly (Yule, 1936), although one
cannot help but feel that here we have the words of an essentially kind and gentle
man writing an obituary notice of one of the fathers of his chosen discipline.
What was the nature of this disagreement that so aroused Pearson's wrath?
In 1900, Yule developed a measure of association for nominal variables, the
frequencies of which are entered into the cells of a contingency table. Yule
presents very simple criteria for such measures of association, namely, that they
should be zero when there is no relationship (i.e., the variables are independent),
+1 when there is complete dependence or association, and -1 when there is a
complete negative relationship. The illustrative example chosen is that of a
matrix formed from cells labeled as follows:
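                Vaccinated B    Unvaccinated β
Survived A          AB               Aβ
Died α              αB               αβ
and Yule devised a measure, Q (named, it appears, for Quetelet), that satisfies
the stated criteria, and is given by
$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}.$$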
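Yule's Q can be computed from the four cell frequencies alone. The sketch below uses the usual cross-product form of the coefficient, with a, b, c, and d standing for the frequencies in the survived-vaccinated, survived-unvaccinated, died-vaccinated, and died-unvaccinated cells. The function name and the counts are hypothetical and are not taken from Yule's paper.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a two-by-two table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / (a * d + b * c)


# Hypothetical counts: 90 vaccinated and 60 unvaccinated survivors,
# 10 vaccinated and 40 unvaccinated deaths.
print(yules_q(90, 60, 10, 40))

# The three criteria that Yule laid down for a measure of association.
print(yules_q(10, 10, 10, 10))   # independence
print(yules_q(10, 0, 0, 10))     # complete association
print(yules_q(0, 10, 10, 0))     # complete negative association
```

The last three calls return 0, +1, and -1 respectively, and no assumption about an underlying continuous distribution is needed to obtain them. A table in which both cross-products are zero is not handled by this sketch.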
Yule's paper had been "received" by the Royal Society in October 1899,
and "read" in December of the same year. It was described by Pearson (1900b),
in a paper that examines the same problem, as "Mr Yule's valuable memoir"
(p. 1). In his paper, Pearson undertakes to investigate "the theory of the whole
subject" (p. 1) and arrives at his measure of association for two-by-two contingency
table frequencies, an index that he called the tetrachoric coefficient of
correlation. He examines other possible measures of association, including
Yule's Q, but he considers them to be merely approximations to the tetrachoric
coefficient. The crucial difference between Yule's approach and that of Pearson
is that Yule's criteria for a measure of association are empirical and arithmetical,
whereas the fundamental assumption for Pearson was that the attributes whose
frequencies were counted in fact arose from an underlying continuous bivariate
normal distribution. The details of Pearson's method are somewhat complex
and are not examined here. There were a number of developments from this
work, including Pearson's derivation, in 1904, of the mean square contingency
and the contingency coefficient. All these measures demanded the assumption
of an underlying continuous distribution, even though the variables, as they were
considered, were categorical. Battle commenced late in 1905 when Yule
criticized Pearson's assumptions in a paper read to the Royal Society (Yule,
1906). Biometrika's readers were soon to see, "Reply to Certain Criticisms of
Mr G. U. Yule" (Pearson, 1907) and, after Yule's discussion of his indices
appeared in the first edition of his textbook, were treated to David Heron's
exhortation, "The Danger of Certain Formulae Suggested as Substitutes for the
Correlation Coefficient" (Heron, 1911). These were stirring statistical times,
marked by swingeing attacks as the biometricians defended their position:
If Mr Yule's views are accepted, irreparable damage will be done to the growth of
modern statistical theory ... we shall term Mr Yule's latest method of approaching
the problem of relationship of attributes the method of pseudo-ranks ... we ... reply
to certain criticisms, not to say charges, Mr Yule has made against the work of one
or both of us. (Pearson & Heron, 1913, pp. 159-160)
Articulate sniping and mathematical bombardment were the methods of
attack used by the biometricians. Yule was more temperate but nevertheless
quite firm in his views:
All those who have died of small-pox are all equally dead: no one of them is more
dead or less dead than another, and the dead are quite distinct from the survivors.
The introduction of needless and unverifiable hypotheses does not appear to me
to be a desirable proceeding in scientific work. (Yule, 1912, pp. 611-612)
Yule, and his great friend Major Greenwood, poked private fun at the
opposition. Parts of a fantasy sent to Yule by Greenwood in November 1913
are reproduced here:
Extracts from The Times, April 1925
G. Udny Yule, who had been convicted of high treason on the 7th ult, was executed
this morning on a scaffold outside Gower St. Station. A short but painful scene
occurred on the scaffold. As the rope was being adjusted, the criminal made some
observation, imperfectly heard in the press enclosure, the only audible words being
"the normal coefficient is ." Yule was immediately seized by the Imperial guard
and gagged.
Up to the time of going to press the warrant for the apprehension of Greenwood had
not been executed, but the police have what they regard to be an important clue.
During the usual morning service at St. Paul's Cathedral, which was well attended,
the carlovingian creed was, in accordance with an imperial rescript, chanted by the
choir. When the solemn words, "I believe in one holy and absolute coefficient of
four-fold correlation" were uttered a shabbily dressed man near the North door
shouted "balls." Amid a scene of indescribable excitement, the vergers armed with
several volumes of Biometrika made their way to the spot. (Greenwood, quoted by
MacKenzie, 1981, pp. 176-177)
The logical positions of the two sides were quite different. For Pearson it
was absolutely necessary to preserve the link with interval-level measurement
where the mathematics of correlation had been fully specified.
Mr Yule ... does not stop to discuss whether his attributes are really continuous or
discrete, or hide under discrete terminology true continuous variates. We see under
such class indices as "death" or "recovery", "employment" or "non-employment" of
mother, only measures of continuous variates ... (p. 162)
The fog in Mr Yule's mind is well illustrated by his table ... (p. 226)
CORRELATION - CONTROVERSIES AND CHARACTER 151
Mr Yule is juggling with class-names as if they represented real entities, and his
statistics only a form of symbolic logic. No knowledge of a practical kind ever came
out of these logical theories. (p. 301) (Pearson & Heron, 1913)
Yule is attacked on almost every one of this paper's 157 pages, but for him
the issue was quite straightforward. Techniques of correlation were nothing
more and nothing less than descriptions of dependence in nominal data. If
different techniques gave different answers, a point that Pearson and his
followers frequently raised, then so be it. The mean, median, and mode give
different answers to questions about the central tendency of a distribution, but
each has its utility.
Of course the controversy was never resolved. The controversy can never
be resolved, for there is no absolute, "right" answer. Each camp started with
certain basic assumptions, and a reconciliation of their views was not possible
unless one or both sides were to have abrogated those assumptions, or unless it
could have been shown with scientific certainty that only one side's assumptions
were viable. For Yule it was a situation that he accepted. In his obituary notice
of Pearson he says, "Time will settle the question in due course" (p. 84). In this
he was wrong, because there is no longer a question. The pragmatic practitioners
of statistics in the present day are largely unaware that there ever even was a
question.
The disagreement highlights the personality differences of the protagonists.
Pearson could be described as a difficult man. He held very strong views on a
variety of subjects, and he was always ready to take up his pen and write scathing
attacks on those whom he perceived to be misguided or misinformed. He was
not ungenerous, and he devoted immense amounts of time and energy to the
work of his students and fellow researchers, but it is likely that tact was not one
of his strong points and any sort of compromise would be seen as defeat.
In 1939, Yule commented on Pearson's polemics, noting that, in 1914,
Pearson said that "Writers rarely . . . understand the almost religious hatred
which arises in the true man of science when he sees error propagated in high
places" (p. 221).
Surely neither in the best type of religion nor in the best type of science should hatred
enter in at all. ... In one respect only has scientific controversy perforce improved
since the seventeenth century. If A disagrees with B's arguments, dislikes his
personality and is annoyed by the cock of his hat, he can no longer, failing all else,
resort to abuse of B's latinity. (Yule, 1939, p. 221)
Yule had respect and affection for "K.P." even though he had distanced
himself from the biometricians of Gower Street (where the Biometric Laboratory
was located). He clearly disliked the way in which Pearson and his
followers closed ranks and prepared for combat at the sniff of criticism. In some
respects we can understand Pearson's attitudes. Perhaps he did feel isolated and
beset. Weldon's death in 1906 and Galton's in 1911 affected him greatly. He
was the colossus of his field, and yet Oxford had twice (in 1897 and 1899) turned
down his applications for chairs, and in 1901 he applied, again unsuccessfully,
for the chair of Natural Philosophy at Edinburgh. He felt the pressure of
monumental amounts of work and longed for greater scope to carry out his
research. This was indeed to come with grants from the Drapers' Company,
which supported the Biometric Laboratory, and Galton's bequest, which made
him Professor of Eugenics at University College in 1911. But more controversy,
and more bitter battles, this time with Fisher, were just over the horizon.
Once more, we are indebted to MacKenzie, who so ably puts together the
eugenic aspects of the controversy. It is his view that this provides a much more
adequate explanation than one derived from the examination of a personality
clash. It is certainly unreasonable to deny that this was an important factor.
Yule appears to have been largely apolitical. He came from a family of
professional administrators and civil servants, and he himself worked for the
War Office and for the Ministry of Food during World War I, work for which
he received the C.B.E. (Commander of the British Empire) in 1918. He was not
a eugenist, and his correspondence with Greenwood shows that his attitude
toward the eugenics movement was far from favorable. He was an active
member of the Royal Statistical Society, a body that awarded him its highest
honor, the Guy Medal in gold, in 1911. The Society attracted members who
were interested in an ameliorative and environmental approach to social issues
- debates on vaccination were a continuing fascination. Although Major Greenwood,
Yule's close friend, was at first an enthusiastic member of the biometric
school, his career, as a statistician in the field of public health and preventive
medicine, drew him toward the realization that poverty and squalor were
powerful factors in the status and condition of the lower classes, a view that
hardly reflected eugenic philosophy.
For Pearson, eugenics and heredity shaped his approach to the question of
correlation, and the notion of continuous variation was of critical importance.
His notion of correlation, as a function allowing direct prediction from one variable
to another, is shown to have its roots in the task that correlation was supposed to
perform in evolutionary and eugenic prediction. It was not adequate simply to know
that offspring characteristics were dependent on ancestral characteristics: this
dependence had to be measured in such a way as to allow the prediction of the effects
of natural selection, or of conscious intervention in reproduction. To move in the
direction indicated here, from prediction to potential control over evolutionary
processes, required powerful and accurate predictive tools: mere statements of
dependence would be inadequate. (MacKenzie, 1981, p. 169)
MacKenzie's sociohistorical analysis is both compelling and provocative,
but at least two points, both of which are recognized by him, need to be made.
The first is that this view of Pearson's motivations is contrary to his earlier
expressed views of the positivistic nature of science (Pearson, 1892), and
second, the controversy might be placed in the context of a tightly knit academic
group defending its position. To the first MacKenzie says that practical
considerations outweighed philosophical ones, and yet it is clear that practical demands
did not lead the biometricians to form or to join political groups that might have
made their aspirations reality. The second view is mundane and realistic. The
discipline of psychology has seen a number of controversies in its short history.
Notable among them is the connectionist (stimulus-response) versus cognitive
argument in the field of learning theory. The phenomenological, the psychodynamic,
the social-learning, and the trait theorists have arraigned themselves
against each other in a variety of combinations in personality research.
Arguments about the continuity-discontinuity of animals and humankind are
exemplified perhaps by the controversy over language acquisition in the higher
primates. All these debates are familiar enough to today's students of psychology,
and it is not inconceivable to view the correlation debate as part of the
system of academic "rows" that will remain as long as there is freedom for
intellectual controversy and scientific discourse.
But the Yule-Pearson debate and its implications are not among those
discussed in university classrooms across the globe. For a multitude of reasons,
not the least of which was the horror of the negative eugenics espoused by the
German Nazis, and the demands of the growing discipline of psychology for
quantitative techniques that would help it deal with its subject matter, very few
if any of today's practitioners and researchers in the social sciences think of
biometrics when they think of statistics. They busily get on with their analyses
and, if they give thanks to Pearson and Galton at all, they remember them for
their statistical insights rather than for their eugenic philosophy.
11
Factor Analysis
FACTORS
A feature of the ideal scientific method that is always hailed as most admirable
is parsimony. To reduce a mass of facts to a single underlying factor or even to
just a few concepts and constructs is an ongoing scientific concern. In psychology,
the statistical technique known as factor analysis is most often associated
with the attempt to comprehend the structure of intelligence and the dimensions
of personality. The intellectual struggles that accompany discussion of these
matters are a very long way from being over.
Of all the statistical techniques discussed in this book, factor analysis may
be justly claimed as psychology's own. As Lawley and Maxwell (1971) and
others have noted, some of the early controversies that swirled around the
methods employed arose from arguments about psychological, rather than
mathematical, matters. Indeed, Lawley and Maxwell suggest that the controversies
"discouraged the interest shown by mathematicians in the theoretical
problems involved" (p. 1). In fact, the psychological quarrels stem from a much
older philosophical debate that is most often traced to Francis Bacon, and his
assertion that "the mere orderly arrangement of data would make the right
hypothesis obvious" (Russell, 1961, p. 529). Russell goes on:
The thing that is achieved by the theoretical organization of science is the collection
of all subordinate inductions into a few that are very comprehensive - perhaps only
one. Such comprehensive inductions are confirmed by so many instances that it is
thought legitimate to accept, as regards them, an induction by simple enumeration.
This situation is profoundly unsatisfactory. (p. 530)
To be just a little more concrete, in psychology:
We recognize the importance of mentality in our lives and wish to characterize it, in
part so that we can make the divisions and distinctions among people that our cultural
and political systems dictate. We therefore give the word "intelligence" to this
wondrously complex and multifaceted set of human capabilities. This shorthand
symbol is then reified and intelligence achieves its dubious status as a unitary thing.
(Gould, 1981, p. 24)
To be somewhat trite, it is clear that among employed persons there is a high
correlation between the size of a monthly paycheck and annual salary, and it is
easy to see that both variables are a measure of income - the "cause" of the
correlation is clear and obvious. It is therefore tempting not only to look for the
underlying bases of observed interrelationships among variables but to endow
such findings with a substance that implies a causal entity, despite the
protestations of those who assert that the statistical associations do not, by themselves,
provide evidence for its existence. The refined search for these statistical
distillations is factor analysis, and it is important to distinguish between the
mathematical bases of the methods and the interpretation of the result of their
application.
Now the correlation between monthly paycheck and annual salary will not
be perfect. Interest payments, profits, gifts, even author's royalties, will make
r less than +1, but presumably no one would argue with the proposition that if
one requires a reasonable measure of income and material well-being, either
variable is adequate and the other thereby redundant. Galton (1888a), in
commenting on Alphonse Bertillon's work that was designed to provide an
anthropometric index of identification, in particular the identification of criminals,
points to the necessity of estimating the degree of interdependence among
the variables employed, as well as the importance of precision in measurement
and of not making measurement classifications too wide. There is little or
nothing to be gained from including in the measurements a number of variables
that are highly correlated, when one would suffice.
A somewhat different line of reasoning is to derive from the intercorrelations
of the measured variables a set of components that are uncorrelated. The number
of components thus derived would be the same as the number of variables. The
components are ordered so that the initial ones account for more of the variance
than the later ones that are listed in the outcome. This is the method of principal
component analysis, for which Pearson (1901) is usually cited as the original
inspiration and Hotelling (1933, 1935) as the developer and refiner, although
there is no indication that Hotelling was influenced by Pearson's work. In fact,
Edgeworth (1892, 1893) had suggested a scheme for generating a function
containing uncorrelated terms that was derived from correlated measurements.
Macdonell (1901) was also an early worker in the field of "criminal
anthropometry." He acknowledges in his paper that Pearson has pointed out to him a
method of arriving at ideal characteristics that "would be given if we calculated
the seven [there were seven measures] directions of uncorrelated variables, that
is, the principal axes of the correlation 'ellipsoid'" (p. 209).
The method, as Lawley and Maxwell (1971) have emphasized, although it
has links with factor analysis, is not to be taken as a variety of factor analysis.
The latter is a method that attempts to distill down or concentrate the covariances
in a set of variables to a much smaller number of factors. Indeed, none other
than Godfrey Thomson (1881-1955), a leading British factor theorist from the
1920s to the 1940s, maintained in a letter to D. F. Vincent of Britain's National
Institute of Industrial Psychology that "he did not regard Pearson's 'lines of
closest fit' as anything to do with factor analysis" (noted by Hearnshaw, 1979,
p. 176). The reason for the ongoing association of Pearson's name with the birth
of factor analysis is to be found in the role played by another leading British
psychologist, Sir Cyril Burt (1883-1971), in the writing and the rewriting of the
history of factor analysis, an episode examined later in this chapter.
In essence, the method of principal components may be visualized by
imagining the variables measured - the tests - as points in space. Tests that are
correlated will be close together in clusters, and tests that are not related will be
further away from the cluster. Axes or vectors may be projected into this space
through the clusters in a fashion that allows for as much of the variance as
possible to be accounted for. Geometrically this projection can take place in
only three dimensions, but algebraically an n-dimensional space can be
conceptualized.
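The properties of principal components described above (as many components as variables, mutually uncorrelated, ordered by the variance they account for) can be checked numerically. What follows is only a minimal sketch: the three-test correlation matrix is hypothetical, invented for illustration, and the eigen-decomposition route is the modern textbook approach rather than the computation used by Pearson or Hotelling.

```python
import numpy as np

# Hypothetical correlation matrix for three tests (illustrative values only,
# not taken from any study discussed in the text).
R = np.array([
    [1.00, 0.80, 0.10],
    [0.80, 1.00, 0.15],
    [0.10, 0.15, 1.00],
])

# Eigen-decomposition of the symmetric matrix.
# numpy.linalg.eigh returns eigenvalues in ascending order, so reverse them.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance attributed to each component, largest first.
proportion = eigenvalues / eigenvalues.sum()

# Covariance matrix of the component scores: V' R V.
# Off-diagonal entries vanish, so the components are uncorrelated.
component_cov = eigenvectors.T @ R @ eigenvectors

print("variance accounted for:", np.round(proportion, 3))
print("component covariances:\n", np.round(component_cov, 6))
```

The eigenvalues sum to the number of variables because each standardized test contributes one unit of variance, which is why the proportions can be read as shares of the total.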
THE BEGINNINGS
In 1904 Charles Spearman (1863-1945) published two papers (1904a, 1904b)
in the same volume of the American Journal of Psychology, then the leading
English language journal in psychology. The first of these, a general account of
correlation methods, their strengths and weaknesses, and the sources of error
that may be introduced that might "dilate" or "constrict" the results, included
criticism of Pearson's work.
In his Huxley lecture (sponsored by the Anthropological Institute of Great
Britain and Ireland) of 1903, Pearson reported on a great amount of data that
had been collected by teachers in a number of schools. The physical variables
measured included health, hair and eye color, hair curliness, athletic power, head
length, breadth, and height, and a "cephalic index." The psychological
characteristics were assertiveness, vivacity, popularity, introspection, conscientiousness,
temper, ability, and handwriting. The average correlations (omitting
athletic power) reported between the measures of brother, sister, and
brother-sister pairs are extraordinarily similar, ranging from .51 to .54.
We are forced, I think literally forced, to the general conclusion that the physical and
psychical characters in men are inherited within broad lines in the same manner and
with the same intensity. (Pearson, 1904a, p. 156)
Spearman felt that the measurements taken could have been affected by
"systematic deviations," "attenuation" produced by the suggestion that the
teacher's judgments were not infallible, and by lack of independence in the
judgments, finally maintaining that:
When we further consider that each of these physical and mental characteristics will
have quite a different amount of such error (in the former this being probably quite
insignificant) it is difficult to avoid the conclusion that the remarkable coincidences
announced between physical and mental heredity can hardly be more than mere
accidental coincidence. (Spearman, 1904a, p. 98)
But Pearson's statistical procedures were not under fire, for later he says:
If this work of Pearson has thus been singled out for criticism, it is certainly from no
desire to undervalue it. ... My present object is only to guard against premature
conclusions and to point out the urgent need of still further improving the existing
methodics of correlational work. (Spearman, 1904a, p. 99)
Pearson was not pleased. When the address was reprinted in Biometrika in
1904 he included an addendum referring to Spearman's criticism that added fuel
to the fire:
The formula invented by Mr Spearman for his so-called "dilation" is clearly wrong
... not only are his formulae, especially for probable errors erroneous, but he quite
misunderstands and misuses partial correlation coefficients. (p. 160)
It might be noted that neither man assumed that the correlations, whatever
they were, might be due to anything other than heredity, and that Spearman
largely objected to the data and possible deficiencies in its collection and
Pearson largely objected to Spearman's statistics. The exchange ensured that
the opponents would never even adequately agree on the location of the
battlefield, let alone resolve their differences. Spearman was to become,
successively, Reader and head of a psychological laboratory at University College,
London, in 1907, Grote Professor of the Philosophy of Mind and Logic in 1911,
and Professor of Psychology in 1928 until his retirement in 1931. He was
therefore a colleague of Pearson's in the same college in the same university
during almost the whole of the latter's tenure of the Chair of Eugenics. Their
interests and their methods had a great deal in common, but they never
collaborated and they disliked each other. They clashed on the matter of
Spearman's rank-difference correlation technique, and that and Spearman's
new criticism of Pearson and Pearson's reaction to it began the hostilities that
continued for many years, culminating in Pearson's waspish, unsigned review
(1927) of Spearman's (1927) book The Abilities of Man. Pearson dismisses
the book as "distinctly written for the layman" (p. 181) and claims that "what
Professor Spearman considers proofs are not proofs. . . With the failure of
Chapter X, that is, 'Proof that G and S exist,' the very backbone disappears from
the body of Prof. Spearman's work" (p. 183).
Nowhere in Spearman's book does Pearson's name appear! In his 1904 paper,
Spearman acknowledges that Pearson had given the name the method of
"product moments" to the calculation of the correlation coefficient, but, in a
footnote in his book, Spearman (1927), commenting again on correlation
measures, remarks that:
easily foremost is the procedure which was substantially given in a beautiful memoir
of Bravais and which is now called that of "product moments." Such procedures
have been further improved by Galton who invented the device - since adopted
everywhere - of representing all grades of interdependence by a single number, the
"coefficient," which ranges from unity for perfect correlation down to zero for entire
absence of it. (p. 56)
So much for Pearson!
The second paper of 1904 had produced Spearman's initial conclusion that:
The . . . observed facts indicate that all branches of intellectual activity have in
common one fundamental function (or group of functions), whereas the remaining
or specific elements of the activity seem in every case to be wholly different from that
in all the others. (Spearman, 1904b, p. 284)
This is Spearman's first general statement of the two-factor theory of
intelligence, a theory that he was to vigorously expound and defend. Central to
this early work was the notion of a hierarchy of intelligences. In examining data
drawn from the measurement of children from a "High Class Preparatory School
for Boys" on a variety of tests, Spearman starts by adjusting the correlation
coefficients to eliminate irrelevant influences and errors using partial correlation
techniques and then ordering the coefficients from highest to lowest. He
observed a steady decrease in the values from left to right and from top to bottom
in the resulting table.
The sample was remarkably small (only 33), but Spearman (1904b)
confidently and boldly proclaims that his method has demonstrated the existence of
"General Intelligence," which, by 1914, he was labeling g. He made no attempt
at this time to analyze the nature of his factor, but he notes:
An important practical consequence of this universal Unity of the Intellectual
Function, the various actual forms of mental activity constitute a stable
interconnected Hierarchy according to the different degrees of intellective saturation. (p.
284)

           Classics  French  English  Math  Discrim  Music
Classics      -       0.83    0.78    0.70   0.66    0.63
French       0.83      -      0.67    0.67   0.65    0.57
English      0.78     0.67     -      0.64   0.54    0.51
Math         0.70     0.67    0.64     -     0.45    0.51
Discrim      0.66     0.65    0.54    0.45    -      0.40
Music        0.63     0.57    0.51    0.51   0.40     -

FIG 11.1 Spearman's hierarchical ordering of correlations
This work may be justly claimed to include the first factor analysis. The
intercorrelations are shown in Figure 11.1.
The hierarchical order of the correlations in the matrix is shown by the
tendency for the coefficients in a pair of columns to bear the same ratio to one
another throughout that column.
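The proportionality of columns can be examined directly from the coefficients in Figure 11.1. The sketch below is an illustration only, not Spearman's own procedure: it takes the Classics and French columns, drops the two rows in which either column meets itself or the other, and forms the ratio in each remaining row. The variable names are chosen here for convenience.

```python
# Correlations of Classics and of French with the four remaining tests
# (English, Math, Discrim, Music), copied from Figure 11.1.
tests = ["English", "Math", "Discrim", "Music"]
classics = [0.78, 0.70, 0.66, 0.63]
french = [0.67, 0.67, 0.65, 0.57]

# Under a perfect hierarchy the ratio of the two columns would be the
# same in every row; with real data it is only approximately constant.
ratios = [c / f for c, f in zip(classics, french)]
spread = max(ratios) - min(ratios)

for name, ratio in zip(tests, ratios):
    print(f"{name:8s} Classics/French = {ratio:.2f}")
print(f"spread of ratios = {spread:.2f}")
```

For these two columns the ratios all lie between 1.0 and 1.2, which gives a sense of how approximate the "same ratio" tendency is in a sample of 33.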
Of course, the technique of correlation that began with Galton (1888b) is
central to all forms of factor analysis, but, more particularly, Spearman's work
relied on the concept of partial correlation, which allows us to examine the
relationships between, for example, two variables when a third is held constant.
The method gave Spearman the means of making mathematical the notion that
the correlation of a specific test with a general factor was common to all tests
of intellectual functioning. The remaining portion of the variance, error
excepted, is specific and unique to the test that measures the variable. This is the
essence of what came to be known as the two-factor theory.
The partial correlation of variables 1 and 2 when a third, 3, is held constant
is given by
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$
If there are two variables a and b and g is a constant and the only cause of
the correlation ($r_{ab}$) between a and b is g, then $r_{ab.g}$ would be zero and hence
$$r_{ab} = r_{ag} r_{bg}$$
If a is set to equal b then
$$r_{aa} = r_{ag}^2$$
This latter represents the variance in the variable a that is accounted for by
g and leads us to the communalities in a matrix of correlations.
If now we were to take four variables a, b, c, and d, and to consider the
correlation of a and b with c and d, then
$$r_{ac} = r_{ag} r_{cg}, \quad r_{bd} = r_{bg} r_{dg}, \quad r_{bc} = r_{bg} r_{cg}, \quad r_{ad} = r_{ag} r_{dg}$$
So that
$$r_{ac} r_{bd} = r_{bc} r_{ad}$$
and therefore,
$$r_{ac} r_{bd} - r_{bc} r_{ad} = 0$$
The left-hand side of this latter equation is Spearman's famous tetrad
difference, and his proof of it is given in an appendix to his 1927 book. The term
tetrad difference first appears in the mid 1920s (see e.g., Spearman & Holzinger,
1925).
So, a matrix of correlations such as
$$\begin{pmatrix} r_{ac} & r_{ad} \\ r_{bc} & r_{bd} \end{pmatrix}$$
generates the tetrad difference $r_{ac} r_{bd} - r_{bc} r_{ad}$ and, without being too
mathematical, this is the value of the minor determinant of order two. When all these
minor determinants are zero, the matrix is of rank one. Moreover, the
correlations can be explained by one general factor.
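The partial correlation and tetrad-difference expressions in this passage can be checked with a small numerical sketch. The loadings on g below are hypothetical values chosen purely for illustration, and the function names are introduced here; the point is only that when every correlation is the product of two loadings on a single factor, the partial correlation with g held constant is zero and every tetrad difference vanishes.

```python
import math
from itertools import permutations


def partial_corr(r12, r13, r23):
    """Correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def tetrad_difference(r, a, b, c, d):
    """Tetrad difference r_ac * r_bd - r_bc * r_ad for a table of correlations r."""
    return r[a][c] * r[b][d] - r[b][c] * r[a][d]


# Hypothetical correlations of four tests with one general factor g
# (illustrative values only, not Spearman's data).
loadings = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}

# If g is the only cause of correlation, r_xy = r_xg * r_yg for every pair.
r = {
    x: {y: loadings[x] * loadings[y] for y in loadings if y != x}
    for x in loadings
}

# Holding g constant removes the correlation between a and b.
partial_ab_g = partial_corr(r["a"]["b"], loadings["a"], loadings["b"])

# Every tetrad difference formed from four distinct tests is zero
# (up to floating-point rounding).
tetrads = [
    tetrad_difference(r, a, b, c, d)
    for a, b, c, d in permutations(loadings, 4)
]

print("partial r_ab.g:", partial_ab_g)
print("largest absolute tetrad difference:", max(abs(t) for t in tetrads))
```

With observed correlations, which contain sampling error, the tetrad differences would not be exactly zero; how close to zero they must be was the question of sampling distribution taken up in the subsequent controversy.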
Wolfle (1940) has noted that confusion about what the tetrad difference
meant or implied persisted over a good many years. It was sometimes thought
that when a matrix of correlations satisfied the tetrad equation, every individual
measurement of every variable could only be divided into two independent
parts. Spearman insisted that what he maintained was that this division could
be made, not that it was the only possible division. Nevertheless,
pronouncements about the tetrad equation by Spearman were frequently followed by
assertions such as, "The one part has been called the 'general factor' and denoted
by the letter g . . . The second part has been called the 'specific factor' and
denoted by the letter s" (Spearman, 1927, p. 75).
Moreover, Spearman's frequent references to "mental energy" gave the
impression that g was "real," even though, especially when challenged, he
denied that it was anything more than a useful, mathematical, explanatory
construction. Spearman made ambiguous statements about g, but it seems that
he was convinced that the two-factor theory was paramount and the tetrad
difference formed an important part of his argument. Not only that, Spearman
was satisfied that Garnett's earlier "proof" (1920) of the two-factor theory had
effectively dealt with any criticism:
There is another important limitation to the division of the variables into factors. It
is that the division into general and specific factors all mutually independent can be
effected in one way only; in other words, it is unique. (Spearman, 1927, p. vii)
In 1933 Brown and Stephenson attempted a comprehensive test of the theory
using an initial battery of 22 tests on a sample of 300 boys aged between 10 and
10½ years. But they "purified" the battery by dropping tests that for one reason
or another did not fit and found support for the two-factor theory. Wolfle's
(1940) review of factor analysis says of this work: "Whether one credits the
attempt with success or not, all that it proves is this; if one removes all tetrad
differences which do not satisfy the criterion, the remaining ones do satisfy it"
(p. 9).
Pearson and Moul (1927), in a lengthy paper, attempt a dissection of
Spearman's mathematics, in particular examining the sampling distribution of
the tetrads and whether or not it can be justified as being close to the normal
distribution. They conclude: "the claim of Professor Spearman to have effected
a Copernican revolution in psychology seems at present premature" (p. 291).
However, they do state that even though they believe the theory of general
and specific factors is "too narrow a structure to form a frame for the great
variety of mental abilities," they believe that it should and could be tested more
adequately.
Spearman's annoyance with Pearson's view of his work was not so troubling
as the pronouncements on it made by the American mathematician E. B. Wilson
(1879-1964), Professor of Vital Statistics at Harvard's School of Public Health.
The episode has been meticulously researched and a comprehensive account
given by Lovie and Lovie (1995). Spearman met Wilson at Harvard during a
visit to the United States late in 1927. Wilson had read The Abilities of Man
and had formed the opinion that its author's mathematics were wanting and had
tried to explain to him that it was possible to "get the tetrad differences to vanish,
one and all, in so many ways that one might suspect that the resolution into a
general factor was not unique" (quoted by Lovie & Lovie, 1995, p. 241).
Wilson subsequently reviewed Spearman's book for Science. It is generally
a friendly review. Wilson describes the book as "an important work," written
"clearly, spiritedly, suggestively, in places even provocatively," and his words
concentrate on the mathematical appendix. Wilson admits that his review is
"lop-sided," but he maintains that Spearman has missed some of the logical
implications of the mathematics, and in particular that his solutions are
indeterminate. But he leaves a loophole for Spearman:
Do $g_x$, $g_y$, . . . whether determined or undeterminable represent the intelligence
of x, y, . . . ? The author advances a deal of argument and of statistics to show that
they do. This is for psychologists, not for me to assess. (Wilson, 1928, p. 246)
Wilson did not want to destroy the basic philosophy or to undermine
completely the thrust of Spearman's work. Later, in 1929, Wilson published a
more detailed mathematical critique (1929b) and a correspondence between him
and Spearman ensued, lasting at least until 1933. The Lovies have analyzed
these exchanges and show that a solution, or, as they term it, "an uneasy
compromise" was reached. A "socially negotiated" solution saw Spearman
accepting Wilson's critique, Garnett modifying his earlier "proof" of the
two-factor theory, and Wilson offering ways in which the problems might be at
least modified, if not overcome. More specifically, the idea of a partial
indeterminacy in g was suggested.
REWRITING THE BEGINNINGS
In 1976, the world of psychology, and British psychology in particular, was
shaken by allegations published in the Sunday Times by Oliver Gillie, to the
effect that Sir Cyril Burt (1883-1971), Spearman's successor as Professor of
Psychology at University College, had faked a large part of the data for his
research on twins. Among other labels in a series of articles, Gillie called Burt "a
plagiarist of long standing." Burt was a prominent figure in psychology in the
United Kingdom, and his work on the heritability of intelligence, bolstered by
his data on its relationship among twins reared apart, was not only widely cited
but influenced educational policy. Leslie Hearnshaw, Professor Emeritus of
Psychology at the University of Liverpool, was, at that time, researching his
biography (published in 1979) of Burt, and when it appeared he had not only
examined the apparent fraudulent nature of Burt's data but he also reviewed the
contributions of Burt to the technique of factor analysis.
The "Burt Scandal" led to a debate by the British Psychological Society, the
publication of a "balance sheet" (Beloff, 1980), and a number of attempts to
rehabilitate him (Fletcher, 1991; Jensen, 1992; Joynson, 1989). It must be said
that Hearnshaw's view that Burt re-wrote history in an attempt to place himself
rather than Spearman as the founder of factor analysis has not been viewed as
seriously as the allegations of fraud. Nevertheless, insofar as Hearnshaw's
words have been picked over in the attempts to at least downplay, if not discredit,
his findings, and insofar as the record is important in the history and development
of factor analysis, they are worth examining here. Once more the credit
must go to the Lovies (1993), who have provided a "side-by-side" comparison
of Spearman's prepublication notes and comments on Burt's work and the
relevant sections of the paper itself.
The publication of Burt's (1909) paper on "general intelligence" was
preceded by some important correspondence with Spearman, who saw the paper
as offering support for his two-factor theory. In fact, Spearman re-worked the
paper, and large sections of Spearman's notes on it were used substantially
verbatim by Burt. As the Lovies point out, this was not blatant plagiarism on
Burt's part, even though the latter's acknowledgement of Spearman's role was
incomplete:
but was a kind of reciprocated self-interest about the Spearman-Burt axis at this time
which allowed Burt to copy from Spearman, and Spearman to view this with
comparative equanimity because the article provided such strong support for the two
factor theory. (p. 315)
However, of more import as far as our history is concerned is Hearnshaw's
contention that Burt, in attempting to displace Spearman, used subtle and
sometimes not-so-subtle commentary to place the origins of factor analysis with
Pearson's (1901) article on "principal axes" and to suggest that he (Burt) knew
of these methods before his exchanges with Spearman. Indeed, Hearnshaw
(1979) reports on Burt's correspondence with D. F. Vincent (noted earlier) in
which Burt claimed that he had learned of Pearson's work when Pearson visited
Oxford and that it was then that he and McDougall (Burt's mentor) became
interested in the techniques. After Spearman died, Burt's papers repeatedly
emphasize the priority of Pearson and Burt's role in the elaboration of methods
that became known as factor analysis. Hearnshaw notes that Burt did not
mention Pearson's 1901 work until 1947 and implies that it was the publication
of Thurstone's book (1947) Multiple Factor Analysis and its final chapter on
"The Principal Axes" that alerted Burt. It should also be noted that Wolfle's
(1940) review, although citing 11 of Burt's papers of the late 1930s, does not
mention Pearson at all.
It will be some years beforethe "Burt scandal" becomes no more thanan
historical footnote. Burt's major work The Factors of the Mind remains a
significant and important contribution, not only to the debate about the nature
of intelligence, but also to the application of mathematical methods to the testing
of theory. Even the most vehement critics of Burt would likely agree and, with
Hearnshaw, might say, "It is lamentable that he should have blotted his record
by the delinquencies of his later years." In particular, Burt's writings moved
factor theorizing away from the Spearman contention that g was invariant no
matter what types of tests were used - in other words, that different batteries of
tests should produce the same estimates of g. The idea of a number of group
factors that may be identified from sets of tests that had similar, although not
identical, content - for example, verbal factors, numeracy factors, spatial
factors and so on - was central in the development of Burt's writings. Overlap
among the tests within each factor did not suggest that the g values were
identical. Again, Spearman did not dismiss these approaches out of hand, but
it is plain that he always favored the two-factor model.
THE PRACTITIONERS
Although the distinction may be somewhat simplistic, it is possible to separate
the theoreticians from the applied practitioners in these early years of factor
analysis. Lovie (1983) discusses the essentially philosophical and experimental
drive underlying Spearman's work, as well as that of Thomson and Thurstone.
Spearman's early paper (1904a) begins with an introduction that discusses the
"signs of weakness" in experimental psychology; he avers that "Wundt's
disciples have failed to carry forward the work in all the positive spirit of their
master," and he announces a:
"correlational psychology," for the purpose of positively determining all psychical
tendencies, and in particular those which connect together the so-called "mental
tests" with psychical activities of greater generality and interest. (p. 205)
The work culminated in a book that set out his fundamental, if not yet
watertight and complete, conceptual framework. Burt's work with its continuing
premise of the heritability of intelligence also shows the need to seek support
for a particular construction of mental life.
On the other hand, a number of contemporary researchers, whose contributions
to factor analysis were also impressive, were fundamentally applied
workers. Two, Kelley and Thomson, were in fact Professors of Education, and
a third, Louis Thurstone, was a giant in the field of scaling techniques and
testing. Their contributions were marked by the task of developing tests of
abilities and skills, measures of individual differences. Truman L. Kelley
(1884-1961), Professor of Education and Psychology at Stanford University,
published his Crossroads in the Mind of Man in 1928. In his preface he
welcomes Spearman's work, but in his first chapter, "Boundaries of Mental
Life," he makes it clear that his work indicates that several traits are included
in Spearman's g, and he emphasises these group factors throughout his book:
Mental life does not operate in a plain but in a network of canals. Though each canal
may have indefinite limits in length and depth, it does not in width; though each
mental trait may grow and become more and more subtle, it does not lose its character
and discreteness from other traits. (p. 23)
Lovie and Lovie (1995) report on correspondence between Wilson and
H. W. Holmes, Dean of the School of Education at Harvard, after Wilson had
read and reviewed (1929a) Kelley's book: "If Spearman is on dangerous ground,
Kelley is sitting on a volcano."
Wilson's (1929a) review of Kelley again concentrates on the mathematics
and again raises, in even more acute fashion, the problem of indeterminacy. It
is not an unkind review, but it turns Kelley's statement (quoted above) on its
head:
Mental life in respect of its resolution into specific and general factors does not
operate in a network of canals but in a continuously distributed hyperplane... no
trait has any separate or discrete existence, for each may be replaced by proper linear
combinations of the others proceeding by infinitesimal gradations and it is only the
complex of all that has reality. Instead of writing of crossroads in the mind of man
one should speak of man's trackless mental forest or tundra or jungle - mathematically
mind you, according to the analysis offered. The canals or crossroads have been
put in without any indication, so far as I can see, that they have been put in any other
way than we put in roads across our midwestern plains, namely to meet the
convenience or to suit the fancy of the pioneers. (p. 164)
Wilson, a highly competent mathematician, had obviously had a struggle
with Kelley's mathematics. He concludes:
The fact of the matter is that the author after an excellent introductory discussion of
the elements of the theory of the resolution into general and specific factors... gives
up the general problem entirely, throws up his hands so to speak, and proceeds to
develop special methods of examining his variables to see where there seem to be
specific bonds between them. (p. 160)
Kelley, perhaps a little bemused, perhaps a little rueful, and almost certainly
grateful that Wilson's review had not been more scathing, responds: "The
mathematical tool is sharp and I may nick, or may already have nicked myself
with it. At any rate I do still enjoy the fun of whittling and of fondling such
clean-cut chips as Wilson has let fall" (p. 172).
Sir Godfrey Thomson was Professor of Education at the University of
Edinburgh from 1925 to 1951. He had little or no training in psychology;
indeed, his PhD was in physics. After graduate studies in Strasbourg, he returned
home to Newcastle, England, where he was obliged to take up a post in education
to fulfill requirements that were attached to the grants he had received as a
student. His work at Edinburgh was almost wholly concerned with the development
of mental tests and the refining of his approach to factor analysis. In
1939 his book The Factorial Analysis of Human Ability was published, a book
that established his reputation and set him firmly outside the Spearman camp.
A later edition (1946) expanded his views and introduced to a wider audience
the work of Louis Thurstone (1887-1955). Thomson's name is not as well-known
as Spearman's, nor is his work much cited. This is probably because he
did not develop a psychological theory of intelligence, although he did offer
explanations for his findings. He can, however, be regarded as Spearman's main
British rival, and he parts company with him on three main grounds: first, that
Spearman's analysis had not and could not conclusively demonstrate the "existence"
of g; second, that even if the existence of g was admitted, it was
misleading and dangerous to reify it so that it became a crucial psychological
entity; and third, that Spearman's hierarchical model had been overtaken by
multiple factor models that were at once more sophisticated and had more
explanatory power.
The Spearman school of experimenters, however, tend always to explain as much
as possible by one central factor... .
There are innumerable other ways of explaining these same correlations ...
And the final decision between them has to be made on some other grounds. The
decision may be psychological.... Or the decision may be made on the ground that
we should be parsimonious in our invention of "factors," and that where one general
and one group factor will serve we should not invent five group factors. (Thomson,
1946, pp. 14-15)
Thomson was not saying that Spearman's view does not make sense but that
it is not inevitably the only view. Spearman had agreed that g was present in
all the factors of the mind, but in different amounts or "weights," and that in
some factors it was small enough to be neglected. Thomson welcomed these
views, "provided that g is interpreted as a mathematical entity only, and
judgement is suspended as to whether it is anything more than that" (p. 240).
And his verdict on two-factor theory? After commenting that the method
of two factors was an analytical device for indicating their presence and that
advances in method had been made that led to multiple factor analysis, he
adds, "It was Professor Thurstone of Chicago who saw that one solution to the
problem could be reached by a generalization of Spearman's idea of zero tetrad
differences" (p. 20).
Thomson himself had suggested a "sampling theory" to replace Spearman's
approach:
The alternative theory to explain the zero tetrad differences is that each test calls
upon a sample of bonds which the mind can form, and that some of these bonds are
common to two tests and cause their correlation. (p. 45)
Thomson did not give more than a very general view of what the bonds might
be, although it seems that they were fundamentally neurophysiological and,
from a psychological standpoint, akin to the connections in the connectionist
views of learning first advanced by Thorndike:
What the "bonds" of the mind are, we do not know. But they are fairly certainly
associated with the neurones or nerve cells of our brains... Thinking is accompanied
by the excitation of these neurones in patterns. The simplest patterns are instinctive,
more complex ones acquired. Intelligence is possibly associated with the number
and complexity of the patterns which the brain can (or could) make. (Thomson, 1946,
p. 51)
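
The sampling idea quoted above can be given a rough numerical illustration. The sketch below is not Thomson's own calculation: it assumes a hypothetical pool of equally weighted, independent bonds, lets each of four tests draw a random share of that pool, and takes the correlation between two tests to be the number of shared bonds divided by the geometric mean of the numbers of bonds each test uses. The pool size, the sampling proportions, and all names are invented for illustration. Under these assumptions the tetrad differences come out close to zero although no single general factor has been built in.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

N_BONDS = 50_000                                # hypothetical size of the pool of bonds
proportions = np.array([0.2, 0.4, 0.6, 0.8])    # hypothetical share of the pool each test samples

# membership[i, k] is True when test i happens to call upon bond k
membership = rng.random((len(proportions), N_BONDS)) < proportions[:, None]
m = membership.astype(int)

# With independent, equally weighted bonds, the correlation between two tests is
# (bonds in common) / sqrt(bonds in test i * bonds in test j).
counts = m.sum(axis=1)
shared = m @ m.T
r = shared / np.sqrt(np.outer(counts, counts))


def tetrad(r, a, b, c, d):
    """Tetrad difference r_ab * r_cd - r_ad * r_cb for four distinct tests."""
    return r[a, b] * r[c, d] - r[a, d] * r[c, b]


tetrads = [tetrad(r, 0, 1, 2, 3), tetrad(r, 0, 2, 1, 3), tetrad(r, 0, 1, 3, 2)]
print(np.round(r, 3))        # off-diagonal entries are close to sqrt(p_i * p_j)
print(np.round(tetrads, 4))  # all close to zero
```

The correlations approximate the square root of the product of the two sampling proportions, which is why the products in each tetrad nearly cancel.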
And, lastly, to complete this short commentary on Thomson's efforts, his
demonstration of geometrical methods for the illustration of factor theory
contrasts markedly with both Spearman's and Burt's work and is much more in
tune with the "rotation" methods of later contributors, notably Thurstone.
In 1939, as part of a General Meeting of the British Psychological Society,
Burt, Spearman, Thomson (1939a), and Stephenson each gave papers on the
factorial analysis of human ability. These papers were published in the British
Journal of Psychology, together with a summing up by Thomson (1939b).
Thomson points out that neither Thurstone nor any of his followers were present
to defend their views and gives a very brief and favorable summary of them.
He even says, citing Professor Dirac at a meeting of the Royal Society of
Edinburgh:
When a mathematical physicist finds a mathematical solution or theorem which is
particularly beautiful... he can have considerable confidence that it will prove to
correspond to something real in physical nature. Something of the same faith seems
to lie behind Prof. Thurstone's trust in the coincidence of "Simple Structure" in the
matrix of factor loadings with psychological significance in the factors thus defined.
(Thomson, 1939b, p. 105)
He also noted that Burt had "expressed the hope that the 'tetrad difference' of
the four symposiasts would be found to vanish!" (p. 108).
Louis L. Thurstone produced his first work on "the multiple factor problem"
in 1931. The Vectors of the Mind appeared in 1935 and what he himself defined
as a development and expansion of this work, Multiple Factor Analysis, in
1947. Reprints of the book were still being published long after his death. The
following scheme might give some feel for Thurstone's approach. We looked
earlier at the matrix of correlations that produce a tetrad difference. Now
consider:
This is a minor determinant of order three. Now a new tetrad can be formed:
If this new tetrad is zero then the determinant "vanishes." This procedure
can be carried out with determinants of any order and if we come to a stage
when all the minors "vanish" the "rank" of the correlation matrix will be reduced
accordingly. Thurstone found that tests could be analyzed into as many common
factors as the reduced rank of the correlation matrix. The present account
follows that of Thomson (1946, chap. 2), who gives a numerical example. The
rank of the matrix of all the correlations may be reduced by inserting values in
the diagonal of the matrix - the communalities. These values may be thought
of as self-correlations, in which case they are all 1. But they may also be regarded
as that part of the variance in the variable that is due to the common factors, in
which case they have to be estimated in some fashion. They might even be
merely guessed. Thurstone chose values that made the tetrads zero. If these
methods seem not to be entirely satisfactory now, they were certainly not
welcomed by psychologists in the 1920s, 1930s and 1940s who were trying to
come to grips with the new mathematical approaches of the factor theorists and
their views of human intelligence. Thurstone's centroid method bears some
similarities to principal components analysis. The word centroid refers to a
multivariate average, and Thurstone's "first centroid" may be thought of as an
average of all the tests in a battery. Central to the outcome of the analysis is the
production of factor loadings - values that express the relationship of the tests
to the presumed underlying factors. The ways in which these loadings are
arrived at are many and various, and Thurstone's centroid technique was just
one early approach. However, it was an approach that was relatively easy to
apply and (with some mathematics) relatively easy to understand. How the
factors were interpreted psychologically was, and is, largely a matter for the
researcher. Thus tests that were related that involved arithmetic or numerical
reasoning or number relationships might be labeled numerical, or numeracy.
Thurstone maintained that complete interpretation of factors involved the
technique of rotation. Loadings in a principal factors matrix reflect common
factor variances in test scores. The matrices are arbitrary for they can be
manipulated to show different axes - different "frames of reference." We can
only make scientific sense of the outcomes when the axes are rotated to pass
through loadings that show an assumed factor that represents a "psychological
reality." Such rotations, which may produce orthogonal (uncorrelated) axes or
oblique (correlated) axes, were first carried out by a mixture of mathematics,
intuition, and inspiration and have become a standard part of modern factor
analysis. It is perhaps worth noting that Thurstone's early training was in
engineering and that he taught geometry for a time at the University of Minnesota.
The configurational interpretations are evidently distasteful to Burt, for he does not
have a single diagram in his text. Perhaps this is indicative of individual differences
in imagery types which leads to differences in methods and interpretation among
scientists. (Thurstone, 1947, p. ix)
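
The account of tetrads, rank, communalities, and the first centroid given above may be easier to follow with a small numerical sketch. What follows is an illustration under stated assumptions, not Thurstone's or Thomson's worked example: the five loadings are invented, the correlations are generated from a single common factor, and the first-centroid loadings are computed by the usual textbook rule of dividing each column sum of the reduced matrix by the square root of the grand total.

```python
import numpy as np

# Hypothetical loadings of five tests on one common factor (illustrative values only).
loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5])

# Under a one-factor model the correlation between tests i and j is l_i * l_j.
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)  # diagonal treated as self-correlations, all equal to 1


def tetrad(r, a, b, c, d):
    """Tetrad difference r_ab * r_cd - r_ad * r_cb for four distinct tests."""
    return r[a, b] * r[c, d] - r[a, d] * r[c, b]


# With one common factor every tetrad difference vanishes (up to rounding).
t = tetrad(R, 0, 1, 2, 3)

# With unities in the diagonal the matrix keeps full rank.
rank_with_unities = np.linalg.matrix_rank(R)

# Replacing the diagonal by communalities (here l_i squared) reduces the rank
# to the number of common factors, which is 1 in this sketch.
reduced = R.copy()
np.fill_diagonal(reduced, loadings ** 2)
rank_reduced = np.linalg.matrix_rank(reduced)

# First-centroid loadings: column sums divided by the square root of the grand total.
first_centroid = reduced.sum(axis=0) / np.sqrt(reduced.sum())

print(t, rank_with_unities, rank_reduced)
print(np.round(first_centroid, 3))
```

In this idealized case the first centroid recovers the invented loadings exactly; with real correlations and guessed communalities it would only approximate them, and further centroids would be extracted from the residuals.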
Thurstone's work eventually produced a set of seven primary mental abilities:
verbal comprehension, word fluency, numerical, spatial, memory, perceptual
speed, and reasoning. This kind of scheme became widely popular among
psychologists in the United States, although many were disconcerted by the fact
that the number of factors tended to grow. J. P. Guilford (1897-1987) devised
a theoretical model that postulated 120 factors (1967), and by 1971 the claim
was made that almost 100 of them had been identified.
An ongoing problem for the early researchers was the evident subjective
element in all the methods. Mathematicians stepped into the picture (Wilson
was one of the early ones), often at the request of the psychologists or educationists
who were not generally mathematical sophisticates (Thomson, not
trained as a psychologist, was an exception), but they were remarkably quick
learners. The search for analytical methods for assessing "simple structure"
began. Carroll (1953) was among the earliest who tackled the problem:
A criticism of current practice in multiple factor analysis is that the transformation
of the initial factor matrix F to a rotated "simple structure" matrix V must apparently
be accompanied by methods which allow considerable scope for subjective judgement.
(p. 23)
Carroll's paper goes on to describe a mathematical method that avoids
subjective decisions. Modern factor analysis had begun. From then on a great
variety of methods was developed, and it is no accident that such solutions were
developed coincidentally with the rise of the use of the high-speed digital
computer, which removed the computational labor from the more complex
procedures. Harman (1913-1976), building on earlier work (Holzinger
[1892-1954] & Harman, 1941), published Modern Factor Analysis in 1960,
with a second and a third (1976) edition, and the book is a classic in the field.
In research on the structure of human personality, the giants in the area were
Raymond Cattell (1905-1998) and Hans Eysenck (1916-1997). Cattell had
been a student of Cyril Burt, and in his early work had introduced the concepts
of fluid intelligence, which is akin to g, a broad, biologically based construct of
general mental ability, and crystallized intelligence, which depends on learning and
environmental experience. Later, Cattell's work concentrated on the identification
and measurement of the factors of human personality (1965a), and his 16PF
(16 personality factors) test is widely used. His Handbook of Multivariate
Experimental Psychology (1965b) is a compendium of the multivariate approach
with a number of eminent contributors. Cattell himself contributed 6
of the 27 chapters and co-authored another.
Cattell's research output, over a very long career, was massive, as was that
of Eysenck. The latter's method, which he termed criterion analysis, was a use
of factor analysis that attempted to confirm the existence of previously hypothesized
factors. Three dimensions were eventually identified by Eysenck and his
work helped to set off one of psychology's ongoing arguments about just how
many personality factors there really were. This is not the place to review this
debate, but it does, once again, place factor analysis at the center of sometimes
quite heated discussion about the "reality" of factors and the extent to which
they reflect the theoretical preconceptions of the investigators. "What comes
out is no more than what goes in," is the cry of the critics. What is absolutely
clear from this situation is that it is necessary to validate the factors by
experimental investigation that stands aside from the methods used to identify
the factors and avoids capitalizing on semantic similarities among the tests that
were originally employed. It has to be acknowledged, of course, that both Cattell
and Eysenck (1947; Eysenck & Eysenck, 1985) and their many collaborators
have tried to do just that.
Recent years have seen an increase in the use of factor-analytic techniques
in a variety of disciplines and settings, even apparently in the analysis of the
performance of racehorses! In psychology, the sometimes heated discussion of
the dangers of the reification of factors, their treatment as psychological entities,
as well as arguments over, for example, just how many personality dimensions
are necessary and/or sufficient to define variations in human temperament,
continue. At bottom, a large part of these problems concerns the use of the
technique either as a confirmatory tool for theory or as a search for new structure
in our data. Unless these two quite different views are recognized at the start of
discussion the debate will take on an exasperating futility that stifles all progress.
12
The Design
of Experiments
THE PROBLEM OF CONTROL
When Ronald Fisher accepted the post of statistician at Rothamsted Experimental
Station in 1919, the tasks that faced him were to make what he could of a
large quantity of existing data from ongoing long-term agricultural studies (one
had begun in 1843!) and to try to improve the effectiveness of future field trials.
Fisher later described the first of these tasks as "raking over the muck heap";
the second he approached with great vigor and enthusiasm, laying as he did so
the foundations of modern experimental design and statistical analysis.
The essential problem is the problem of control. For the chemist in the
laboratory it is relatively easy to standardize and manipulate the conditions of
a specific chemical reaction. Social scientists, biologists, and agricultural researchers
have to contend with the fact that their experimental material (people,
animals, plants) is subject to irregular variation that arises as a result of complex
interactions of genetic factors and environmental conditions. These many
variations, unknown and uncertain, make it very difficult to be confident that
observed differences in experimental observations are due to the manipulations
of the experimenter rather than to chance variation. The challenge of the
psychological sciences is the sensitivity of behavior and experience to a multiplicity
of factors. But in many respects the challenge has not been answered
because the unexplained variation in our observations is generally regarded as
a nuisance or as irrelevant.
It is useful to distinguish between experimental control and the controlled
experiment. The former is the behaviorist's ideal, the state where some consistent
behavior can be set off and/or terminated by manipulating precisely
specified variables. On the other hand, the controlled experiment describes a
procedure in which the effect of the manipulation of the independent variable
or variables is, as it were, checked against observations undertaken in the
absence of the manipulation. This is the method that is employed and supported
by the followers of the Fisherian tradition. The uncontrolled variables that affect
the observations are assumed to operate in a random fashion, changing individual
behavior in all kinds of ways so that when the data are averaged their effects
are canceled out, allowing the effect of the manipulated variable to be seen. The
assumption of randomness in the influence of uncontrolled variables is, of
course, not one that is always easy to justify, and the relegation of important
influences on variability to error may lead to erroneous inferences and disastrous
conclusions.
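
The argument of the preceding paragraph can be illustrated with a small simulation. The sketch is only a toy model under loudly stated assumptions: the baseline, the size of the effect, the spread of the uncontrolled influences, and the systematic bias are all invented numbers. It shows that when the uncontrolled influences really are random, the difference between group averages recovers the effect of the manipulation, and that when an uncontrolled influence is tied to one group, averaging does nothing to remove it.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 2.0    # hypothetical effect of the manipulated variable
N_PER_GROUP = 5000   # hypothetical number of observations in each group
BIAS = 3.0           # hypothetical uncontrolled influence confined to the treated group

# Uncontrolled variables modelled as random noise with mean zero.
control = 10.0 + rng.normal(0.0, 5.0, N_PER_GROUP)
treated = 10.0 + TRUE_EFFECT + rng.normal(0.0, 5.0, N_PER_GROUP)

# Averaging cancels the random influences, leaving the effect of the manipulation.
random_estimate = treated.mean() - control.mean()

# A systematic influence does not cancel: it is absorbed into the apparent effect.
biased_estimate = (treated + BIAS).mean() - control.mean()

print(round(random_estimate, 2))  # near TRUE_EFFECT
print(round(biased_estimate, 2))  # near TRUE_EFFECT + BIAS
```

Nothing in the biased data signals that anything is wrong, which is why the assumption of randomness has to be justified by the design rather than read off the averages.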
Malaria is a disease that has been known and feared for centuries. It
decimated the Roman Empire in its final years, it was quite widespread in Britain
during the 17th century, and indeed it was still found there in the fen country in
the 19th century. The names malaria, marsh fever, and paludism all reflect the
view that the cause of the disease was the breathing of damp, noxious air in
swamp lands. The relationship between swamp lands and the incidence of
malaria is quite clear. The relationship between swamp lands and the presence
of mosquitos is also clear. But it was not until the turn of the century that it was
realized that the mosquito was responsible for the transmission of the malarial
parasite, and only 20 years earlier, in 1880, was the parasite actually observed.
In 1879, Sir Patrick Manson (1844-1922), a physician whose work played a
role in the discovery of the malarial cycle, presented a paper in which he
suggested that the disease elephantiasis was transmitted through insect bites.
The paper was received with scorn and disbelief. The evidence for the life cycle
of the malarial parasite in mosquitos and human beings and its being established
as the cause of the illness came in a number of ways - not the least of which
was the healthy survival, throughout the malarial season, of three of Manson's
assistants living in a mosquito-proof hut in the middle of the Roman Campagna
(Guthrie, 1946, pp. 357-358). This episode is an interesting example of the
control of a concomitant or correlated bias or effect that was the direct cause of
the observations.
In psychological studies, some of the earliest work that used true experimental
designs is that of Thorndike and Woodworth (1901) on transfer of training.
They used "before-after" designs, control group designs, and correlational
studies in their work. However, the now routine inclusion of control groups in
experimental investigations in psychology does not appear to have been an
accepted necessity until about 50 years ago. In fact, controlled experimentation
in psychology more or less coincided with the introduction of Fisherian statistics,
and the two quite quickly became inseparable. Of course it would be both
foolish and wrong to imply that early empirical investigations in psychology
were completely lacking in rigor, and mistaken conclusions rife. The point is
that it was not until the 1920s and 1930s that the "rules" of controlled
experimentation were spelled out and appreciated in the psychological sciences.
The basic rules had been in existence for many decades, having been codified
by John Stuart Mill (1806-1873) in a book, first published in 1843, that is
usually referred to as the Logic (1843/1872/1973). These formulations had been
preceded by an earlier British philosopher, Francis Bacon, who made recommendations
for what he thought would be sound inductive procedures.
METHODS OF INQUIRY
Mill proposed four basic methods of experimental inquiry, and the five Canons,
the first of which is, "If two or more instances of the phenomenon under
investigation have only one circumstance in common, the circumstance in
which alone all the instances agree, is the cause (or effect) of the given
phenomenon" (Mill, 1843/1872, 8th ed., p. 390).
If observations a, b, and c are made in circumstances A, B, and C, and
observations a, d, and e in circumstances A, D, and E, then it may be concluded
that A causes a. Mill commented, "As this method proceeds by comparing
different instances to ascertain in what they agree, I have termed it the Method
of Agreement" (p. 390).
As Mill points out, the difficulty with this method is the impossibility of
ensuring that A is the only antecedent of a that is common to both instances.
The second canon is the Method of Difference. The antecedent circumstances
A, B, and C are followed by a, b, and c. When A is absent only b and c are
observed:
If an instance in which the phenomenon under investigation occurs, and an instance
in which it does not occur, have every circumstance in common save one, that one
occurring only in the former; the circumstance in which alone the two instances
differ, is the effect or the cause, or an indispensable part of the cause of the
phenomenon. (p. 391)
This method contains the difficulty in practice of being unable to guarantee
that it is the crucial difference that has been found. As part of the way around
this difficulty, Mill introduces a joint method in his third canon:
If two or more instances in which the phenomenon occurs have only one circumstance
in common, while two or more instances in which it does not occur have
nothing in common save the absence of that circumstance; the circumstance in which
alone the two sets of instances differ, is the effect, or the cause, or an indispensable
part of the cause, of the phenomenon. (p. 396)
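
The first three canons can be paraphrased, very loosely, as operations on sets of circumstances. The sketch below is a modern simplification and not anything Mill wrote: each instance is represented by the set of circumstances present, the function names are invented for illustration, and the joint method is reduced to "common to every positive instance and found in no negative instance." The example values are the A, B, C and A, D, E cases used in the text.

```python
def method_of_agreement(instances):
    """Return the circumstances shared by every instance in which the phenomenon occurs."""
    return set.intersection(*(set(i) for i in instances))


def method_of_difference(occurs, does_not_occur):
    """Return the circumstances present when the phenomenon occurs and absent when it does not."""
    return set(occurs) - set(does_not_occur)


def joint_method(positive_instances, negative_instances):
    """Return circumstances common to all positive instances and missing from every negative one."""
    common = method_of_agreement(positive_instances)
    seen_without_phenomenon = set().union(*(set(i) for i in negative_instances))
    return common - seen_without_phenomenon


# Observation a is made under A, B, C and again under A, D, E.
agreement = method_of_agreement([{"A", "B", "C"}, {"A", "D", "E"}])

# With A, B, C the phenomenon a occurs; with B, C alone it does not.
difference = method_of_difference({"A", "B", "C"}, {"B", "C"})

joint = joint_method(
    [{"A", "B", "C"}, {"A", "D", "E"}],   # instances in which a occurs
    [{"B", "C"}, {"D", "E"}],             # instances in which a does not occur
)

print(agreement, difference, joint)  # each is {'A'}
```

The sketch also makes Mill's own reservation concrete: the functions can only compare the circumstances that have been recorded, so an unrecorded antecedent common to the instances would go undetected.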
In 1881 Louis Pasteur (1822-1895) conducted a famous experiment that
exemplifies the methods of agreement and difference. Some 30 farm animals
were injected by Pasteur with a weak culture of anthrax virus. Later these
animals and a similar number of others that had not been so "vaccinated" were
given a fatal dose of anthrax virus. Within a few days the non-vaccinated
animals were dead or dying, the vaccinated ones healthy. The conclusion that
was enthusiastically drawn was that Pasteur's vaccination procedure had produced
the immunity that was seen in the healthy animals. The effectiveness of
vaccination is now regarded as an established fact. But it is necessary to guard
against incautious logic. The health of the vaccinated animals could have been
due to some other fortuitous circumstance. Because it is known that some
animals infected with anthrax do recover, an experimental group composed of
these resistant animals could have resulted in a spurious conclusion. It should
be noted that Pasteur himself recognized this as a possibility.
Mill's Method of Residues proclaims that having identified by the methods
of agreement and differences that certain observed phenomena are the effects
of certain antecedent conditions, the phenomena that remain are due to the
circumstances that remain. "Subduct from any phenomenon such part as is
known by previous inductions to be the effect of certain antecedents, and the
residue of the phenomenon is the effect of the remaining antecedents"
(1843/1872, 8th ed., p. 398).
Mill here uses a very modern argument for the use of the method in
providing evidence for the debate on racial and gender differences:
Those who assert, what no one has shown any real ground for believing, that there
is in one human individual, one sex, or one race of mankind over another, an inherent
and inexplicable superiority in mental faculties, could only substantiate their proposition
by subtracting from the differences of intellect which we in fact see, all that
can be traced by known laws either to the ascertained differences of physical
organization, or to the differences which have existed in the outward circumstances
in which the subjects of the comparison have hitherto been placed. What these causes
might fail to account for, would constitute a residual phenomenon, which and which
alone would be evidence of an ulterior original distinction, and the measure of its
amount. But the assertors of such supposed differences have not provided themselves
with these necessary logical conditions in the establishment of their doctrine.
(p. 429)
The final method and the fifth canon is the Method of Concomitant Variations:
"Whatever phenomenon varies in any manner whenever another phenomenon
varies in some particular manner, is either a cause or an effect of that
phenomenon, or is connected with it through some fact of causation" (p. 401).
This method is essentially that of the correlational study, the observation of
covariation:
Let us suppose the question to be, what influence the moon exerts on the surface of
the earth. We cannot try an experiment in the absence of the moon, so as to observe
what terrestrial phenomenon her annihilation would put an end to; but when we find
that all the variations in the positions of the moon are followed by corresponding
variations in the time and place of high water, the place always being either the part
of the earth which is nearest to, or that which is most remote from, the moon, we
have ample evidence that the moon is, wholly or partially, the cause which determines
the tides. (p. 400)
Mill maintained that these methods were in fact the rules for inductive logic,
that they were both methods of discovery and methods of proof. His critics, then
and now, argued against a logic of induction (see chapter 2), but it is clear that
experimentalists will agree with Mill that his methods constitute the means by
which they gather experimental evidence for their views of nature.
The general structure of all the experimental designs that are employed in
the psychological sciences may be seen in Mill's methods. The application and
withholding of experimental treatments across groups reflect the methods of
agreement and differences. The use of placebos and the systematic attempts to
eliminate sources of error make up the method of residues, and the method of
concomitant variation is, as already noted, a complete description of the correlational
study. It is worth mentioning that Mill attempted to deal with the
difficulties presented by the correlational study and, in doing so, outlined the
basics of multiple regression analysis, the mathematics of which were not to
come for many years:
Suppose, then, that when A changes in quantity, a also changes in quantity, and in
such a manner that we can trace the numerical relation which the changes of the one
bear to such changes of the other as take place within the limits of our observation.
We may then safely conclude that the same numerical relation will hold beyond those
limits. (Mill, 1843/1872, 8th ed., p. 403)
Mill elaborates on this proposition and goes on to discuss the case where a
is not wholly the effect of A but nevertheless varies with it:
It is probably a mathematical function not of A alone, but of A and something else:
its changes, for example, may be such as would occur if part of it remained constant,
or varied on some other principle, and the remainder varied in some numerical
relation to the variations of A. (p. 403)
Mill's Logic is his principal work, and it may be fairly cast as the book that
first describes both a justification for, and the methodology of, the social
sciences. Throughout his works, the influence of the philosophers who were,
in fact, the early social scientists, is evident. David Hartley (1705-1757)
published his Observations on Man in 1749. This book is a psychology rather
than a philosophy (Hartley, 1749/1966). It systematically describes
associationism in a psychological context and is the first text that deals with
physiological psychology. James Mill (1773-1836), John Stuart's father, much
admired Hartley, and his major work became one of the main source books that
Mill the elder introduced to his son when he started his formal education at the
age of 3 (when he learned Ancient Greek), although he did not get to formal
logic until he was 12. Another important early influence was Jeremy Bentham
(1748-1832), a reformer who preached the doctrine of Utilitarianism, the
essential feature of which is the notion that the several and joint effects of
pleasure and pain govern all our thoughts and actions. Later Mill rejected strict
Benthamism and questioned the work of a famous and influential contemporary,
Auguste Comte (1798-1857), whose work marks the foundation of positivism
and of sociology. The influence of the ideas of these thinkers on early
experimental psychology is strong and clear, but they will not be explored here. The
main point to be made is that John Stuart Mill was an experimental
psychologist's philosopher. More, he was the methodologist's philosopher. In a letter
to a friend, he said:
If there is any science which I am capable of promoting, I think it is the science of
science itself, the science of investigation - of method. I once heard Maurice say
... that almost all differences of opinion when analysed, were differences of method.
(quoted by Robson in his textual introduction to the Logic, p. xlix)
And it is clear that all subsequent accounts of method and experimental
design can be traced back to Mill.
THE CONCEPT OF STATISTICAL CONTROL
The standard design for agricultural experiments at Rothamsted in the days
before Fisher was to divide a field into a number of plots. Each plot would
receive a different treatment, say, a different manure or fertilizer or
manure/fertilizer mixture. The plot that produced the highest yield would be taken to be
the best, and the corresponding treatment considered to be the most effective.
Fisher, and others, realized that soil fertility is by no means uniform across a
large field and that this, as well as other factors, can affect the yields. In fact,
the differences in the yields could be due to many factors other than the
particular treatments, and the highest yield might be due to some chance
combination of these factors. The essential problem is to estimate the
magnitude of these chance factors - the errors - to eliminate, for example, the
differences in soil fertility.
Some of the first data that Fisher saw at Rothamsted were the records of
daily rainfall and yearly yields from plots in the famous Broadbalk wheat
field. Fertilizers had been applied to these plots, using the same pattern, since
1852. Fisher used the method of orthogonal polynomials to obtain fits of the
yields over time. In his paper on these data (Fisher, 1921b), he
describes analysis of variance (ANOVA) for the first time.
When the variation of any quantity (variate) is produced by the action of two or more
independent causes, it is known that the variance produced by all the causes
simultaneously in operation is the sum of the values of the variance produced by
each cause separately... In Table II is shown the analysis of the total variance for
each plot, divided according as it may be ascribed (i) to annual causes, (ii) to slow
changes other than deterioration, (iii) to deterioration; the sixth column shows the
probability of larger values for the variance due to slow changes occurring
fortuitously. (Fisher, 1921b, pp. 110-111)
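The additivity that Fisher states here can be checked on a small scale. The following is only a sketch, not Fisher's own computation: the two "causes" and their values are made up for illustration, each value is assumed equally likely, and independence is represented by pairing every value of one cause with every value of the other.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Population variance of equally likely values, as an exact fraction."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n

# Two hypothetical, independent "causes" (illustrative values only).
cause_a = [0, 2, 4]
cause_b = [1, 1, 5, 9]

# Independence: every value of one cause is paired with every value of the other.
combined = [a + b for a, b in product(cause_a, cause_b)]

var_a = variance(cause_a)
var_b = variance(cause_b)
var_total = variance(combined)
print(var_a, var_b, var_total)
```

With exact fractions the variance of the combined quantity equals the sum of the two separate variances, which is the property on which the whole analysis rests.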
The method of data analysis that Fisher employed was ingenious and
painstaking, but he realized quickly that the data that were available suffered
from deficiencies in the design of their collection. Fisher set out on a new series
of field trials.
He divided a field into blocks and subdivided each block into plots. Each
plot within the block was given a different treatment, and the treatments were
assigned to the plots randomly. This, as Bartlett (1965) puts it, was Fisher's
"vital principle."
When statistical data are collected as natural observations, the most sensible
assumptions about the relevant statistical model have to be inserted. In controlled
experimentation, however, randomness could be introduced deliberately into the design,
so that any systematic variability other than [that] due to imposed treatments could
be eliminated.
The second principle Fisher introduced naturally went with the first. With
statistical analysis geared to the design, all variability not ascribed to the influence
of treatments did not have to inflate the random error. With equal numbers of
replications for the treatments each replication could be contained in a distinct block,
and only variability among plots in the same block were a source of error - that
between blocks could be removed. (Bartlett, 1965, p. 405)
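Bartlett's point, that variability between blocks can be removed from the error, may be illustrated with a small partition of sums of squares. This is a sketch under stated assumptions: the yields are invented, there is one plot per treatment in each block, and the function name is hypothetical rather than taken from any source.

```python
from fractions import Fraction

def randomized_block_partition(yields):
    """Split the total sum of squares of a blocks-by-treatments table.

    yields[b][t] is the yield of treatment t in block b (one plot each).
    Returns (total, blocks, treatments, error) sums of squares.
    """
    n_blocks = len(yields)
    n_treat = len(yields[0])
    grand = Fraction(sum(sum(row) for row in yields), n_blocks * n_treat)
    block_means = [Fraction(sum(row), n_treat) for row in yields]
    treat_means = [Fraction(sum(row[t] for row in yields), n_blocks)
                   for t in range(n_treat)]

    ss_total = sum((y - grand) ** 2 for row in yields for y in row)
    ss_blocks = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    # What is left after block and treatment effects are removed.
    ss_error = sum(
        (yields[b][t] - block_means[b] - treat_means[t] + grand) ** 2
        for b in range(n_blocks) for t in range(n_treat)
    )
    return ss_total, ss_blocks, ss_treat, ss_error

# Made-up yields: 3 blocks (rows) of differing fertility, 3 treatments (columns).
plots = [
    [10, 12, 14],
    [20, 23, 23],
    [30, 31, 35],
]
ss_total, ss_blocks, ss_treat, ss_error = randomized_block_partition(plots)
print(ss_total, ss_blocks, ss_treat, ss_error)
```

In this invented example nearly all of the total variation lies between blocks. Had the blocks been ignored, that variation would have been counted as error and would have swamped the treatment differences.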
The statistical analysis allowed for an even more radical break with
traditional experimental methods:
No aphorism is more frequently repeated in connection with field trials, than that we
must ask Nature few questions, or, ideally, one question, at a time. The writer is
convinced that this view is wholly mistaken. Nature, he suggests, will best respond
to a logical and carefully thought out questionnaire; indeed, if we ask her a single
question, she will often refuse to answer until some other topic has been discussed.
(Fisher, 1926b, p. 511)
Fisher's "carefully thought out questionnaire" was the factorial design. All
possible combinations of treatments would be applied with replications. For
example, in the application of nitrogen (N), phosphate (P), and potash (K) there
would be eight possible treatment combinations: no fertilizer, N, P, K, N & P,
N & K, P & K, and N & P & K. Separate compact blocks would be laid out and
these combinations would be randomly applied to plots within each block. This
design allows for an estimation of the main effects of the basic fertilizers, the
first-order interactions (the effect of two fertilizers in combination), and the
second-order interaction (the effect of the three fertilizers in combination). The
1926(b) paper sets out Fisher's rationale for field experiments and was, as he
noted, the precursor of his book, The Design of Experiments (1935/1966),
published 9 years later. The paper is illustrated with a diagram (Fig. 12.1) of a
"complex experiment with winter oats" that had been carried out with a
colleague at Rothamsted (Eden & Fisher, 1927).
Here 12 treatments, including absence of treatments - the "control" plots -
were tested.
FIG. 12.1 Fisher's Design 1926. Journal of the Ministry of Agriculture
Any general difference between sulphate and chloride, between early and late
application, or ascribable to quantity of nitrogenous manure, can be based on
thirty-two comparisons, each of which is affected by such soil heterogeneity as
exists between plots in the same block. To make these three sets of comparisons
only, with the same accuracy, by single question methods, would require 224 plots,
against our 96; but in addition many other comparisons can be made with equal
accuracy, for all combinations of the factors concerned have been explored. Most
important of all, the conclusions drawn from the single-factor comparisons will be
given, by the variation of non-essential conditions, a very much wider inductive
basis than could be obtained, by single question methods, without extensive
repetitions of the experiment. (Fisher, 1926b, p. 512)
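The simpler factorial scheme mentioned earlier, with nitrogen, phosphate, and potash each either applied or withheld, can be enumerated mechanically. The sketch below is illustrative only: the function names, the number of blocks, and the seed are assumptions, and the layout is not that of the winter oats experiment.

```python
from itertools import product
import random

FERTILIZERS = ("N", "P", "K")

def factorial_treatments(factors):
    """Every present/absent combination of the given factors."""
    combos = []
    for presence in product((False, True), repeat=len(factors)):
        applied = tuple(f for f, on in zip(factors, presence) if on)
        combos.append(applied)
    return combos

def randomize_blocks(treatments, n_blocks, seed=None):
    """Assign each treatment once, in random order, to the plots of each block."""
    rng = random.Random(seed)
    return [rng.sample(treatments, k=len(treatments)) for _ in range(n_blocks)]

treatments = factorial_treatments(FERTILIZERS)
layout = randomize_blocks(treatments, n_blocks=4, seed=1926)
for block in layout:
    print([" & ".join(t) or "none" for t in block])
```

Every block contains all eight combinations once, so each main effect and each interaction can be estimated from comparisons made within blocks.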
The algebra and the arithmetic of the analysis are dealt with in the following
chapter. The crucial point of this work is the combination of statistical analysis
with experimental design. Part of the stimulus for this paper was Sir John
Russell's (1926) article on field experiments, which had appeared in the same
journal just months earlier. Russell's review presented the orthodox approach to
field trials and advocated carefully planned, systematic layouts of the
experimental plots. Sir John Russell was the Director of the Rothamsted Experimental
Station, he had hired Fisher, and he was Fisher's boss, but Fisher dismissed his
methodology. In a footnote in the 1926(b) paper, Fisher says:
This principle was employed in an experiment on the influence of the weather on the
effectiveness of phosphates and nitrogen alluded to by Sir John Russell. The author
must disclaim all responsibility for the design of this experiment, which is, however,
a good example of its class. (Fisher, 1926b, p. 506)
And as Fisher Box (1978) remarks:
It is a measure of the climate of the times that Russell, an experienced research
scientist who . . . had had the wisdom to appoint Fisher statistician for the better
analysis of the Rothamsted experiments, did not defer to the views of his statistician
when he wrote on how experiments were made. Design was, in effect, regarded as
an empirical exercise attempted by the experimenter; it was not yet the domain of
statisticians. (p. 153)
In fact the statistical analysis, in a sense, arises from the design. Nowadays,
when ANOVA is regarded as efficient and routine, the various designs that are
available and widely used are dictated to us by the knowledge that the reporting
of statistical outcomes and their related levels of significance is the sine qua non
of scientific respectability and acceptability by the psychological establishment.
Historically, the new methods of analysis came first. The confounds, defects,
and confusions of traditional designs became apparent when ANOVA was used
to examine the data and so new designs were undertaken.
Randomization was demanded by the logic of statistical inference.
Estimates of error and valid tests of statistical significance can only be made when
the assumptions that underlie the theory of sampling distributions are upheld.
Put crudely, this means that "blind chance" should not be restricted in the
assignment of treatments to plots, or experimental groups. It is, however,
important to note that randomization does not imply that no restrictions or
structuring of the arrangements within a design are possible.
Figure 12.2 shows two systematic designs: (a) a block design and (b) a Latin
square design, and two randomized designs of the same type, (c) and (d). The
essential difference is that chance determines the application of the various
treatments applied to the plots in the latter arrangements, but the restrictions are
apparent. In the randomized block and in the randomized Latin square, each
block contains one replication of all the treatments.
The estimate of error is valid, because, if we imagine a large number of different
results obtained by different random arrangements, the ratio of the real to the
estimated error, calculated afresh for each of these arrangements, will be actually
distributed in the theoretical distribution by which the significance of the result is
tested. Whereas if a group of arrangements is chosen such that the real errors in this
group are on the whole less than those appropriate to random arrangements, it has
now been demonstrated that the errors, as estimated, will, in such a group, be higher
than is usual in random arrangements, and that, in consequence, within such a group,
the test of significance is vitiated. (Fisher, 1926b, p. 507)
          Treatments                 A Standard Latin Square
          1  2  3  4  5
Block 1   A  B  C  D  E              A  B  C  D
Block 2   A  B  C  D  E              B  A  D  C
Block 3   A  B  C  D  E              C  D  B  A
Block 4   A  B  C  D  E              D  C  A  B
Block 5   A  B  C  D  E
               (a)                         (b)

          Treatments                 A Random Latin Square
          1  2  3  4  5
Block 1   D  C  E  A  B              D  A  C  B
Block 2   A  D  B  C  E              C  B  D  A
Block 3   B  A  E  C  D              B  D  A  C
Block 4   E  D  C  A  B              A  C  B  D
Block 5   B  A  D  E  C
               (c)                         (d)
FIG. 12.2 Experimental Designs
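The restriction that defines a Latin square, and one simple way of randomizing within it, can be sketched as follows. The starting square is the standard square of panel (b). Shuffling rows and columns is an assumption made here for illustration; it preserves the Latin property but is not claimed to be Fisher's procedure, nor to sample all possible squares with equal probability.

```python
import random

STANDARD_SQUARE = ["ABCD", "BADC", "CDBA", "DCAB"]  # panel (b) of Fig. 12.2

def is_latin(square):
    """True if every symbol occurs exactly once in each row and each column."""
    symbols = set(square[0])
    n = len(square)
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(n))
    return rows_ok and cols_ok and len(symbols) == n

def randomize_square(square, seed=None):
    """Shuffle the rows and the columns of a Latin square."""
    rng = random.Random(seed)
    n = len(square)
    row_order = rng.sample(range(n), k=n)
    col_order = rng.sample(range(n), k=n)
    return ["".join(square[r][c] for c in col_order) for r in row_order]

random_square = randomize_square(STANDARD_SQUARE, seed=12)
for row in random_square:
    print(" ".join(row))
```

Whatever the seed, the shuffled square still has each treatment once in every row and every column, so chance operates only within the restrictions of the design.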
Fisher later examines the utility of the Latin square design, pointing out that
it is by far the most efficient and economical for "those simple types of manurial
trial in which every possible comparison is of equal importance" (p. 510). In
1925 and early 1926, Fisher enumerated the 5 x 5 and 6 x 6 squares, and in the
1926 paper he made an offer that undoubtedly helped to spread the name, and
the fame, of the Rothamsted Station to many parts of the world:
The Statistical Laboratory at Rothamsted is prepared to supply these, or other types of
randomized arrangements, to intending experimenters; this procedure is considered the more
desirable since it is only too probable that new principles will, at their inception, be, in some
detail or other, misunderstood and misapplied; a consequence for which their originator, who
has made himself responsible for explaining them, cannot be held entirely free from blame.
(Fisher, 1926b, pp. 510-511)
THE LINEAR MODEL
Fisher described ANOVA as a way of "arranging the arithmetic" (Fisher Box,
1978, p. 109), an interpretation with which not a few students would quarrel.
However, the description does point to the fact that the components of variance
are additive and that this property is an arithmetical one and not part of the
calculus of probability and statistical inference as such.
The basic construct that marks the culmination of Fisher's work is that of
specifying values of an unknown dependent variable, $y$, in terms of a linear set
of parameters, each one of which weights the several independent variables
$x_1, x_2, x_3, \ldots, x_n$ that are used for prediction, together with an error component $\varepsilon$
that accounts for the random fluctuations in $y$ for particular fixed values of
$x_1, x_2, x_3, \ldots, x_n$. In algebraic terms,
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_n x_n + \varepsilon \]
As we noted earlier, the random component in the model and the fact that it
is sample-based make it a probabilistic model, and the properties of the
distribution of this component, real or assumed, govern the inferences that may
be made about the unknown dependent variable. Fisher's work is the crucial
link between classical least squares analysis and regression analysis.
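The least squares side of that link can be shown in the simplest case of a single predictor. The sketch below assumes a model with an intercept and one slope, uses invented observations, and gives the classical closed-form solution rather than anything specific to Fisher's papers.

```python
from fractions import Fraction

def least_squares_line(xs, ys):
    """Intercept b0 and slope b1 minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up observations for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 10]
b0, b1 = least_squares_line(xs, ys)

# The residuals play the part of the error component in the linear model.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, residuals)
```

The fitted residuals sum to zero and are uncorrelated with the predictor, which is what the least squares conditions require. Inference about the parameters then depends on what is assumed about the distribution of the error component.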
As Seal (1967) notes, "The linear regression model owes so much to Gauss
that we believe it should bear his name" (p. 1).
However, there is little reason to suppose that this will happen. Twenty years
ago Seal found that very few of the standard texts on regression, or the linear
model, or ANOVA made more than a passing reference to Gauss, and the
situation is little changed today. Some of the reasons for this have already been
mentioned in this book. The European statisticians of the 18th and 19th centuries
were concerned with vital statistics and political arithmetic, and inference and
prediction in the modern sense were, generally speaking, a long way off in these
fields. The mathematics of Legendre and Gauss and others on the theory of
errors did not impinge on the work of the statisticians. Perhaps more strikingly,
the early links between social and vital data and error theory that were made by
Laplace and Quetelet were largely ignored by Karl Pearson and Ronald Fisher.
Why, then, could not the Theory of Errors be absorbed into the broader concept of
statistical theory ... ? ... The original reason was Pearson's preoccupation with the
multivariate normal distribution and its parameters. The predictive regression
equation of his pathbreaking 'regression' paper (1896) was not seen to be identical in
form and solution to Gauss's Theoria Motus (1809) model. R. A. Fisher and his
associates ... were rediscovering many of the mathematical results of least squares (or
error) theory, apparently agreeing with Pearson that this theory held little interest to
the statistician. (Seal, 1967, p. 2)
There might be other more mundane, nonmathematical reasons. Galton and
others were strongly opposed to the use of the word error in describing the
variability in human characteristics, and the many treatises on the theory might
thus have been avoided by the new social scientists, who were, in the main, not
mathematicians.
In his 1920 paper on the history of correlation, Pearson is clearly most
anxious to downplay any suggestion that Gaussian theory contributed to its
development. He writes of the "innumerable treatises" (p. 27) on least squares,
of the lengthy analysis, of his opinion that Gauss and Bravais "contributed
nothing of real importance to the problem of correlation" (p. 82), and of his view
that it is not clear that a least squares generalization "indicates the real line of
future advance" (p. 45). The generalization had been introduced by Yule, whom
Pearson and his Gower Street colleagues clearly saw as the enemy. Pearson
regarded himself as the father of correlation and regression insofar as the
mathematics were concerned. Galton and Weldon were, of course, recognized
as important figures, but they were not mathematicians and posed no threat to
Pearson's authority. In other respects, Pearson was driven to try to show that
his contributions were supreme and independent.
The historical record has been traced by Seal (1967), from the fundamental
work of Legendre and Gauss at the beginning of the 19th century to Fisher over
100 years later.
THE DESIGN OF EXPERIMENTS
A lady declares that by tasting a cup of tea made with milk she can discriminate
whether the milk or the tea infusion was first added to the cup. We will consider the
problem of designing an experiment by means of which this assertion can be tested.
(Fisher, 1935/1966, 8th ed., p. 11)
With these words Fisher introduces the example that illustrated his view of the
principles of experimentation. Holschuh (1980) describes it as "the somewhat
artificial 'lady tasting tea' experiment" (p. 35), and indeed it is, but perhaps an
American writer does not appreciate the fervor of the discussion on the best
method of preparing cups of tea that still occupies the British! Fisher Box (1978)
reports that an informal experiment was carried out at Rothamsted. A colleague,
Dr B. Muriel Bristol, declined a cup of tea from Fisher on the grounds that she
preferred one to which milk had first been added. Her insistence that the order
in which milk and tea were poured into the cup made a difference led to a
lighthearted test actually being carried out.
Fisher examines the design of such an experiment. Eight cups of tea are
prepared. Four of them have tea added first and four milk. The subject is told
that this has been done and the cups of tea are presented in a random order. The
task is, of course, to divide the set of eight into two sets of four according to the
method of preparation. Because there are 70 ways of choosing a set of 4 objects
from 8:
A subject without any faculty of discrimination would in fact divide the 8 cups
correctly into two sets of 4 in one trial out of 70, or, more properly, with a frequency
which would approach 1 in 70 more and more nearly the more often the test were
repeated. . . . The odds could be made much higher by enlarging the experiment,
while if the experiment were much smaller even the greatest possible success would
give odds so low that the result might, with considerable probability, be ascribed to
chance. (Fisher, 1935/1966, 8th ed., pp. 12-13)
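The arithmetic behind the figure of 1 in 70 is easily reproduced. The sketch below counts the ways a subject who is merely guessing could pick the four milk-first cups. The function name and its parameters are illustrative assumptions, and the six-cup case is added only to show what "much smaller" would mean.

```python
from fractions import Fraction
from math import comb

def chance_distribution(cups=8, milk_first=4):
    """Probability of correctly picking k of the milk-first cups by pure guessing."""
    total = comb(cups, milk_first)  # ways to choose which cups to call "milk first"
    return {
        k: Fraction(comb(milk_first, k) * comb(cups - milk_first, milk_first - k), total)
        for k in range(milk_first + 1)
    }

dist = chance_distribution()
p_all_correct = dist[4]                # every cup classified correctly
p_three_or_more = dist[3] + dist[4]    # at most one cup of each kind misplaced
p_small_experiment = chance_distribution(cups=6, milk_first=3)[3]
print(comb(8, 4), p_all_correct, p_three_or_more, p_small_experiment)
```

With eight cups a perfect division has probability 1/70 under guessing, whereas allowing a single exchange of cups raises the probability to 17/70. With only six cups even a perfect division has probability 1/20, which is no better than the conventional 5 percent level.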
Fisher goes on to say that it is "usual and convenient to take 5 percent, as a
standard level of significance," (p. 13) and so an event that would occur by
chance once in 70 trials is decidedly significant. The crucial point for Fisher is
the act of randomization:
Apart, therefore, from the avoidable error of the experimenter himself introducing
with his test treatments, or subsequently, other differences in treatment, the effects
of which the experiment is not intended to study, it may be said that the simple
precaution of randomisation will suffice to guarantee the validity of the test of
significance, by which the result of the experiment is to be judged. (Fisher,
1935/1966, 8th ed., p. 21)
This is indeed the crucial requirement. Experimental design when variable
measurements are being made, and statistical methods are to be used to tease
out the information from the error, demands randomization. But there are
ironies here in this, Fisher's elegant account of the lady tasting tea. It has been
hailed as the model for the statistical inferential approach:
It demands of the reader the ability to follow a closely reasoned argument, but it will
repay the effort by giving a vivid understanding of the richness, complexity and
subtlety of modern experimental method. (Newman, 1956, Vol. 3, p. 1458)
In fact, it uses a situation and a method that Fisher repudiated elsewhere.
More discussion of Fisher's objections to the Neyman-Pearson approach is
given later. For the moment, it might be noted that Fisher's misgivings center
on the equation of hypothesis testing with industrial quality control acceptance
procedures where the population being sampled has an objective reality, and
that population is repeatedly sampled. However, the tea-tasting example appears
to follow this model! Kempthorne (1983) highlighted problems and indicated
the difficulties that so many have had in understanding Fisher's
pronouncements.
In his book Statistical Methods and Scientific Inference, first published in
1956, Fisher devotes a whole chapter to "Some Misapprehensions about Tests
of Significance." Here he castigates the notion that "the level of significance"
should be determined by "repeated sampling from the same population",
"evidently with no clear realization that the population in question is
hypothetical" (Fisher, 1956/1973, 3rd ed., pp. 81-82).
He determines to illustrate "the more general effects of the confusion
between the level of significance appropriately assigned to a specific test, with
the frequency of occurrence of a specified type of decision" (Fisher, 1956/1973,
3rd ed., p. 82). He states, "In fact, as a matter of principle, the infrequency with
which, in particular circumstances, decisive evidence is obtained, should not be
confused with the force, or cogency of such evidence" (Fisher, 1956/1973, 3rd
ed., p. 96).
Kempthorne (1983), whose perceptions of both Fisher's genius and
inconsistencies are as cogent and illuminating as one would find anywhere, wonders
if this book's lack of recognition of randomization arose because of Fisher's
belated, but of course not admitted, recognition that it did not mesh with
"fiduciating." Kempthorne quotes a "curious" statement of Fisher's and
comments, "Well, well!" A slightly expanded version is given here:
Whereas in the "Theory of Games" a deliberately randomized decision (1934) may
often be useful to give an unpredictable element to the strategy of play; and whereas
planned randomization (1935-1966) is widely recognized as essential in the selection
and allocation of experimental material, it has no useful part to play in the formation
of opinion, and consequently in tests of significance designed to aid the formation
of opinion in the Natural Sciences. (Fisher, 1956/1973, 3rd ed., p. 102)
Kendall (1963) wishes that Fisher had never written the book, saying, "If
we had to sacrifice any of his writings, [this book] would have a strong claim
to priority" (p. 6).
However, he did write the book, and he used it to attack his opponents. In
marshaling his arguments, he introduced inconsistencies of both logic and
method that have led to confusion in lesser mortals. Karl Pearson and the
biometricians used exactly the same tactics. In chapter 15 the view is presented
that it is this rather sorry state of affairs that has led to the historical development
of statistical procedures, as they are used in psychology and the behavioral
sciences, being ignored by the texts that made them available to a wider, and
undoubtedly eager, audience.
13
Assessing Differences
and Having Confidence
FISHERIAN STATISTICS
Any assessment of the impact of Fisher's arrival on the statistical battlefield has
to recognize that his forces did not really seek to destroy totally Pearson's work
or its raison d'être. The controversy between Yule and Pearson, discussed
earlier, had a philosophical, not to say ideological, basis. If, at the height of the
conflict, one or other side had "won," then it is likely that the techniques
advocated by the vanquished would have been discarded and forgotten. Fisher's
war was more territorial. The empire of observation and correlation had to be
taken over by the manipulations of experimenters. Although he would never
have openly admitted it - indeed, he continued to attack Pearson and his works
to the very end of his life (which came 26 years after the end of Pearson's) -
the paradigms and procedures he developed did indeed incorporate and improve
on the techniques developed at Gower Street. The chi-square controversy was
not a dispute about the utility of the test or its essential rationale, but a bitter
disagreement over the efficiency and method of its application. For a number
of reasons, which have been discussed, Fisher's views prevailed. He was right.
In the late 1920s and 1930s Fisher was at the height of his powers and
vigorously forging ahead. Pearson, although still a man to be reckoned with,
was nearly 30 years away from his best work, an old man facing retirement,
rather isolated as he attacked all those who were not unquestioningly for him.
Last, but by no means least, Fisher was the better mathematician. He had an
intuitive flair that brought him to solutions of ingenuity and strength. At the
same time, he was able to demonstrate to the community of biological and
behavioral scientists, a community that so desperately needed a coherent system
of data management and assessment, that his approach had enormous practical
utility.
Pearson's work may be characterized as large sample and correlational,
Fisher's as small sample and experimental. Fisher's contribution easily absorbs
the best of Pearson and expands on the seminal work of "Student." Assessing
the import and significance of the variation in observations across groups
subject to different experimental treatments is the essence of analysis of
variance, and a haphazard glance at any research journal in the field of
experimental psychology attests to its impact.
THE ANALYSIS OF VARIANCE
The fundamental ideas of analysis of variance appeared in the paper that
examined correlation among Mendelian factors (Fisher, 1918). At this time,
eugenic research was occupying Fisher's attention. Between 1915 and 1920,
he published half a dozen papers that dealt with matters relevant to this interest,
an interest that continued throughout his life. The 1918 paper uses the term
variance for $(\sigma_1^2 + \sigma_2^2)$, $\sigma_1$ and $\sigma_2$ representing two independent causes of
variability, and referred to the normally distributed population.
We may now ascribe to the constituent causes fractions or percentages of the total
variance which they together produce. It is desirable on the one hand that the
elementary ideas at the basis of the calculus of correlations should be clearly
understood, and easily expressed in ordinary language, and on the other that loose
phrases about the "percentage of causation," which obscure the essential distinction
between the individual and the population, should be carefully avoided. (Fisher,
1918, pp. 399-400)
Here we see Fisher already moving away from Pearsonian correlational
methods as such and appealing to the Gaussian additive model. Unlike
Pearson's work, it cannot be said that the particular philosophy of eugenics directly
governed Fisher's approach to new statistical techniques, but it is clear that
Fisher always promoted the value of the methods in genetics research (see, e.g.,
Fisher, 1952). What the new techniques were to achieve was a recognition of
the utility of statistics in agriculture, in industry, and in the biological and
behavioral sciences, to an extent that could not possibly have been foreseen
before Fisher came on the scene.
The first published account of an experiment that used analysis of variance
to assess the data was that of Fisher and MacKenzie (1923) on The Manurial
Response of Different Potato Varieties.
Two aspects of this paper are of historical interest. At that time Fisher did not fully
understand the rules of the analysis of variance - his analysis is wrong - nor the role
of randomization. Secondly, although the analysis of variance is closely tied to
additive models, Fisher rejects the additive model in his first analysis of variance,
proceeding to a multiplicative model as more reasonable. (Cochrane, 1980, p. 17)
Cochrane points out that randomization was not used in the layout and that
an attempt to minimize error used an arrangement that placed different
treatments near one another. The conditions could not provide an unbiased estimate
of error. Fisher then proceeds to an analysis based on a multiplicative model:
Rather surprisingly, practically all of Fisher's later work on the analysis of variance
uses the additive model. Later papers give no indication as to why the product model
was dropped. Perhaps Fisher found, as I did, that the additive model is a good
approximation unless main effects are large, as well as being simpler to handle than
the product model. (Cochrane, 1980, p. 21)
Fisher's derivation of the procedure of analysis of variance and his
understanding of the importance of randomization in the planning of experiments are
fully discussed in Statistical Methods for Research Workers (1925/1970), first
published in 1925. This work is now examined in more detail.
Over 45 years and 14 editions, the general character of the book did not
change. The 14th edition was published in 1970, using notes left by Fisher at
the time of his death. Expansions, deletions, and elaborations are evident over
the years. Notable are Fisher's increasing recognition of the work of others and
greater attention to the historical account. Fisher's concentration on his row with
the biometricians as time went by is also evident. The preface to the last edition
follows earlier ones in stating that the book was a product of the research needs
of Rothamsted. Further:
It was clear that the traditional machinery inculcated by the biometrical school was
wholly unsuited to the needs of practical research. The futile elaboration of
innumerable measures of correlation, and the evasion of the real difficulties of
sampling problems under cover of a contempt for small samples, were obviously
beginning to make its pretensions ridiculous. (Fisher, 1970, 14th ed., p. v)
The opening sentence of the chapter on correlation in the first edition reads:
No quantity is more characteristic of modern statistical work than the correlation
coefficient, and no method has been applied successfully to such various data as the
method of correlation. (Fisher, 1925, 1st ed., p. 129)
and in the 14th edition:
No quantity has been more characteristic of biometrical work than the correlation
coefficient, and no method has been applied to such various data as the method of
correlation. (Fisher, 1970, 14th ed., p. 177)
This not-so-subtle change is reflected in the divisions in psychology that are still
evident. The two disciplines discussed by Cronbach in 1957 (see chapter 2) are
those of the correlational and experimental psychologists.
In his opening chapter, Fisher sets out the scope and definition of statistics.
He notes that they are essential to social studies and that it is because the methods
are used there that "these studies may be raised to the rank of sciences" (p. 2).
The concepts of populations and parameters, of variation and frequency
distributions, of probability and likelihood, and of the characteristics of efficient
statistics are outlined very clearly. A short chapter on diagrams ought to be
required reading, for it points up how useful diagrams can be in the appraisal of
data.¹
The chapter on distributions deals with the normal, Poisson, and binomial
distributions. Of interest is the introduction of the formula:
\[ s^2 = \frac{1}{n-1} S(x - \bar{x})^2 \]
(Fisher uses S for summation) for variance, noting that s is the best estimate
of $\sigma$.
Chapter 4 deals with tests of goodness-of-fit, independence, and
homogeneity, giving a complete description of the application of the $\chi^2$ tests, including
Yates' correction for discontinuity and the procedure for what is now known as
the Fisher Exact Test. Chapter 5 is on tests of significance, about which more
is said later. Chapter 6 manages to discuss, quite thoroughly, the techniques of
interclass correlation without mentioning Pearson by name except to
acknowledge that the data of Table 31 are Pearson and Lee's. Failure to acknowledge
the work of others, which was a characteristic of both Pearson and Fisher, and
which, to some extent, arose out of both spite and arrogance, at least partly
explains the anonymous presentation of statistical techniques that is to be found
in the modern textbooks and commentaries.
¹ Scatterplots, which so quickly identify the presence of "outliers," are critical in correlational
analyses. The geometrical exploration of the fundamentals of variance analysis provides insights which
cannot be matched (see, e.g., Kempthorne, 1976).

And then chapter 7, two-thirds of the way through the book, introduces that
most important and influential of methods - analysis of variance. Fisher
describes analysis of variance as "the separation of the variance ascribable to
one group of causes from the variance ascribable to other groups" (Fisher,
1925/1970, 14th ed., p. 213), but he examines the development of the technique
from a consideration of the intraclass correlation. His example is clear and
worth describing. Measurements from n' pairs of brothers may be treated in
two ways in a correlational analysis. The brothers may be divided into two
classes, say, the elder brother and the younger, and the usual interclass correlation
on some measured variable may be calculated. When, on the other hand,
the separation of the brothers into two classes is either irrelevant or impossible,
then a common mean and standard deviation and an intraclass correlation may
be computed.
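The two treatments of the brother pairs can be illustrated with a short Python sketch. It is not taken from Fisher's text: the heights are invented for illustration and the function names are hypothetical. The interclass coefficient is the ordinary product-moment correlation between elder and younger brothers; the intraclass coefficient is obtained here by entering every pair twice, once in each order, so that both columns share a common mean and standard deviation.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def intraclass_r_pairs(pairs):
    """Intraclass correlation for pairs by double entry:
    each pair (a, b) is entered as (a, b) and again as (b, a)."""
    first = [a for a, b in pairs] + [b for a, b in pairs]
    second = [b for a, b in pairs] + [a for a, b in pairs]
    return pearson_r(first, second)

# Hypothetical heights in inches: (elder brother, younger brother).
brothers = [(71, 69), (68, 67), (66, 68), (67, 64), (70, 73),
            (71, 78), (70, 68), (73, 70), (72, 71), (65, 66)]

# Interclass: the brothers are kept in their two classes.
interclass = pearson_r([e for e, y in brothers], [y for e, y in brothers])
# Intraclass: the order within a pair is treated as irrelevant.
intraclass = intraclass_r_pairs(brothers)

print(f"interclass r = {interclass:.3f}")
print(f"intraclass r = {intraclass:.3f}")
```

With only two members per class the doubled table is small, but the same device applied to classes of k members requires k(k - 1) entries per class, which is the source of the tedium that abbreviated methods were meant to remove.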
Given pairs of measurements, $x_1, x'_1;\ x_2, x'_2;\ x_3, x'_3;\ \ldots\ x_{n'}, x'_{n'}$,
the following statistics may be computed:
In the preceding equations, Fisher's S has been replaced with $\Sigma$, and $r_I$ is used
to designate the intraclass correlation coefficient. The computation of $r_I$ is very
tedious, as the number of classes k and the number of observations in each class
increase. Each pair of observations has to be considered twice, $(x_1, x'_1)$ and
$(x'_1, x_1)$ for example. A set of k values gives k(k - 1) entries in a symmetrical
table. "To obviate this difficulty Harris [1913] introduced an abbreviated
method of calculation by which the value of the correlation given by the
symmetrical table may be obtained directly from two distributions" (Fisher,
1925/1970, 14th ed., p. 216). In fact:
Fisher goes on to discuss the sampling errors of the intraclass correlation
and refers them to his z distribution. Figure 13.1 shows the effect of the
transformation of r to z.
Curves of very unequal variance are replaced by curves of equal variance, skew
curves by approximately normal curves, curves of dissimilar form by curves of
similar form. (Fisher, 1925/1970, 14th ed., p. 218)
FIG. 13.1 r to z transformation (from Fisher, Statistical
Methods for Research Workers).
The transformation is given by:
Fisher provides tables of the r to z transformation. After giving an example
of the use of the table and finding the significance of the intraclass correlation,
conclusions may be drawn. Because the symmetrical table does not give the
best estimate of the correlation, a negative bias is introduced into the value for
z and Fisher shows how this may be corrected. Fisher then shows that intraclass
correlation is an example of the analysis of variance: "A very great simplification
is introduced into questions involving intraclass correlation when we
recognise that in such cases the correlation merely measures the relative
importance of two groups of variation" (Fisher, 1925/1970, 14th ed., p. 223).
Figure 13.2 is Fisher's general summary table, showing, "in the last column,
the interpretation put upon each expression in the calculation of an intraclass
correlation from a symmetrical table" (p. 225).
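The effect of the r to z transformation can be seen numerically. The sketch below assumes the familiar form of the transformation for a correlation between two measurements, z = ½ log_e[(1 + r)/(1 - r)]; the function names are hypothetical, and the values of r are chosen only for illustration.

```python
import math

def fisher_z(r):
    """Transform a correlation r (-1 < r < 1) to z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Recover r from z; the inverse of the transformation is tanh(z)."""
    return math.tanh(z)

# Equal steps in r are stretched more and more as r approaches 1,
# which is how skew curves near the limits are pulled toward symmetry.
for r in (0.0, 0.3, 0.6, 0.9, 0.99):
    print(f"r = {r:4.2f}  ->  z = {fisher_z(r):.4f}")
```

Near r = 0 the two scales almost coincide, whereas the step from r = .90 to r = .99 covers more of the z scale than the whole interval from 0 to .60.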
FIG. 13.2 ANOVA summary table 1 (from Fisher's
Statistical Methods for Research Workers).
A quantity made up of two independently and normally distributed parts
with variances A and B respectively, has a total variance of (A + B). A sample
of n' values is taken from the first part and different samples of k values from
the second part added to them. Fisher notes that in the population from which
the values are drawn, the correlation between pairs of members of the same
family is:
$\rho = \frac{A}{A + B}$
and the values of A and B may be estimated from the set of kn' observations.
The summary table is then presented again (Fig. 13.3) and Fisher points out that
"the ratio between the sums of squares is altered in the ratio $n' : (n' - 1)$,
which precisely eliminates the negative bias observed in z derived by the
previous method" (Fisher, 1925/1970, 14th ed., p. 227).
FIG. 13.3 ANOVA summary table 2 (from Fisher,
Statistical Methods for Research Workers).
The general class of significance tests applied here is that of testing whether
an estimate of variance derived from $n_1$ degrees of freedom is significantly
greater than a second estimate derived from $n_2$ degrees of freedom. The
significance may be assessed without calculating r. The value of z may be
calculated as $\frac{1}{2}\log_e\{(n' - 1)(kA + B) - n'(k - 1)B\}$. Fisher provides tables of
the z distribution for the 5% and 1% points. In the later editions of the book he
notes that these values were calculated from the corresponding values of the
variance ratio, $e^{2z}$, and refers to tables of these values prepared by Mahalanobis,
using the symbol x in 1932, and Snedecor, using the symbol F in 1934. In fact:
$z = \frac{1}{2}\log_e F$
"The wide use in the United States of Snedecor's symbol has led to the
distribution being often referred to as the distribution of F" (Fisher, 1925/1970,
14th ed., p. 229). Fisher ends the chapter by giving a number of examples of the
use of the method.
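The link between the variance ratio, z, and the intraclass correlation can be traced in a small worked example. The following Python sketch is an illustration under stated assumptions, not Fisher's own computation: the data are invented, the function name is hypothetical, and the families are assumed to be of equal size k. The within-family mean square is taken as the estimate of B and the between-family mean square as the estimate of kA + B.

```python
import math
from statistics import mean

def family_anova(families):
    """One-way analysis of variance for n' families of k members each."""
    n_fam, k = len(families), len(families[0])
    grand = mean(v for fam in families for v in fam)
    ss_between = k * sum((mean(fam) - grand) ** 2 for fam in families)
    ss_within = sum((v - mean(fam)) ** 2 for fam in families for v in fam)
    ms_between = ss_between / (n_fam - 1)        # estimates kA + B
    ms_within = ss_within / (n_fam * (k - 1))    # estimates B
    f_ratio = ms_between / ms_within             # Snedecor's F
    z = 0.5 * math.log(f_ratio)                  # Fisher's z
    a_hat = (ms_between - ms_within) / k         # estimate of A
    b_hat = ms_within                            # estimate of B
    rho_hat = a_hat / (a_hat + b_hat)            # estimated intraclass correlation
    return {"F": f_ratio, "z": z, "A": a_hat, "B": b_hat, "rho": rho_hat}

# Hypothetical scores for five families of three members each.
families = [[10, 12, 11], [14, 15, 13], [9, 8, 10], [16, 15, 17], [12, 11, 13]]
result = family_anova(families)
for name, value in result.items():
    print(f"{name:>3} = {value:.4f}")
```

The significance of the result can be judged from F (or z) alone, without ever computing the correlation, which is the simplification the passage describes.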
It should be mentioned here that the details of the history of the relationship
of ANOVA to intraclass correlation are a neglected topic in almost all discussions
of the procedure. A very useful reference is Haggard (1958).
The final two chapters of the book discuss further applications of analysis
of variance and statistical estimation. Of most interest here is Fisher's demonstration
of the way in which the technique can be used to test the linear model
and the "straightness" of the regression line. The method is the link between
least squares and regression analysis. Also of importance is Fisher's discussion
of Latin square designs and the analysis of covariance in improving the efficiency
and precision of experiments.
Fisher Box notes that the book did not receive a single good review. An
example, which reflected the opinions of many, was the reviewer for the British
Medical Journal:
If he feared that he was likely to fall between two stools, to produce a book neither
full enough to satisfy those interested in its statistical algebra nor sufficiently simple
to please those who dislike algebra, we think Mr. Fisher's fears are justified by the
result. (Anonymous, 1926, p. 815)
Yates (1951) comments on these early reviews, noting that many of them
expressed dismay at the lack of formal mathematical proofs and interpreted the
work as though it was only of interest to those who were involved in small
sample work. Whatever its reception by the reviewers, by 1950, when the book
was in its 11th edition, about 20,000 copies had been sold.
But it is fair to say that something like 10 years went by from the date of the
original publication before Fisher's methods really started to have an effect on
the behavioral sciences. Lovie (1979) traces its impact over the years 1934 to
1945. He mentions, as do others, that the early textbook writers contributed to
its acceptance. Notable here are the works of Snedecor, published in 1934 and
1937. Lush (1972) quotes a European researcher who told him, "When you see
Snedecor again, tell him that over here we say, 'Thank God for Snedecor; now
we can understand Fisher' " (Lush, 1972, p. 225).
Lindquist published his Statistical Analysis in Educational Research in
1940, and this book, too, was widely used. Even then, some authorities were
skeptical, to say the least. Charles C. Peters in an editorial for the Journal of
Educational Research rather condescendingly agrees that Fisher's statistics are
"suitable enough" for agricultural research:
And occasionally these techniques will be useful for rough preliminary exploratory
research in other fields, including psychology and education. But if educationists
and psychologists, out of some sort of inferiority complex, grab indiscriminately at
them and employ them where they are unsuitable, education and psychology will
suffer another slump in prestige such as they have often hitherto suffered in
consequence of the pursuit of fads. (Peters, 1943, p. 549)
That Peters' conclusion partly reflects the situation at that time, but somewhat
misses the mark, is best evidenced by Lovie's (1979) survey, which we
look at in the next chapter. Fifty or sixty years on, sophisticated designs
and complex analyses are common in the literature but misapprehensions and
misgivings are still to be found there. The recipes are enthusiastically applied
but their structure is not always appreciated.
If the early workers relied on writers like Snedecor to help them with
Fisherian applications, later ones are indebted to workers like Eisenhart (1947)
for assistance with the fundamentals of the method. Eisenhart sets out clearly
the importance of the assumptions of analysis of variance, their critical function
in inference, and the relative consequences of their not being fulfilled. Eisenhart's
significant contribution has been picked up and elaborated on by subsequent
writers, but his account cannot be bettered.
He delineates the two fundamentally distinct classes of analysis of variance
- what are now known as the fixed and random effects models. The first of
these is the most familiar to researchers in psychology. Here the task is to
determine the significance of differences among treatment means: "Tests of
significance employed in connection with problems of this class are simply
extensions to small samples of the theory of least squares developed by Gauss
and others - the extension of the theory to small samples being due principally
to R. A. Fisher" (Eisenhart, 1947, pp. 3-4).
The second class Eisenhart describes as the true analysis of variance. Here
the problem is one of estimating, and inferring the existence of, the components
of variance, "ascribable to random deviation of the characteristics of individuals
of a particular generic type from the mean value of these characteristics in the
'population' of all individuals of that generic type" (Eisenhart, 1947, p. 4).
The failure of the then current literature to distinguish adequately between
the two methods arose because the emphasis had been on tests of significance rather
than on problems of estimation. But it would seem that, despite the best efforts
of the writers of the most insightful of the now current texts (e.g., Hays, 1963,
1973), the distinction is still not fully applied in contemporary research. In other
words, the emphasis is clearly on the assessment of differences among pairs of
treatment means rather than on the relative and absolute size of variances.
Eisenhart's discussion of the assumptions of the techniques is a model for
later writers. Random variation, additivity, normality of distribution, homogeneity
of variance, and zero covariance among the variables are discussed in
detail and their relative importance examined. This work can be regarded as a
classic of its kind.
MULTIPLE COMPARISON PROCEDURES
In 1972, Maurice Kendall commented on how regrettable it was that during the
1940s mathematics had begun to "spoil" statistics. Nowhere is the shift in
emphasis from practice, with its room for intuition and pragmatism, to theory
and abstraction more evident than in the area of multiple comparison procedures.
The rules for making such comparisons have been discussed ad nauseam,
and they continue to be discussed. Among the more complete and illuminating
accounts are those of Ryan (1959) and Petrinovich and Hardyck (1969). Davis
and Gaito (1984) provide a very useful discussion of some of the historical
background. In commenting on Tukey's (1949) intention to replace the intuitive
approach (championed by "Student") with some hard, cold facts, and to provide
simple and definite procedures for researchers, they say:
[This is] symptomatic of the transition in philosophy and orientation from the early
use of statistics as a practical and rigorous aid for interpreting research results, to a
highly theoretical subject predicated on the assumption that mathematical reasoning
was paramount in statistical work. (Davis & Gaito, 1984, p. 5)
It is also the case that the automatic invoking, from the statistical packages,
of any one of half a dozen procedures following an F test has helped to promote
the emphasis on the comparison of treatment means in psychological research.
No one would argue with the underlying rationale of multiple comparison
procedures. Given that the task is to compare treatment means, it is evident that
to carry out multiple t tests from scratch is inappropriate. Over the long run it
is apparent that as larger numbers of comparisons are involved, using a
procedure that assumes that the comparisons are based on independent paired
data sets will increase the Type I error rate considerably when all possible
comparisons in a given set of means are made. Put simply, the number of false
positives will increase. As Davis and Gaito (1984) point out, with $\alpha$ set at the
.05 level and $H_0$ true, comparisons, using the t test, among 10 treatment means
would, in the long run, lead to the difference between the largest and the smallest
of them being reported as significant some 60% of the time. One of the problems
here is that the range increases faster than the standard deviation as the size of
the sample increases. The earliest attempts to devise methods to counteract this
effect come from Tippett (1925) and "Student" (1927), and later workers
referred to the "studentizing" of the range, using tables of the sampling distribution
of the range/standard deviation ratio, known as the q statistic. Newman
(1939) published a procedure that uses this statistic to assess the significance of
multiple comparisons among treatment means.
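The figure of roughly 60% can be checked by a small simulation. The Python sketch below is an illustration under simplifying assumptions, not a reproduction of Davis and Gaito's calculation: the population variance is treated as known, so that the t test is approximated by a normal deviate with the two-tailed .05 critical value of 1.96, and the function name, group size, and number of trials are all hypothetical choices.

```python
import random

def range_false_positive_rate(n_means=10, n_per_group=20, trials=20000,
                              crit=1.96, seed=1):
    """Proportion of experiments, with the null hypothesis true, in which the
    largest and smallest of n_means sample means differ 'significantly'."""
    rng = random.Random(seed)
    se_mean = (1.0 / n_per_group) ** 0.5   # standard error of one mean (sigma = 1)
    se_diff = (2.0 ** 0.5) * se_mean       # standard error of a difference of two means
    hits = 0
    for _ in range(trials):
        # All means are drawn from the same population, so any
        # "significant" difference is a Type I error.
        means = [rng.gauss(0.0, se_mean) for _ in range(n_means)]
        if (max(means) - min(means)) / se_diff > crit:
            hits += 1
    return hits / trials

rate = range_false_positive_rate()
print(f"largest vs. smallest of 10 means judged significant: {rate:.1%}")
```

Under these assumptions the proportion comes out well above one half, far from the nominal 5%, because the comparison is selected after the fact as the most extreme of the 45 possible pairs.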
In general, the earlier writers followed Fisher, who advocated performing t
tests following an analysis that produced an overall z that rejected the null
hypothesis, the variance estimate being provided by the error mean square and
its associated degrees of freedom. Fisher's only cautionary note comes in a
discussion of the procedure to be adopted when the z test fails to reject the null
hypothesis:
Much caution should be used before claiming significance for special comparisons.
Comparisons, which the experiment was designed to make, may, of course, be made
without hesitation. It is comparisons suggested subsequently, by a scrutiny of the
results themselves, that are open to suspicion; for if the variants are numerous, a
comparison of the highest with the lowest observed value, picked out from the results,
will often appear to be significant, even from undifferentiated material. (Fisher,
1935/1966, 8th ed., p. 59)
Fisher is here giving his blessing to planned comparisons but does not
mention that these comparisons should, strictly speaking, be orthogonal or
independent. What he does say is that unforeseen effects may be taken as
guides to future investigations. Davis and Gaito (1984) are at pains to point out
that Fisher's approach was that of the practical researcher and to contrast it with
the later emphasis on fundamental logic and mathematics.
Oddly enough, a number of procedures, springing from somewhat different
rationales, all appeared on the scene at about the same time. Among the best
known are those of Duncan (1951, 1955), Keuls (1952), Scheffé (1953), and
Tukey (1949, 1953). Ryan (1959) examines the issues. After contending that,
fundamentally, the same considerations apply to both a posteriori (sometimes
called post hoc) and a priori (sometimes called planned) comparisons, and
drawing an analogy with debates over one-tail and two-tail tests (to be discussed
briefly later), Ryan defines the problem as the control of error rate. Per
comparison error rates refer to the probability that a given comparison will be
wrongly judged significant. Per experiment error rates refer not to probabilities
as such, but to the frequency of incorrect rejections of the null hypothesis in an
experiment, over the long run of such experiments. Finally, the so-called
experimentwise error rate is a probability, the probability that any one particular
experiment has at least one incorrect conclusion. The various techniques that
were developed have all largely concentrated on reducing, or eliminating, the
effect of the latter. The exception seems to be Duncan, who attempted to
introduce a test based on the error rate per independent comparison. Ryan
suggests that this special procedure seems unnecessary and Scheffé (1959), a
brilliant mathematical statistician, is unable to understand its justification.
Debates will continue and, meanwhile, the packages provide us with all the
methods for a keypress or two.
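The three error rates distinguished by Ryan can be put side by side with a little arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, and the experimentwise figure rests on the idealizing assumption that the comparisons are independent, which is not strictly true of all pairwise comparisons among the same set of means.

```python
from math import comb

def error_rates(n_means, alpha=0.05):
    """Per comparison, per experiment, and experimentwise error rates
    when all pairwise comparisons among n_means means are made."""
    m = comb(n_means, 2)                    # number of pairwise comparisons
    per_comparison = alpha                  # probability for any one comparison
    per_experiment = m * alpha              # expected false rejections per experiment
    experimentwise = 1 - (1 - alpha) ** m   # P(at least one), assuming independence
    return m, per_comparison, per_experiment, experimentwise

m, pc, pe, ew = error_rates(10)
print(f"comparisons among 10 means: {m}")
print(f"per comparison rate:        {pc:.2f}")
print(f"per experiment rate:        {pe:.2f} false rejections expected")
print(f"experimentwise rate:        {ew:.2f} (independence assumed)")
```

The per experiment figure can exceed 1 because it is a long-run frequency of errors per experiment rather than a probability, which is exactly the distinction drawn in the text.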
For the purposes of the present discussion, the examination of multiple
comparison procedures provides a case history for the state of contemporary
statistics. First, it is an example, to match all examples, of the quest for rules
for decision making and statistical inference that lie outside the structure and
concept of the experiment itself. Ryan argues "that comparisons decided upon
a priori from some psychological theory should not affect the nature of the
significance tests employed for multiple comparisons" (Ryan, 1959, p. 33).
Fisher believed that research should be theory driven and that its results were
always open to revision.
"Multiple comparison procedures" could easily replace, with only some
slight modification, the subject of ANOVA in the following quotation:
The quick initial success of ANOVA in psychology can be attributed to the unattractiveness
of the then available methods of analysing large experiments, combined
with the appeal of Fisher's work which seemed to match, with a remarkable degree
of exactness, the intellectual ethos of experimental psychology of the period, with
its atheoretical and situationalist nature and its wish for more extensive experiments.
(Lovie, 1979, p. 175)
Fisher would have deplored this; indeed, he did deplore it. Second, it reflects
the emphasis, in today's work, on the avoidance of the Type I error. Fisher
would have had mixed feelings about this. On the one hand, he rejected the
notion of "errors of the second kind" (to be discussed in the next chapter); only
rejection or acceptance of the null hypothesis enters into his scheme of things.
On the other, he would have been dismayed - indeed, he was dismayed - by
a concentration on automatic acceptance or rejection of the null hypothesis as
the final arbiter in assessing the outcome of an experiment.
Criticizing the ideologies of both Russia and the United States, where he felt
such technological approaches were evident, he says:
How far, within such a system [Russia], personal and individual inferences from
observed facts are permissible we do not know, but it may perhaps be safer . . . to
conceal rather than to advertise the selfish and perhaps heretical aim of understanding
for oneself the scientific situation. In the U.S. also the great importance of organized
technology has I think made it easy to confuse the process appropriate for drawing
correct conclusions with those aimed rather at, let us say, speeding production, or
saving money. (Fisher, 1955, p. 70)
Third, the multiple comparison procedure debate reflects, as has been noted,
the increasingly mathematical approach to applied statistics. In the United
States, a Statistical Computing Center at the State College of Agriculture at
Ames, Iowa (now Iowa State University), became the first center of its kind. It
was headed by George W. Snedecor, a mathematician, who suggested that
Fisher be invited to lecture there during the summer session of 1931. Lush
(1972) reports that academic policy at that institution was such that graduate
courses in statistics were administered by the Department of Mathematics. At
Berkeley, the mathematics department headed by Griffith C. Evans, who went
there in 1934, was to be instrumental in making that institution a world center
in statistics. Fisher visited there in the late summer of 1936 but made a very
poor personal impression. Jerzy Neyman was to join the department in 1939.
And, of course, Annals of Mathematical Statistics was founded at the University of
Michigan in 1930. On more than one occasion during those years at the end of
the 1930s, Fisher contemplated moving to the United States, and one may
wonder what his influence would have been on statistical developments had he
become part of that milieu.
And, finally, the technique of multiple comparisons establishes, without a
backward glance, a system of statistics that is based unequivocally on a long-run
relative frequency definition of probability where subjective, a priori notions
at best run in parallel with the planning of an experiment, for they certainly do
not affect, in the statistical context, the real decisions.
CONFIDENCE INTERVALS AND SIGNIFICANCE TESTS
Inside the laboratory, at the researcher's terminal, as the outcome of the job
reveals itself, and most certainly within the pages of the journals, success means
statistical significance. A very great many reviews and commentaries, some of
which have been brought together by Henkel and Morrison (1970), deplore the
concentration on the Type I error rate, the $\alpha$ level, as it is known, that this
implies. To a lesser extent, but gaining in strength, is the plea for an alternative
approach to the reporting of statistical outcomes, namely, the examination of
confidence intervals. And, to an even lesser extent, judging from the journals,
is the challenge that the statistical outcomes made in the assessment of
differences should be translated into the reporting of strengths of effects. Here
the wheel is turning full circle, back to the appreciation of experimental results
in terms of a correlational analysis.²
Fisher himself seemed to believe that the notion of statistical significance
was more or less self-evident. Even in the last edition of Statistical Methods,
the words null and hypothesis do not appear in the index, and significance and
tests of significance, meaning of have one entry, which refers to the following:
From a limited experience, for example, of individuals of a species, ... we may
obtain some idea of the infinite hypothetical population from which our sample has
been drawn, and so of the probable nature of future samples. ... If a second sample
belies this expectation we infer that it is, in the language of statistics, drawn from a
second population; that the treatment ... did in fact make a material difference. ...
Critical tests of this kind may be called tests of significance, and when such tests are
available we may discover whether a second sample is or is not significantly different
from the first. (Fisher, 1925/1970, 14th ed., p. 41)
² Of interest here, however, is the increasing use, and the increasing power, of regression
models in the analysis of data; see, for example, Fox (1984).

A few pages later, Fisher does explain the use of the tail area of the
probability interval and notes that the p = .05 level is the "convenient" limit for
judging significance. He does this in the context of examples of how often
deviations of a particular size occur in a given number of trials - that twice the
standard deviation is exceeded about once in 22 trials, and so on. There is little
wonder that researchers interpreted significance as an extension of the proportion
of outcomes in a long-run repetitive process, an interpretation to which
Fisher objected! In The Design of Experiments, he says:
In order to assert that a natural phenomenon is experimentally demonstrable we need,
not an isolated record, but a reliable method of procedure. In relation to the test of
significance, we may say that a phenomenon is experimentally demonstrable when
we know how to conduct an experiment which will rarely fail to give us a statistically
significant result. (Fisher, 1935/1966, 8th ed., p. 14)
Here Fisher certainly seems to be advocating "rules of procedure," again a
situation which elsewhere he condemns. Of more interest is the notion that
experiments might be repeated to see if they fail to give significant results. This
seems to be a very curious procedure, for surely experiments, if they are to be
repeated, are repeated to find support for an assertion. The problems that these
statements cause are based on the fact that the null hypothesis is a statement that
is the negation of the effect that the experiment is trying to demonstrate, and
that it is this hypothesis that is subjected to statistical test. The Neyman-Pearson
approach (discussed in chapter 15) was an attempt to overcome these problems,
but it was an approach that again Fisher condemned.
It appears that Fisher is responsible for the first formal statement of the .05
level as the criterion for judging significance, but the convention predates his
work (Cowles & Davis, 1982a). Earlier statements about the improbability of
statistical outcomes were made by Pearson in his 1900(a) paper, and "Student"
(1908a) judged that three times the probable error in the normal curve would
be considered significant. Wood and Stratton (1910) recommend "taking
30 to 1 as the lowest odds which can be accepted as giving practical certainty
that a difference in a given direction is significant" (p. 433).
Fisher Box mentions that Fisher took a course at Cambridge on the theory
of errors from Stratton during the academic year 1912-1913.
Odds of 30 to 1 represent a little more than three times the probable error
(P.E.) referred to the normal probability curve. Because the probable error is
equivalent to a little more than two-thirds of a standard deviation, three P.E.s is
almost two standard deviations, and, of course, reference to any table of the
"areas under the normal curve" shows that a z score of 1.96 cuts off 5% in the
two tails of the distribution. With some little allowance for rounding, the .05
probability level is seen to have enjoyed acceptance some time before Fisher's
prescription.
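The arithmetic connecting probable errors, odds, and the .05 level can be verified directly from the normal curve. The Python sketch below uses the standard library's `statistics.NormalDist`; the variable names are hypothetical, and the odds of 30 to 1 are read here as a two-tailed probability of 1/31, which is an interpretive assumption.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# The probable error is the deviation exceeded (in either direction) half the time.
probable_error = std_normal.inv_cdf(0.75)
three_pe = 3 * probable_error

# Two-tailed probabilities of exceeding a given deviation.
p_beyond_three_pe = 2 * (1 - std_normal.cdf(three_pe))
p_beyond_two_sd = 2 * (1 - std_normal.cdf(2.0))
p_beyond_1_96 = 2 * (1 - std_normal.cdf(1.96))

# Odds of 30 to 1 against, taken as a two-tailed probability of 1/31.
z_for_30_to_1 = std_normal.inv_cdf(1 - (1 / 31) / 2)

print(f"probable error            = {probable_error:.4f} SD")
print(f"three probable errors     = {three_pe:.4f} SD")
print(f"P(beyond 3 P.E.)          = {p_beyond_three_pe:.4f}")
print(f"P(beyond 2 SD)            = {p_beyond_two_sd:.4f} (about 1 in {1 / p_beyond_two_sd:.0f})")
print(f"P(beyond 1.96 SD)         = {p_beyond_1_96:.4f}")
print(f"odds of 30 to 1 in P.E.s  = {z_for_30_to_1 / probable_error:.2f}")
```

All of these criteria fall within a narrow band around two standard deviations, which is the sense in which the .05 convention was already in circulation before it was formally stated.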
A test of significance is a test of the probability of a statistical outcome under
the hypothesis of chance. In post-Fisherian analyses the probability is that of
making an error in rejecting the null hypothesis, the so-called Type I error. It
is, however, not uncommon to read of the null hypothesis being rejected at the
"5% level of confidence," an odd inversion that endows the act of rejection with
a sort of statement of belief about the outcome. Interpretations of this kind
reflect, unfortunately in the worst possible way, the notions of significance tests
in the Fisherian sense, and confidence intervals introduced in the 1930s by Jerzy
Neyman. Neyman (1941) states that the theory of confidence intervals was
established to give frequency interpretations of problems of estimation. The
classical frequency interpretation is best understood in the context of the
long-run relative frequency of the outcomes in, say, the rolling of a die. Actual
relative frequencies in a finite run are taken to be more or less equal to the
probabilities, and in the "infinite" run are equal to the probabilities. In his 1937
paper, Neyman considers a system of random variables $x_1, x_2, \ldots x_i, \ldots x_n$
designated E, and a probability law $p(E \mid \theta_1, \theta_2, \ldots \theta_l)$ where
$\theta_1, \theta_2, \ldots \theta_l$ are unknown parameters. The problem is to establish:
single-valued functions of the x's $\underline{\theta}(E)$ and $\bar{\theta}(E)$ having the property that, whatever
the values of the $\theta$'s, say $\theta'_1, \theta'_2, \ldots \theta'_l$, the probability of $\underline{\theta}(E)$ falling short of $\theta'_1$
and at the same time of $\bar{\theta}(E)$ exceeding $\theta'_1$ is equal to a number $\alpha$ fixed in advance
so that $0 < \alpha < 1$,
It is essential to notice that in this problem the probability refers to the values of
$\underline{\theta}(E)$ and $\bar{\theta}(E)$ which, being single-valued functions of the x's are random variables.
$\theta'_1$ being a constant, the left-hand side of [the above] does not represent the
probability of $\theta'_1$ falling within some fixed limits. (Neyman, 1937, p. 379)
The values $\underline{\theta}(E)$ and $\bar{\theta}(E)$ represent the confidence limits for $\theta'_1$ and span
the confidence interval for the confidence coefficient $\alpha$. Care must be taken here
not to confuse this $\alpha$ with the symbol for the Type I error rate; in fact this $\alpha$ is
1 minus the Type I error rate. The last sentence in the quotation from Neyman
just given is very important. First, an example of a statement of the confidence
interval in more familiar terms is, perhaps, in order. Suppose that measurements
on a particular fairly large random sample have produced a mean of 100 and
that the standard error of this mean has been calculated to be 3. The routine
method of establishing the upper and lower limits of the 90% confidence interval
would be to compute $100 \pm 1.65(3)$. What has been established? The textbooks
will commonly say that the probability that the population mean, $\mu$, falls within
this interval is 90%, which is precisely what Neyman says is not the case.
For Neyman, the confidence limits represent the solution to the statistical
problem of estimating $\theta_1$ independent of a priori probabilities. What Neyman
is saying is that, over the long run, confidence intervals, calculated in this way,
will contain the parameter 90% of the time. It is not just being pedantic to insist
that, in Neyman's terms, to say that the one interval actually calculated contains
the parameter 90% of the time is mistaken. Nevertheless, that is the way in
which confidence intervals have sometimes come to be interpreted and used.
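Neyman's long-run reading of the 90% figure can be made concrete with a simulation. The Python sketch below is an illustration under assumed values: a population mean of 100 and a standard error of 3 are borrowed from the example above, the sampling distribution of the mean is taken to be normal with known standard error, and the function name and number of trials are hypothetical.

```python
import random

def long_run_coverage(mu=100.0, se=3.0, z=1.65, trials=20000, seed=2):
    """Proportion of intervals (mean +/- z * se), over repeated samples,
    that contain the fixed population mean mu."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        sample_mean = rng.gauss(mu, se)   # the interval, not mu, varies from sample to sample
        lower, upper = sample_mean - z * se, sample_mean + z * se
        if lower <= mu <= upper:
            covered += 1
    return covered / trials

coverage = long_run_coverage()
print(f"intervals containing the population mean: {coverage:.1%}")

# Any single computed interval, such as 100 +/- 1.65(3), either
# contains the population mean or it does not.
lower, upper = 100 - 1.65 * 3, 100 + 1.65 * 3
print(f"one particular interval: {lower:.2f} to {upper:.2f}")
```

The 90% describes the procedure that generates the intervals; the limits 95.05 and 104.95 from one sample are simply one draw from that procedure.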
A number of writers have vigorously propounded the benefits of confidence
intervals as opposed to significance testing:
Whenever possible, the basic statistical report should be in the form of a confidence
interval. Briefly, a confidence interval is a subset of the alternative hypotheses
computed from the experimental data in such a way that for a selected confidence
level $\alpha$, the probability that the true hypothesis is included in a set so obtained is
$\alpha$. Typically, an $\alpha$-level confidence interval consists of those hypotheses under
which the p value for the experimental outcome is larger than $1 - \alpha$. . . . Confidence
intervals are the closest we can at present come to quantitative assessment of
hypothesis-probabilities . . . and are currently our most effective way to eliminate
hypotheses from practical consideration - if we choose to act as though none of the
hypotheses not included in a 95% confidence interval are correct, we stand only a
5% chance of error. (Rozeboom, 1960, p. 426)
Both Ronald Fisher and Jerzy Neyman would have been very unhappy with
this advice! It does, however, reflect once again the way in which researchers
in the psychological sciences prescribe and propound rules that they believe will
lead to acceptance of the findings of research. Rozeboom's paper is a thoughtful
attempt to provide alternatives to the routine null hypothesis significance test
and deals with the important aspect of degree of belief in an outcome.
One final point on confidence interval theory: it is apparent that some early
commentators (e.g., E. S. Pearson, 1939b; Welch, 1939) believed that Fisher's
"fiducial theory" and Neyman's confidence interval theory were closely related.
Neyman himself (1934) felt that his work was an extension of that of Fisher.
Fisher objected strongly to the notion that there was anything at all confusing
about fiducial distributions or probabilities and denied any relationship to the
theory of confidence intervals, which, he maintained, was itself inconsistent. In
1941 Neyman attempted to show that there is no relationship between the two
theories, and here he did not pull his punches:
The present author is inclined to think that the literature on the theory of fiducial
argument was born out of ideas similar to those underlying the theory of confidence
intervals. These ideas, however, seem to have been too vague to crystallize into a
mathematical theory. Instead they resulted in misconceptions of "fiducial probability"
and "fiducial distribution of a parameter" which seem to involve intrinsic
inconsistencies. ... In this light, the theory of fiducial inference is simply non-existent
in the same sense as, for example, a theory of numbers defined by mutually
contradictory definitions. (Neyman, 1941, p. 149)
To the confused onlooker, Neyman does seem to have been clarifying one
aspect of Fisher's approach, and perhaps for a brief moment of time there was
a hint of a rapprochement. Had it happened, there is reason to believe that
unequivocal statements from these men would have been of overriding importance
in subsequent applications of statistical techniques. In fact, their quarrels
left the job to their interpreters. Debate and disagreement would, of course,
have continued, but those who like to feel safe could have turned to the
orthodoxy of the masters, a notion that is not without its attraction.
A NOTE ON "ONE-TAIL" AND "TWO-TAIL" TESTS
In the early 1950s, mainly in the pages of Psychological Bulletin and the
Psychological Review - then, as now, immensely important and influential
journals - a debate took place on the utility and desirability of one-tail versus
two-tail tests (Burke, 1953; Hick, 1952; Jones, 1952, 1954; Marks, 1951, 1953).
It had been stated that when an experimental hypothesis had a directional
component - that is, not merely that a parameter μ₁ differed significantly from
a second parameter μ₂, but that, for example, μ₁ > μ₂ - then the researcher was
permitted to use the area cut off in only one tail of the probability distribution
when the test of significance was applied. Referred to the normal distribution,
this means that the critical value (at the 5% level) becomes 1.65 rather than 1.96.
It was argued that because most assertions that appealed to theory were
directional - for example, that spaced was better than massed practice in learning, or
that extraverts conditioned poorly whereas introverts conditioned well - the actual
statistical test should take into account these one-sided alternatives. Arguments
against the use of one-tailed tests primarily centered on what the researcher does
when a very large difference is obtained, but in the unexpected direction. The
temptation to "cheat" is overwhelming! It was also argued that such data ought
not to be treated with the same reaction as a zero difference on scientific
grounds: "It is to be doubted whether experimental psychology, in its present
state, can afford such lofty indifference toward experimental surprises" (Burke,
1953, p. 385).
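The arithmetic behind the two critical values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a standard normal reference distribution, a 5% level, and a directional prediction that the difference will be positive. The function names are hypothetical and do not come from any of the papers cited.

```python
from statistics import NormalDist


def critical_z(alpha: float = 0.05, tails: int = 2) -> float:
    """Positive critical z value for a test at level alpha using 1 or 2 tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A one-tail test puts all of alpha in one tail; a two-tail test splits it.
    return NormalDist().inv_cdf(1 - alpha / tails)


def is_significant(z: float, alpha: float = 0.05, tails: int = 2) -> bool:
    """Assumes the predicted direction is positive when tails == 1."""
    if tails == 1:
        return z > critical_z(alpha, 1)
    return abs(z) > critical_z(alpha, 2)


one_tail = critical_z(0.05, tails=1)  # about 1.645, the "1.65" of the text
two_tail = critical_z(0.05, tails=2)  # about 1.960

# A large difference in the unexpected direction (z = -2.5) is flagged by the
# two-tail test but must be ignored under the one-tail rule.
surprise_two_tail = is_significant(-2.5, tails=2)
surprise_one_tail = is_significant(-2.5, tails=1)
```

The last two lines show the difficulty raised by Burke: under the one-tail rule a striking result in the "wrong" direction is treated exactly like a zero difference.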
Many workers were concerned that a move toward one-tail tests represented
a loosening or a lowering of conventional standards, a sort of reprehensible
breaking of the rules, and pious pronouncements about scientific conservatism
abound. Predictably, the debate led to attempts to establish the rules for the use
of one-tail tests (Kimmel, 1957).
What is found in these discussions is the implicit assumption that formally
stated alternative hypotheses are an integral part of statistical analysis. What is
not found in these discussions is any reference to the logic of using a probability
distribution for the assessment of experimental data. Put baldly and simply,
using a one-tail test means that the researcher is using only half the probability
distribution, and it is inconceivable that this procedure would have been
acceptable to any of the founding fathers. The debate is yet another example of
the eagerness of practical researchers to codify the methods of data assessment
so that statistical significance has the maximum opportunity to reveal itself, but
in the presence of rules that discouraged, if not entirely eliminated, "fudging."
Significance levels, or p levels, are routinely accepted as part of the
interpretation of statistical outcomes. The statistic that is obtained is examined with
respect to a hypothesized distribution of the statistic, a distribution that can be
completely specified. What is not so readily appreciated is the notion of an
alternative model. This matter is examined in the final chapter. For now, a
summary of the process of significance testing given in one of the weightier and
more thoughtful texts might be helpful.
1. Specification of a hypothesized class of models and an alternative class of models.
2. Choice of a function of the observations T.
3. Evaluation of the significance level, i.e., SL = P(T > t), where t is the observed
value of T and where the probability is calculated for the hypothesized class of
models.
In most applied writings the significance level is designated by P, a custom which
has engendered a vast amount of confusion.
It is quite common to refer to the hypothesized class of models as the null
hypothesis and to the alternative class of models as the alternative hypothesis. We
shall omit the adjective "null" because it may be misleading. (Kempthorne & Folks,
1971, pp. 314-315)
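The third step in the summary above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the hypothesized class of models implies a standard normal distribution for the statistic T; the function name is hypothetical. Because the normal distribution is continuous, P(T > t) and P(T >= t) coincide here.

```python
from statistics import NormalDist


def significance_level(t_observed: float,
                       hypothesized: NormalDist = NormalDist()) -> float:
    """SL = P(T > t), computed under the hypothesized distribution of T."""
    return 1.0 - hypothesized.cdf(t_observed)


# Illustrative value only: an observed statistic of 1.96 on a standard normal
# reference distribution leaves about 2.5% of the distribution above it.
sl = significance_level(1.96)
```

Note that nothing in the calculation refers to the alternative class of models; it enters only in the choice of T and in the decision about which tail or tails to examine.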
14
Treatments and Effects:
The Rise of ANOVA
THE BEGINNINGS
In the last chapter we considered the development of analysis of variance,
ANOVA. Here the incorporation of this statistical technique into psychological
methodology is examined in a little more detail.
The emergence of ANOVA as the most favored method of data appraisal in
psychology from the late 1930s to the 1960s represents a most interesting mix
of forces. The first was the continuing need for psychology to be "respectable"
in the scientific sense and therefore to seek out mathematical methods. The
correlational techniques that were the norm in the 1920s and early 1930s were
not enough, nor did they lend themselves to situations where experimental
variables might be manipulated. The second was the growth of the
commentators and textbook writers who interpreted Fisher with a minimum need of
mathematics. From a trickle to a flood, the recipe books poured out over a
period of about 25 years and the flow continues unabated. And the third is the
emergence of explanations of the models of ANOVA using the concept of
expected mean squares. These explanations, which did not avoid mathematics,
but which were at a level that required little more than high school math, opened
the way for more sophisticated procedures to be both understood and applied.
THE EXPERIMENTAL TEXTS
It is clear, however, that the enthusiasm that was mounting for the new methods
is not reflected in the textbooks of experimental psychology that were published
in those years. Considering four of the books that were classics in their own
time shows that their authors largely avoided or ignored the impact of Fisher's
statistics. Osgood's Method and Theory in Experimental Psychology (1953)
mentions neither Fisher nor ANOVA, neither Pearson nor correlation.
Woodworth and Schlosberg's Experimental Psychology (1954) ignores Pearson by
name, and the authors state, "To go thoroughly into correlational analysis lies
beyond the scope of this book" (p. 39). The writers, however, show some
recognition of the changing times, admitting that the "old standard 'rule of one
variable'" (p. 2) does not mean that no more than one factor may be varied. The
experimental design must allow, however, for the effects of single variables to
be assessed and also the effect of possible interactions. Fisher's Design of
Experiments is then cited, and Underwood's (1949) book is referenced and
recommended as a source of simple experimental designs. And that is it!
Hilgard's chapter in Stevens' Handbook of Experimental Psychology (1951)
not only gives a brief description of ANOVA in the context of the logic of the
control group and factorial design but also comments on the utility of matched
groups designs, but other commentaries, explanations, and discussions of
ANOVA methods do not appear in the book. Stevens himself recognizes the
utility of correlational techniques but admits that his colleague Frederick
Mosteller had to convince him that his claim was overly conservative and that:
"rank order correlation does not apply to ordinal scales because the derivation
of the formula for this correlation involves the assumption that the differences
between successive ranks are equal" (p. 27). An astonishing claim.
Underwood's (1949) book is a little more encouraging, in that he does
recognize the importance of statistics. His preface clearly states how important
it is to deal with both method and content. His own teaching of experimental
psychology required that statistics be an integral part of it: "I believe that the
factual subject matter can be comprehended readily without a statistical
knowledge, but a full appreciation of experimental design problems requires some
statistical thinking" (p. v).
But, in general, it is fair to say that many psychological researchers were not
in tune with the statistical methods that were appearing. "Statistics" seems to
have been seen as a necessary evil! Indeed, there is more than a hint of the same
mindset in today's texts. An informal survey of introductory textbooks
published in the last 10 years shows a depressingly high incidence of statistics being
relegated to an appendix and of sometimes shrill claims that they can be
understood without recourse to mathematics. In using statistics such as the
t ratio and more comprehensive analyses such as ANOVA, the necessity of
randomization is always emphasized. Researchers in the social and educational
areas of psychology realized that such a requirement, when it came to assigning
participants to "treatments," was just not possible. Levels of ability,
socioeconomic groupings, age, sex, and so on cannot be directly manipulated. When
methodologists such as Campbell and Stanley (1963), authorities in the
classification and comparison of experimental designs, showed that meaningful
analyses could be achieved through the use of "found groups" and what is often
called quasi-experimental designs, the potential of ANOVA techniques
widened considerably. The argument was, and is, advanced that unless the
experimenter can control for all relevant variables, alternative explanations for the
results other than the influence of the independent variable can be found - the
so-called correlated biases. The skeptic might argue that, in addition to the fact
that statistical appraisals are by definition probabilistic and therefore uncertain,
direct manipulation of the independent variable does not assuredly guard against
mistaken attribution of its effects.
And, very importantly, the commentaries, such as they were, on
experimental methods have to be seen in the light of the dominant force, in American
psychology at least, of behaviorism. For example, Brennan's textbook History
and Systems of Psychology (1994) devotes two chapters and more than 40 pages
out of about 100 pages to Twentieth-Century Systems (omitting Eastern
Traditions, Contemporary Trends and the Third Force Movement). This is not to be
taken as a criticism of Brennan: indeed, his book has run to several editions and
is an excellent and readable text.¹ The point is that from Watson's (1913) paper
until well on into the 1960s, experimental psychology was, for many, the
experimental analysis of behavior - latterly within a Skinnerian framework.
Sidman's (1960) book Tactics of Scientific Research: Evaluating Experimental
Data in Psychology is most certainly not about statistical analysis!
Science is presumably dedicated to stamping out ignorance, but statistical evaluation
of data against a baseline whose characteristics are determined by unknown variables
constitutes a passive acceptance of ignorance. This is a curious negation of the
professed aims of science. More consistent with those aims is the evaluation of data
by means of experimental control. (p. 45)
In general, the experimental psychologists of this ilk eschewed statistical
approaches apart from descriptive means, frequency counts, and
straightforward assessments of variability.
¹ Perhaps one might be permitted to ask Brennan to include a chapter on the history and
influence of statistics!
THE JOURNALS AND THE PAPERS
Rucci and Tweney (1980) have carried out a comprehensive analysis of the use
of ANOVA from 1925 to 1950, concentrating mainly, but not exclusively, on
American publications. They identify the earliest research to use ANOVA as a
paper by Reitz (1934). The author checked for homogeneity of variance, gave
the method for computing z, and noted its relationship with η². An early paper
that uses a factorial design is that of Baxter (1940). The author, at the University
of Minnesota, acknowledges the statistical help given to him by Palmer Johnson,
and he also credits Crutchfield (1938) with the first application of a factorial
design to a psychological study. Baxter states that his aim is "an attempt to show
how factorial design, as discussed by Fisher and more specifically by Yates
. . . , has been applied to a study of reaction time" (p. 494). The proposed study
presented for illustration deals with reaction time and examined three factors:
the hand used by the participant, the sensory modality (auditory or visual), and
discrimination (a single stimulus, two stimuli, three stimuli). Baxter explains
how the treatment combinations could be arranged and shows how partial
confounding (which results in some possible interactions becoming untestable)
is used to reduce subject fatigue. The paper is altogether a remarkably clear
account of the structure of a factorial design. And Baxter followed through by
reporting (1942) on the outcome of a study which used this design.
Rucci and Tweney examined 6,457 papers in six American psychological
journals. They find that from 1935 to 1952 there is a steady rise in the use of
ANOVA, which is paralleled by a rise in the use of the t ratio. The rise became
a fall during the war years, and the rise became steeper after the war. It is
suggested that the younger researchers, who would have become acquainted
with the new techniques in their pre-war graduate school training, would have
been eligible for military service and the "old-timers" who were left used older
procedures, such as the critical ratio. It is not the case that the use of correlational
techniques diminished. Rucci and Tweney's analysis indicates that the
percentage of articles using such methods remained fairly steady throughout the period
examined. Their conclusion is that ANOVA filled the void in experimental
psychology but did not displace Cronbach's (see chapter 2, p. 35) other
discipline of scientific psychology. Nevertheless, the use of ANOVA was not
established until quite late and only surpassed its pre-war use in 1950.
Overall, Rucci and Tweney's analysis leads, as they themselves state, to the
view that "ANOVA was incorporated into psychology in logical and orderly
steps" (p. 179), and their introduction avers that "It took less than 15 years for
psychology to incorporate ANOVA" (p. 166). They do not claim - indeed they
specifically deny - that the introduction of these techniques constitutes a
paradigm shift in the Kuhnian sense.
An examination of the use of ANOVA from 1934 to 1945 by a British
historian of science, Lovie (1979), gives a somewhat different view from that
offered by Rucci and Tweney, who, unfortunately, do not cite this work, for it
is an altogether more insightful appraisal, which comes to somewhat different
conclusions:
The work demonstrates that incorporating the technique [ANOVA] into psychology
was a long and painful process. This was due more to the substantive implications
of the method than to the purely practical matters of increased arithmetical labour
and novelty of language. (p. 151)
Lovie also shows that ANOVA can be regarded, as he puts it, as "one of the
midwives of contemporary experimental psychology" (p. 152). A decidedly
non-Kuhnian characterization, but close enough to an indication of a paradigm
shift! Three case studies described by Lovie show the shift, over a period from
1926 to 1932, from informal, as it were, "let-me-show-you" analysis through a
mixture of informal and statistical analysis to a wholly statistical analysis. It was
the design of the experiment that occupied the experimenter as far as method
was concerned, and the meaning of the results could be considered to be "a
scientifically acceptable form of consensual bargaining between the author and
the reader on the basis of non-statistical common-sense statements about the
data" (p. 155).
When ANOVA was adopted, the old conceptualizations of the
interpretations of data died hard, and the results brought forth by the method were used
selectively and as additional support for outcomes that the old verbal methods
might well have uncovered. Lovie's valuable account and his examination of
the early papers provide detailed support for the reasons why many of the first
applications of ANOVA were trivial and why more enlightened and effective
uses were slow to emerge in any quantity.
A series of papers by Carrington (1934, 1936, 1937) in the Proceedings of
the Society for Psychical Research reports on results given by one-way ANOVA.
The papers deal with the quantitative analysis of trance states and show that the
pioneers in systematic research in parapsychology were among the earliest users
of the new methods. Indeed, Beloff (1993) makes this interesting point:
Statistics, after all, have always been a more critical aspect of experimental
parapsychology than they have been for psychophysics or experimental psychology
precisely because the results were more likely to be challenged. There is, indeed,
some evidence that parapsychology acted as a spur to statisticians in developing their
own discipline and in elaborating the concept of randomness. (p. 126)
It is worth noting that some of the most well-known of the early rigorous
experimental psychologists, for example, Gardner Murphy and William
McDougall, were interested in the paranormal, and it was the latter who, leaving
Harvard for Duke University, recruited J. B. Rhine, who helped to found, and
later directed, that institution's parapsychology laboratory.
Of course, there were several psychological and educational researchers who
vigorously supported the "new" methods. Garrett and Zubin (1943) observe that
ANOVA had not been used widely in psychological research. These authors
make the point that:
Even as recently as 1940, the author of a textbook [Lindquist] in which Fisher's
methods are applied to problems in educational research, found it necessary to use
artificial data in several instances because of the lack of experimental material in the
field. (p. 233)
These authors also offer the opinion that the belief that the methods deal
with small samples had influenced their acceptance - a belief, nay, a fact, that
has led others to point to it as a reason why they were welcomed! They also
worried about "the language of the farm," "soil fertility, weights of pigs,
effectiveness of manurial treatments and the like" (p. 233). Hearnshaw, the
eminent historian of psychology, also mentions the same difficulty as a reason
for his view that "Fisher's methods were slow to percolate into psychology both
in this country [Britain] and in America" (1964, p. 226). Although it is the case
that agricultural examples are cited in Fisher's books, and phrases that include
the words "blocks," "plots," and "split-plots" still seem a little odd to
psychological researchers, it is not true to say that the world of agriculture permeates
all Fisher's discussions and explanations, as Lovie (1979) has pointed out.
Indeed, the early chapter in The Design of Experiments on the mathematics of
a lady tasting tea is clearly the story of an experiment in perceptual ability, which
is about as psychological as one could get.
It must be mentioned that Grant (1944) produced a detailed criticism of
Garrett and Zubin's paper, in particular taking them to task for citing examples
of reports where ANOVA is wrongly used or inappropriately applied. From the
standpoint of the basis of the method, Grant states that in the Garrett and Zubin
piece:
The impression is given . . . that the primary purpose of analysis of variance is to
divide the total variance into two or more components (the mean squares) which are
to be interpreted directly as the respective contributions to the total variance made
by the experimental variables and experimental error. (pp. 158-159)
He goes on to say that the purpose of ANOVA is to test the significance of
variation and, while agreeing that it is possible to estimate the proportional
contribution of the various components, "the process of estimation must be
clearly differentiated from the test of significance" (p. 159). These are valid
criticisms, but it must be said that such confusion as some commentaries may
bring reflects Fisher's (1934) own assertion in remarks on a paper given by
Wishart (1934) at a meeting of the Royal Statistical Society that ANOVA "is
not a mathematical theorem, but rather a convenient method of arranging the
arithmetic" (p. 52). In addition, the estimation of treatment effects is considered
by many researchers to be critical (and see chapter 7, p. 83).
What is most telling about Grant's remarks, made more than 50 years ago,
is that they illustrate a criticism of data appraisal using statistics that persists to
this day, that is, the notion that the primary aim of the exercise is to obtain
significant results and that reporting effect size is not a routine procedure. Also
of interest is Grant's inclusion of expressions for the expected values of the mean
squares, of which more later.
A detailed appraisal of the statistical content of the leading British journal
has not been carried out, but a review of that publication over 10-year periods
shows the rise in the use of statistical techniques of increasing variety and
sophistication. The progression is not dissimilar from that observed by Rucci
and Tweney. The first issue of the British Journal of Psychology appeared in
1904 under the distinguished co-editorships of James Ward (1843-1925) and
W. H. R. Rivers (1864-1922). Some papers in that volume made use of such
statistics as were generally available: averages, medians, mean variation,
standard deviation, and the coefficient of variation. Statistics were there at the start.
By 1929-1930, correlational techniques appeared fairly regularly in the
journal. Volume 30, in 1940, contained 29 papers and included the use of χ²,
the mean, the probable error, and factor analysis. There were no reports that
used ANOVA, nor did any of the 14 papers published in the 1950 volume
(Vol. 41). But χ² and the t ratio still found a place. Fitt and Rogers (1950), the
authors of the paper that used the t ratio for the difference between the means,
felt it necessary to include the formula and cited Lindquist (1940) as the source.
The researchers of the 1950s who were beginning to use ANOVA followed the
appearance of a significant F value with post hoc multiple t tests. The use of
multiple comparison procedures (chapter 13, p. 195) had not yet arrived on the
data analysis scene. Six of the 36 papers published in 1960 included ANOVA,
one of which was an ANOVA by ranks. Kendall's Tau, the t ratio, correlation,
partial correlation, factor analysis, the Mann-Whitney U test - all were used.
Statistics had indeed arrived and ANOVA was in the forefront.
A third of the papers (17 out of 52, to be more precise!) in the 1970 issue
made use of ANOVA in the appraisal of the data. Also present were Pearson
correlation coefficients, the Wilcoxon T, Tau, Spearman's rank difference
correlation, the Mann-Whitney U, the phi coefficient, and the t ratio.
Educational researchers figured largely in the early papers that used
ANOVA, both as commentators on the method and in applying it to their own
research. Of course, many in the field were quite familiar with the correlational
techniques that had led to factor analysis and its controversies and mathematical
complexities. There had been more than 40 years of study and discussion of
skill and ability testing and the nature of intelligence - the raison d'être of factor
analysis - and this was central to the work of those in the field. They were ready,
even eager, to come to grips with methods that promised an effective way to
use the multi-factor approach in experiments, an approach that offered the
opportunity of testing competing theories more systematically than before.
And it was from the educational researcher's standpoint that Stanley (1966)
offered an appraisal of the influence of Fisher's work "thirty years later." He
makes the interesting point that although there were experimenters with
"considerable 'feel' for designing experiments, . . . perhaps some of them did not
have the technical sophistication to do the corresponding analyses." This
judgment is based on the fact that 5 of the 21 volunteered papers submitted to
the Division of Educational Psychology of the American Psychological
Association annual convention in 1965 had to be returned to have more complex
analysis carried out. This may have been due to lack of knowledge, it may have
been due to uncertainty about the applicability of the techniques, but it also
reflects perhaps the lingering appeal of straightforward and even subjective
analysis that the researchers of 20 years before favored, as we noted earlier in
the discussion of Lovie's appraisal. It is also Stanley's view that "the impact of
the Fisherian revolution in the design and analysis of experiments came slowly
to psychology" (p. 224). It surely is a matter of opinion as to whether Rucci
and Tweney's count of less than 5 percent of papers in their chosen journals in
1940 to between 15 and 20 percent in 1952 is slow growth, particularly when
it is observed that the growth of the use of ANOVA varied considerably across
the journals.² This leads to the possibility that the editorial policy and practice
of their chosen journals could have influenced Rucci and Tweney's count.³ It
is also surely the case that if a new technique is worth anything at all, then the
growth of its use will show as a positively accelerated growth curve, because
as more people take it up, journal editors have a wider and deeper pool of
potential referees who can give knowledgeable opinions. Rucci and Tweney's
remark, quoted earlier, on the length of time it took for ANOVA to become part
of experimental psychology, has a tone that suggests that they do not believe
that the acceptance can be regarded as "slow."
² Stanley observes that the Journal of General Psychology was late to include ANOVA and it did
not appear at all in the 1948 volume.
³ The author, many years ago, had a paper sent back with a quite kind note indicating that the
journal generally did not publish "correlational studies." Younger researchers soon learn which journals
are likely to be receptive to their approaches!
THE STATISTICAL TEXTS
The first textbooks that were written for behavioral scientists began to appear
in the late 1940s and early 1950s, although it must be noted that the much-cited
text by Lindquist was published in 1940. The impact of the first edition of this
book may have been lessened, however, by a great many errors, both
typographic and computational. Quinn McNemar (1940b), in his review, observes
that the volume, "in the opinion of the reviewer and student readers, suffers from
an overdose of wordiness" (p. 746). He also notes that there are several major
slips, although acknowledging the difficulty in explaining several of the topics
covered, and declares that the book "should be particularly useful to all who are
interested in obtaining non-mathematical knowledge of the variance technique"
(p. 748).
Snedecor, an agricultural statistician, published a much-used text in 1937,
and he is often given credit for making Fisher comprehensible (see chapter 13,
p. 194). Rucci and Tweney place George Snedecor of Iowa State College,
Harold Hotelling, an economist at Columbia, and Palmer Johnson of the
University of Minnesota, all of whom had spent time with Fisher himself, as the
founders of statistical training in the United States. But any informal survey of
psychologists who were undergraduates in the 1950s and early 1960s would
likely reveal that a large number of them were reared on the texts of Edwards
(1950) of the University of Washington, Guilford (1950) of the University of
Southern California, or McNemar (1949) of Stanford University. Guilford had
published a text in 1942, but it was the second edition of 1950 and the third of
1956 that became very popular in undergraduate teaching. In Canada a
somewhat later popular text was Ferguson's (1959) Statistical Analysis in Psychology
and Education, and in the United Kingdom, Yule's An Introduction to the
Theory of Statistics, first published in 1911, had a 14th edition published (with
Kendall) in 1950 that includes ANOVA.
EXPECTED MEAN SQUARES
In his well-known book, Scheffé (1959) says:
The origin of the random effects models, like that of the fixed effects models, lies in
astronomical problems; statisticians re-invented random-effects models long after
they were introduced by astronomers and then developed more complicated ones.
(p. 221)
Scheffé cites his own 1956(a) paper, which gives some historical background
to the notion of expected mean squares - E(MS). The estimation of variance
components using E(MS) was taken up by Daniels (1939) and later by Crump
(1946, 1951). Eisenhart's (1947) influential paper on the assumptions
underlying ANOVA models also discusses them in this context. Anderson and Bancroft
(1952) published one of the earliest texts that examines mathematical
expectation - the expected value of a random variable "over the long run" - and early
in this work they state the rules used to operate with expected values.
The concept is easily understood in the context of a lottery. Suppose you
buy a $1.00 ticket for a draw in which 500 tickets are to be sold. The first and
only prize is $400.00. Your "chances" of winning are 1 in 500 and of losing 499
in 500. If Y is the random variable,
When Y = -$1.00, the probability of Y, i.e., p(Y) = 499/500.
When Y = $399.00 (you won!), the probability p(Y) = 1/500.
The expectation of gain over the long run is:
E(Y) = ΣYp(Y) = (-1)(499/500) + (399)(1/500) = -0.998 + 0.798 = -0.20
If you joined in the draw repeatedly you would "in the long run" lose 20 cents
per draw.
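The lottery arithmetic can be checked with a few lines of Python. The sketch uses exact fractions so that no rounding intrudes; the variable names are illustrative only.

```python
from fractions import Fraction

# Net gain Y (in dollars) mapped to its probability p(Y):
# lose the $1.00 ticket price, or win $400.00 less the $1.00 paid.
outcomes = {
    Fraction(-1): Fraction(499, 500),
    Fraction(399): Fraction(1, 500),
}

# The probabilities of all possible outcomes must sum to 1.
total_probability = sum(outcomes.values())

# E(Y) is the sum of each value of Y weighted by its probability.
expected_gain = sum(y * p for y, p in outcomes.items())

print(float(expected_gain))  # -0.2
```

The two weighted terms are -499/500 and 399/500, that is, -0.998 and 0.798, which sum to exactly -1/5.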
In ANOVA we are concerned with the expected values of the various
components - the mean squares - over the long run.
Although largely ignored by the elementary texts, the slightly more
advanced approaches do treat their discussions and explanations of ANOVA's
model from the standpoint of E(MS). Cornfield and Tukey (1956) examine the
statistical basis of E(MS), but Gaito (1960) claims that "there has been no
systematic presentation of this approach in a psychological text or journal which
will reach most psychologists" (p. 3), and his paper aims to rectify this. The
approach, which Gaito rightly refers to as "illuminating," offers some
mathematical insight into the basis of ANOVA and frees researchers from the "recipe
book" and intuitive methods that are often used. Eisenhart makes the point that
these methods have been worthwhile and that readers of the books for the
"non-mathematical" researcher have achieved sound and quite complex
analyses in using them. But these early commentators saw the need to go further.
Gaito's short paper offers a clear statement of the general E(MS) model, and his
contention that this approach brings psychology a tool for tackling many
statistical problems has been borne out by the fact that now all the well-known
intermediate texts that deal with ANOVA models use it.
This is not the place to detail the algebra of E(MS) but here is the statement
of the general case for a two-factor experiment:
In the fixed effects model - often referred to as Model I - inferences can be
made only about the treatments that have been applied. In the random effects
model the researcher may make inferences not only about the treatments
actually applied but about the range of possible levels. The treatments applied
are a random sample of the range of possible treatments that could have been
applied. Then inferences may be made about the effects of all levels from the
sample of factor levels. This model is often referred to as Model II. In the case
where there are two factors A and B there are A potential levels of A but only
a levels are included in the experiment. When a = A then factor A is a fixed
factor. If the a levels included are a random sample of the potential A levels then
factor A is a random factor. Similarly for factor B there are B potential levels
of the factor and b levels are used in the experiment. When a = A then a/A = 1
and when B is a random effect, the levels sample size b is usually very, very
much smaller than B and b/B becomes vanishingly small, as does n/N where n is
sample size and N is population size.
Obtaining the variance components allows us to see which components are
included in the model, be the factor fixed or random, and to generate the
appropriate F ratios to test the effects.
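For fixed effects:
$$E(MS_A) = \sigma_e^2 + nb\sigma_\alpha^2, \quad E(MS_B) = \sigma_e^2 + na\sigma_\beta^2, \quad E(MS_{AB}) = \sigma_e^2 + n\sigma_{\alpha\beta}^2,$$
and the error term is $\sigma_e^2$.
For random effects:
$$E(MS_A) = \sigma_e^2 + n\sigma_{\alpha\beta}^2 + nb\sigma_\alpha^2, \quad E(MS_B) = \sigma_e^2 + n\sigma_{\alpha\beta}^2 + na\sigma_\beta^2, \quad E(MS_{AB}) = \sigma_e^2 + n\sigma_{\alpha\beta}^2,$$
and the error term is $\sigma_e^2$.
When A is fixed and B is random:
$$E(MS_A) = \sigma_e^2 + n\sigma_{\alpha\beta}^2 + nb\sigma_\alpha^2, \quad E(MS_B) = \sigma_e^2 + na\sigma_\beta^2, \quad E(MS_{AB}) = \sigma_e^2 + n\sigma_{\alpha\beta}^2,$$
and the error term is $\sigma_e^2$.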
Given that the structure of the F ratio requires that its numerator consists of the
components for a given source and its denominator the same components save
the one associated with the source, we find, for example, that to test for A, the
F ratio for fixed effects is:
$$F = \frac{\sigma_e^2 + nb\sigma_\alpha^2}{\sigma_e^2}$$
Whereas for random effects it is
$$F = \frac{\sigma_e^2 + n\sigma_{\alpha\beta}^2 + nb\sigma_\alpha^2}{\sigma_e^2 + n\sigma_{\alpha\beta}^2}$$
There is little doubt that development of this approach has enabled ANOVA to
be more closely appreciated by its practitioners.
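The rule for building an F ratio from expected mean squares can be sketched in a few lines of Python. This is an illustration only: the function names and the string labels for the variance components are hypothetical, and the sketch assumes the two-factor design with replication described above, in which the interaction component enters a main effect's E(MS) only when the other factor is random.

```python
from typing import Dict, FrozenSet, Optional


def expected_mean_squares(a_random: bool, b_random: bool) -> Dict[str, FrozenSet[str]]:
    """Return the variance components present in each E(MS) of a two-factor design."""
    ems = {
        "A": {"error", "A"},
        "B": {"error", "B"},
        "AB": {"error", "AB"},
        "error": {"error"},
    }
    # The interaction component appears in a main effect's E(MS)
    # only when the *other* factor is random.
    if b_random:
        ems["A"].add("AB")
    if a_random:
        ems["B"].add("AB")
    return {source: frozenset(parts) for source, parts in ems.items()}


def f_denominator(source: str, ems: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Find the mean square whose components equal those of `source` minus its own."""
    wanted = ems[source] - {source}
    for name, parts in ems.items():
        if name != source and parts == wanted:
            return name
    return None  # no exact error term exists for this source


if __name__ == "__main__":
    designs = [
        ("fixed effects", False, False),
        ("random effects", True, True),
        ("A fixed, B random", False, True),
    ]
    for label, a_random, b_random in designs:
        ems = expected_mean_squares(a_random, b_random)
        print(label, "-> test A against MS", f_denominator("A", ems))
```

The search for a matching denominator mirrors the verbal rule: the numerator's components, less the one being tested, must be exactly the components of the chosen error term.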
Finally, it is only in recent years that there has been a full realization that
mathematically the ANOVA approach and the regression approach can be
brought together. Cronbach's delineation of the "two disciplines" (1957, and
see chapter 2), the neglect by the textbook writers of any meaningful account
of the general linear model, and the avoidance by the introductory text authors
of a discussion of expected mean squares, have all contributed to a
misunderstanding of the unifying basis of statistical methods.
15
The Statistical Hotpot
TIMES OF CHANGE
The later years of the 1920s were watershed years for statistics. Karl Pearson
was approaching the end of his career (he retired in 1933), and some of the older
statisticians were not able to cope with the new order. The divisions had been,
and were still being, drawn. Yule retired from full-time teaching at Cambridge
in 1930, and, writing to Kendall after K.P.'s death in 1936 said, "I feel as though
the Karlovingian era has come to an end, and the Piscatorial era which succeeds
it is one in which I can play no part" (quoted by Kendall, 1952, p. 157).
Yule's text had not by then tackled the problems of small samples. The t
tests were not discussed until the 11th edition of 1937, a revision that was, in
fact, undertaken by Maurice Kendall. The changes that were taking place in
statistical methodology led to positions being adopted that were often based
more on personality and style and loyalties than rational argument on the basic
logic and utility of the approaches. Yates, who succeeded Fisher at Rothamsted
in 1933, was to write, in 1951, in a commentary on Statistical Methods for
Research Workers:
Because of the importance that correlation analysis had assumed it was natural that
the analysis of variance should be approached via correlation, but to those not trained
in the school of correlational analysis (of which I am fortunate to be able to count
myself one) this undoubtedly makes this part of the book more difficult to
comprehend. (Yates, 1951, p. 24, emphasis added)
And Yates was a solid supporter of Fisher to the very end.
Fisher published his text in 1925, and in the same year his paper on the
applications of "Student's" distribution appeared. Egon Pearson and Jerzy
Neyman met that year. They were to introduce new features into statistics that
Fisher would vehemently oppose to the end of his life.
NEYMAN AND PEARSON
It is a curious fact that what most social scientists now take to be inferential
statistics is a mixture of procedures that, as presented in many of the current
statistical cookbooks,[1] would be criticized by its innovators, Ronald A. Fisher,
Jerzy Neyman, and Egon Pearson. Fisher established the present-day paramount
importance of the rejection or acceptance of the null hypothesis in the
determination of a decision on a statistical outcome - the hypothesis that the outcome
is due to chance. The new participants - they could not be described as partners
- in the union that became statistics argued for an appreciation of the probability
of an alternative hypothesis.
In 1925 Jerzy Neyman, on a Polish government fellowship and with a new
PhD from Warsaw, arrived at University College, London, to study statistics
with Karl Pearson. Neyman was born in Russia in 1894 and read mathematics
at the University of Kharkov. In 1921, he moved to the new Republic of Poland,
where he became an assistant at the University of Warsaw and where he worked
for the State Meteorological Institute. His initial months in London were
difficult, partly because of his struggles with English and partly because of
misunderstandings with "K.P.," but gradually he struck up a friendship that led
to a research collaboration with Egon Pearson. Neyman's professional progress
and his social and academic relationships have been recounted in a sympathetic
and revealing account by Constance Reid (1982).
The younger Pearson has been described by many as a somewhat diffident
and introverted man who was very much in the shadow of his father. Reid tells
us of his feelings:
Pearson had decided that if he was going to be a statistician he was going to have to
break with his father's ideas and construct his own statistical philosophy. In
retrospect, he describes what he wanted to do as "bridging the gap" between "Mark
I" statistics - a shorthand expression he uses for the statistics of K.P., which was
based on large samples obtained from natural populations - and the statistics of
Student and Fisher, which had treated small samples obtained in controlled
experiments - "Mark II statistics." (Reid, 1982, p. 60)
[1] The phrase statistical cookbook has a pejorative ring, which is not wholly justified.
There are many excellent basic texts available, and there are many excellent cookery books.
Both are necessary to our well-being. The point is that conflicting recipes lead to statistical
and gastronomic confusion, and the fact that such conflicts exist has, by and large, been
ignored by consumers in the social sciences.
Pearson (1966) himself describes "the first steps." In 1924 papers by E. C.
Rhodes and by Karl Pearson (1924b) had explored the problem of choosing
between alternative tests for the significance of differences, tests that had the
same logical validity but that gave different levels of significance for the
outcome:
I set about exploring the multi-dimensional sample space and comparing what came
to be termed the rejection regions associated with alternative tests. Could one find
some general principle or principles appealing to intuition, which would guide one
in choosing between tests? (E. S. Pearson, 1966, p. 6)
Pearson had pondered the question: what, exactly, was the interpretation of
"Student's" test?
In large samples ... the ratio $t = (\bar{x} - \mu)\sqrt{n}/s$ could be regarded as the best estimate
available of the desired ratio $(\bar{x} - \mu)\sqrt{n}/\sigma$ and, as such, referred to the normal
probability scale. If the sample was . . . small ... a sample with a less divergent
mean $\bar{x}_1$, might well provide a larger value of t than a second sample with a more
divergent mean, $\bar{x}_2$, simply because $s_1$ in the first sample happened through sampling
fluctuations to be smaller than $s_2$ in the second. To someone brought up with the
older point of view this seemed at first sight paradoxical. . .
I realize that a reorientation of outlook must for me at any rate have been
necessary. It was a shift which I think K. P. was not able or never saw the need to
make. (E. S. Pearson, 1966, p. 6)
In 1926 Pearson put the problem to "Student," and the latter's reply shows,
once more, the fertility of his ideas. Just as they had aided Fisher's inspirations,
his comments now set the younger Pearson and later Neyman on the path that
led to the Neyman-Pearson Theory. After noting what, of course, was widely
accepted, that with large samples one is able to find the chance that a given value
for the mean of the sample lies at any given distance from the mean of the
population, and that, even if the chance is very small, there is no proof that the
sample has not been randomly drawn, he says:
What it does is to show that if there is any alternative hypothesis which will explain
the occurrence of the sample with a more reasonable probability, say 0.05 (such as
that it belongs to a different population or that the sample wasn't random or whatever
will do the trick) you will be very much more inclined to consider that the original
hypothesis is not true. (quoted by Pearson, 1966, p. 7, emphasis added)
Pearson recalls that during the autumn of 1926, the problems of the
specification of the class of alternative hypotheses and their definition, the rejection
region in the sample space, and the two sources of error were discussed. At the
end of the year, Pearson was examining the likelihood ratio criterion as a way
of approaching the question as to whether the alternative, or what Fisher later
called the null, hypothesis was the more likely. Pearson always recognized that
he was a weak mathematician and that he needed help with the mathematical
formulation of the new ideas. He recalled to Reid (1982) that he approached
Neyman, the new post-doctoral student at Gower Street, perhaps because he
was "so 'fresh' to statistics" (p. 62) and because other possible collaborators
would all have preferences for either Mark I or Mark II statistics. Neyman was
not a member of either of the camps. He really knew very little of the statistical
work of the elder Pearson, nor that of "Student" or Fisher. But an immediate
and close collaboration was not possible. Although Egon Pearson and Neyman
had a good deal of social contact over the summer of 1926 and Pearson
remembered that they had touched on the new questions, Neyman was
disenchanted with the Biometric Laboratory. It was not the frontier of mathematical
statistics that he had expected, and he determined to go to Paris (where his wife
was an art student) and press on with his work in probability and mathematics.
At the end of 1926, correspondence began between Neyman and Pearson,
and Pearson visited his colleague in Paris in the spring of 1927. Reid (1982)
reports that she gave copies of Neyman's letters to Pearson to Erich Lehmann,
an early student of Neyman's at Berkeley and an authority on the
Neyman-Pearson theory (Lehmann, 1959), for his comment. Lehmann's conclusion was
that, at any rate until early in 1927, Neyman "obviously didn't understand what
Pearson was talking about" (p. 73).
The first joint paper, in two parts, was published in Biometrika in 1928.
Neyman had returned to Poland for the academic year 1927-1928, teaching both
at Warsaw and at Krakow, and he clearly felt that he had not played as big a
role in the first paper as had Pearson. The paper (Part I) ends with Neyman's
disclaimer:
N.B. I feel it necessary to make a brief comment on the authorship of this paper. Its
origin was a matter of close co-operation, both personal and by letter, and the ground
covered included the general ideas and the illustration of these by sampling from a
normal population. A part of the results reached in common are included in Chapters
I, II and IV. Later I was much occupied with other work, and therefore unable to
co-operate. The experimental work, the calculation of tables and the development
of the theory of Chapters III and IV are due entirely to Dr Egon S. Pearson. (p. 240)
It might be, too, that at this time Neyman wanted to distance himself just a
little from the maximum likelihood criterion, which was the main theoretical
underpinning of the ideas developed in the paper. In the early correspondence
with Pearson, Neyman referred to it as "your principle" and was not convinced
that it was the only possible approach.
The 1928 (Part I) paper begins by stating the important problem of statistical
inference, that of determining whether or not a particular sample ($\Sigma$) has
been randomly drawn from a population ($\pi$). Hypothesis A is that indeed it has.
But $\Sigma$ may have been drawn from some other population $\pi'$, and thus, two sorts
of error may arise. The first is when A is rejected but $\Sigma$ was drawn from $\pi$, and
the second is when A is accepted but $\Sigma$ has been drawn from $\pi'$:
In the long run of statistical experience the frequency of the first source of error (or
in a single instance its probability) can be controlled by choosing as a discriminating
contour, one outside which the frequency of occurrence of samples from $\pi$ is very
small - say, 5 in 100 or 5 in 1000.
The second source of error is more difficult to control. ... It is not of course
possible to determine $\pi'$, ... but . . . we may determine a "probable" or "likely" form
of it, and hence fix the contours so that in moving "inwards" across them the
difference between $\pi$ and the population from which it is "most likely" that $\Sigma$ has
been sampled should become less and less. This choice also implies that on moving
"outwards" across the contours, other hypotheses as to the population sampled
become more and more likely than Hypothesis A. (Neyman & Pearson, 1928, p.
177)
Hypothesis A corresponds to $\Sigma$ having been drawn from $\pi$, and Hypothesis
A' to $\Sigma$ having been drawn from $\pi'$. A ratio of the probabilities of A and A' is
a measure for their comparison. But:
Probability is a ratio of frequencies and this relative measure cannot be termed the
ratio of the probabilities of the hypotheses, unless we speak of probability a posteriori
and postulate some a priori frequency distribution of sampled populations. Fisher
has therefore introduced the term likelihood, and calls this comparative measure the
ratio of the likelihoods of the two hypotheses. (Neyman & Pearson, 1928, p. 186)
The likelihood criterionis given by:
This ratio defines surfaces in a probability space such that it decreases from
1 to 0 as a specific point moves outward and alternatives to the statistical
hypothesis become more likely:
One had then to decide at which contour $H_0$ should be regarded as no longer tenable,
that is where should one choose to bound the rejection region? To help in reaching
this decision it appeared that the probability of falling into the region chosen, if $H_0$
were true, was one necessary piece of information. In taking this view it can of course
be argued that our outlook was conditioned by current statistical practice. (E. S.
Pearson, 1966, p. 10)
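The way a likelihood ratio falls from 1 toward 0 as the sample point moves away from what the hypothesis predicts can be illustrated with a simple special case. The sketch below is not the general criterion of the 1928 paper: it assumes a normal population with known standard deviation and a hypothesized mean, for which the ratio of the likelihood under the hypothesis to the maximum attainable likelihood reduces to a closed form. The function name and parameters are hypothetical.

```python
import math


def likelihood_ratio(sample_mean: float, n: int, mu0: float, sigma: float) -> float:
    """Likelihood ratio for H: mean = mu0, normal data with known sigma.

    The likelihood is maximized at the sample mean, so the ratio
    L(mu0) / L(sample_mean) simplifies to exp(-n (xbar - mu0)^2 / (2 sigma^2)).
    """
    return math.exp(-n * (sample_mean - mu0) ** 2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Sample means progressively further from the hypothesized mean of 0.
    for xbar in (0.0, 0.2, 0.5, 1.0):
        print(xbar, round(likelihood_ratio(xbar, n=10, mu0=0.0, sigma=1.0), 4))
```

Choosing a contour then amounts to choosing a value of the ratio below which the hypothesis is rejected; the probability of falling beyond that contour when the hypothesis is true is the quantity Pearson describes as a necessary piece of information.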
The paper does consider other approaches and notes that the authors do not
claim that the principle advanced is necessarily the best to adopt, but it is clear
that it is favored:
We have endeavoured to connect in a logical sequence several of the most simple
tests, and in so doing have found it essential to make use of what R. A. Fisher has
termed "the principle of likelihood."
The process of reasoning, however, is necessarily an individual matter, and we
do not claim that the method which has been most helpful to ourselves will be of
greatest assistance to others. It would seem to be a case where each individual must
reason out for himself his own philosophy. (Neyman & Pearson, 1928, p. 230)
There are faint echoes here of the subjective element in Neyman's initial
reasoning, commented on by Lehmann and reported by Reid (1982) - the notion
that belief in a prior hypothesis affects the consideration of the evidence:
There is a subjective element in the theory which expresses itself in the choice of
significance level you are going to require; but it is qualitative rather than
quantitative.[2] While it is not very satisfying and rather pragmatic, I think this reflects the
way our minds work better than the more extreme positions of either denying any
subjective element in statistics or insisting upon its complete quantification.
(Lehmann, quoted by Reid, 1982, p. 73)
Neyman, in Poland in 1929, wanted to present a joint paper at the
International Statistical Institute meeting, scheduled to be held there that year. The
paper that Neyman was preparing dealt with the Bayesian approach to the
problem that he and Pearson had considered and was to attempt to show that it
led to essentially the same solution. Egon Pearson just could not agree to a
collaboration that admitted, in the slightest, the notion of inverse probability:
Pearson ... pointed out to Neyman that if they published the proposed paper, with
its admission of inverse probability, they would find themselves in a disagreement
with Fisher. . . . Many years later he explained, "The conflict between K.P. and
R.A.F. left me with a curious emotional antagonism and also fear of the latter so that
it upset me a bit to see him or to hear him talk." (Reid, 1982, p. 84)
He would not put his name to the paper. The curious fact is that neither
Neyman nor Pearson ever wholeheartedly subscribed to the inverse probability
approach, but Neyman felt that it should be addressed. Pearson's attempt to
avoid a confrontation was doomed to failure, just as his father's earlier attempts
to deflect Fisher's views had been.
[2] Cowles and Davis (1982b) carried out a simple experiment that supports the suggestion that the
.05 level of significance is subjectively reasonable. Sandy Lovie pointed out to the author that the same
experiment, in a slightly different context, was performed by Bilodeau (1952).
The years that spanned the turn of the 1920s to the 1930s were not
particularly happy ones for the collaborators. Pearson was enduring the agonies of an
unhappy romance and finding his relationship with his father increasingly
frustrating. Neyman found his work greatly constrained by economic and
political difficulties in Poland, and the elder Pearson rejected a paper he
submitted to Biometrika. But, albeit in an intermittent fashion, the collaboration
continued as they worked toward what they described as their "big paper."
This was communicated to the Royal Society by Karl Pearson in August of
1932, read in November of that year, and published in Philosophical
Transactions in 1933. Neyman had written to Fisher, with whom he was then on
reasonably amicable terms, about the paper, and the latter had indicated that
were it to be sent to the Royal Society, he would likely be a referee:
To Neyman it has always been a source of satisfaction and amusement that his and
Egon's fundamental paper was presented to the Royal Society by Karl Pearson, who
was hostile and skeptical of its contents, and favourably refereed by the formidable
Fisher, who was later to be highly critical of much of the Neyman-Pearson theory.
(Reid, 1982, p. 103)
Reid reports that when she wrote to the librarian of the Royal Society to
discover the name of the second referee, she found that there had only been one
referee, and that he was A. C. Aitken of Edinburgh (a leading innovator in the
field of matrix algebra).
In fact, two papers were published in 1933. The Royal Society paper dealt
with procedures for the determination of the most efficient tests of statistical
hypotheses. There is no doubt that it is one of the most influential statistical
papers ever written. It transformed the way in which both the reasoning behind,
and the actual application of, statistical tests were perceived. Forty years later,
Le Cam and Lehmann enthusiastically assessed it:
The impact of this work has been enormous. It is, for example, hard to imagine
hypothesis testing without the concept of power. . . . However, the influence of the
work goes far beyond. ... By deriving tests as the solutions of clearly defined
optimum problems, Neyman and Pearson established a pattern for Wald's general
decision theory and for the whole field of mathematical statistics as it has developed
since then. (Le Cam & Lehmann, 1974, p. viii)
The later paper (Neyman & Pearson, 1933), presented to the Cambridge
Philosophical Society, again sets out the Neyman-Pearson rationale and
procedures for hypothesis testing. Its stated aim is to separate hypothesis testing from
problems in estimation and to examine the employment of tests independently
of a priori probability laws. The authors have rejected the Bayesian approach
and employed the frequentist's view of probability. A concept of central
importance is that of the power of a statistical test, and the term is introduced
here for the first time. A Type I error is that of rejecting a statistical hypothesis,
$H_0$, when it is true. The Type II error occurs when $H_0$ is not rejected but some
rival, alternative hypothesis is, in fact, true:
If now we have chosen a region w, in the sample space W, as critical region to test
$H_0$, then the probability that the sample point $\Sigma$ defined by the set of variates
[$x_1, x_2, \ldots, x_n$] falls into w, if $H_0$ is true, may be written as:
The chance of rejecting $H_0$ if it is true is therefore equal to $\varepsilon$, and w may be termed
of size $\varepsilon$ for $H_0$. The second type of error will be made when some alternative $H_i$
is true, and $\Sigma$ falls in $\bar{w} = W - w$. If we denote by $P_I(w)$, $P_{II}(w)$ and $P(w)$ the chance
of an error of the first kind, the chance of an error of the second kind and the total
chance of error using w as critical region, then it follows that:
(Neyman & Pearson, 1933, p. 495)
The rule then is to set the chance of the first kind of error (the size or what
we would now call the $\alpha$ level) at a small value and then choose a rejection class
so that the chance of the second kind of error is minimized. In fact, the procedure
attempts to maximize the power of the test for a given size. The probability of
rejecting the statistical hypothesis, $H_0$, when in fact the true hypothesis is $H_i$,
that is $P(w \mid H_i)$, is called the power of the critical region w with respect to $H_i$.
If we now consider the probability $P_{II}(w)$ of type II errors when using a test T based
on the critical region w, we may describe
as the resultant power of the test T. . .. It is seen that while the power of a test with
regard to a given alternative $H_i$ is independent of the probabilities a priori, and is
therefore known precisely as soon as $H_i$ and w are specified, this is not the case with
the resultant power, which is a function of the $\varphi_i$'s. (Neyman & Pearson, 1933b, p.
499)
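The relation between the size of a test and its power against a single specified alternative can be made concrete with a small calculation. The following is a minimal sketch under stated assumptions, not a procedure taken from the 1933 papers: it assumes a one-sided test of a normal mean with known standard deviation, and the function name and parameters are hypothetical. It uses only `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist


def power_one_sided_z(mu0: float, mu1: float, sigma: float, n: int,
                      alpha: float = 0.05) -> float:
    """Power of the one-sided z test of H0: mean = mu0 against mean = mu1 > mu0.

    The critical region is fixed first by the size alpha; the power is then
    the probability of landing in that region when mu1 is the true mean.
    """
    z = NormalDist()                      # standard normal distribution
    critical = z.inv_cdf(1.0 - alpha)     # boundary of the rejection region
    shift = (mu1 - mu0) * n ** 0.5 / sigma
    return 1.0 - z.cdf(critical - shift)


if __name__ == "__main__":
    for n in (10, 25, 50):
        power = power_one_sided_z(mu0=0.0, mu1=0.5, sigma=1.0, n=n)
        print(n, "power:", round(power, 3), "type II error:", round(1.0 - power, 3))
```

When the alternative coincides with the hypothesis tested, the power equals the size; for a fixed size it rises with the sample size, which is the sense in which the chance of the second kind of error is reduced once the first has been set.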
Note here that the $\varphi$'s are the probabilities a priori of the admissible
alternative hypotheses. Neyman and Pearson of course fully recognized that the
$\varphi$'s cannot often be expressed in numerical form and that the statistician has to
consider the sense, from a practical point of view, in which tests are independent
of probabilities a priori, noting:
This aspect of the error problem is very evident in a number of fields where tests
must be used in a routine manner, and errors of judgment lead to waste of energy or
financial loss. Such is the case in sampling inspection problems in mass-production
industry. (Neyman & Pearson, 1933, p. 493)
This apparently innocuous statement alludes to the features of the
Neyman-Pearson theory of hypothesis testing that, over the course of the next few years,
Fisher vigorously attacked, that is, (a) the notion that hypothesis testing can be
regarded as a decision process akin to methods used in quality control, and
reduced to a set of practical rules, and (b) the implication that "repeated sampling
from the same population" - which occurs in industry when similar samples are
drawn from the same production run - determines the level of significance.
These suggestions were anathema to Fisher. They were, however, the very
features that made statistics so welcome in psychology and the social sciences
- that is, the promise of a set of rules with apparently respectable mathematical
foundations that would allow decisions to be made on the meaning in noisy
quantitative data. Few ventured into an examination of the logic of the methods;
very few would wish to be trampled as the giants fought in the field. These
matters are examined later in this chapter.
Perhaps we should spare a thought for Sir Ronald Fisher, curmudgeon that
he was. He must indeed be constantly tossing in his grave as lecturers and
professors across the world, if they remember him at all, refer to the content of
most current curricula as Fisherian statistics.
STATISTICS AND INVECTIVE
The full force of Fisher's opposition to the Neyman-Pearson theory was not
immediately felt in 1933. The circumstances of his developing fury are,
however, evident. In a decision that could hardly have been less conducive to
harmony in the development of statistics, the University of London split Karl
Pearson's department when he retired in 1933:
Statistics was taught at no other British university, nor was there another professor
of eugenics, charged with the duty and the means of research into human heredity.
If Fisher were to teach at a university, it would have to be as Pearson's successor.
(Fisher Box, 1978, p. 257)
Fisher became Galton Professor in the Department of Eugenics, and Egon
Pearson, promoted to Reader, headed the Department of Applied Statistics.
There, in Gower Street, Fisher's department on the top floor and Egon Pearson's
on the floor below, the rows began to simmer. Fisher Box (1978) reports that
her father had ascertained, before accepting the appointment, that Egon Pearson
was well disposed to it. Fisher corresponded with him, apparently in the hope
that they might resolve conflicts in the teaching of statistics. Fisher Box (1978)
learned from correspondence with Egon Pearson that Fisher had even suggested
that, despite the decision to split the old department, he and Pearson should
reunite it. Pearson's response was to the effect that no lectures in the theory of
statistics should be given by Fisher, that the territories had been defined, and
that each should stay within them.
Early in 1934 Pearson invited Jerzy Neyman to join his department,
temporarily, as an assistant. Neyman accepted and, in the summer of 1934, received
an appointment as lecturer.
The moods at Gower Street must have been impossible, and the intensity of
the strain is difficult to imagine and reconstruct:
[Karl] Pearson was made an honorary member of the Tea Club, and when he joined
them in the Common Room, it was observed that Fisher did him the unique honour
of breaking out of conversation to step forward and greet him cordially. (Fisher Box,
1978, p. 260)
Or,
The Common Room was carefully shared. Pearson's group had tea at 4; and at 4:30,
when they were safely out of the way, Fisher and his group trooped in. Karl Pearson
had withdrawn across the college quadrangle with his young assistant Florence
David. He continued to edit Biometrika; but, as far as Miss David remembers, he
never again entered his old building. (Reid, 1982, p. 114)
It appears that, at first, Neyman and Fisher got on quite well and that Neyman
tried to bring Fisher and Egon Pearson together. His work on estimation, rather
than the joint work with Pearson on hypothesis testing, occupied his attention.
His 1934 paper (discussed earlier) was, in the main, well-received by Fisher,
and Fisher's December 1934 paper, presented to the Royal Statistical Society
(Fisher, 1935), was commented on favorably by Neyman.
But any harmony disappeared at meetings of the Industrial and Agricultural
Section of the Royal Statistical Society in 1935. Neyman presented his
"Statistical Problems in Agricultural Experimentation" in which he questioned the
efficiency of Fisher's handling of Randomized Block and Latin Square designs,
illustrating his talk with wooden models that had been prepared for him
at University College.[3] Fisher's response to the paper (for which he was
supposed to give the vote of thanks) began by expressing the view, to put it
bluntly (which he did), that he had hoped that Neyman would be speaking on
something that he knew something about. His closing comments disparaged
the Neyman-Pearson approach.
Frank Yates (1935) presented a paper on factorial designs to the Royal
Statistical Society later that year. Neyman expressed doubts about the
interpretation of interactions and main effects when the number of replications was
small. The problem has been examined more recently by Traxler (1976). The
details of these criticisms, and they had validity, did not concern Fisher:
[Neyman had] asserted that Fisher was wrong. This was an unforgivable offense -
Fisher was never wrong and indeed the suggestion that he might be was treated by
him as a deadly assault. Anyone who did not accept Fisher's writing as the God-given
truth was at best stupid and at worst evil. (Kempthorne, 1983, p. 483)
Oscar Kempthorne, quoted here, and undoubtedly an admirer of Fisher's
work, had what he has described as a "partial relationship" with Fisher when he
worked at Rothamsted from 1941 to 1946 and knew Fisher's fiery intransigence.
It is understandable that Fisher Box (1978) presents a view of these troubled
times that is more sympathetic to Fisher, couching her commentary in terms
that do reflect an important aspect of the situation. Statistics was becoming
more mathematical, and the shift of its intellectual power base to the United
States was to make it even more mathematical. Fisher Box (1978) comments,
"Fisher was a research scientist using mathematical skills, Neyman a
mathematician applying mathematical concepts to experimentation" (p. 265).
She quotes Neyman's reply to Fisher, in which he explains his concerns
about inconsistencies in the application of the z test. The test is deduced from
the sums of two independent squares but the restricted sampling of randomized
blocks and Latin squares leads to the mutual dependence of results:
Mathematicians tended to formulate the argument in these terms, that is in terms of
normal theory, ignoring randomization. Nevertheless, in doing so they exhibited a
fundamental misunderstanding of Fisher's work, for it happens to be false that the
derivation of the z distribution depends on the assumptions Neyman criticized.
(Fisher Box, 1978, p. 266)
Fisher Box defends this view, but it was not a view that convinced many
mathematicians and statisticians outside the Fisherian camp.
[3] Reid (1982) relates a story told by both Neyman and Pearson. One evening they returned to the
department after dinner and found the models strewn about the floor. They suspected that the angry act
was Fisher's.
In 1935, Egon Pearson was promoted to Professor and Neyman appointed
Reader in statistics at University College. Fisher opposed Neyman's
appointment. Neyman recalled to Reid (1982) that at about this time Fisher had
demanded that Neyman should lecture using only Statistical Methods for
Research Workers and that when Neyman refused said, "'Well, if so, then from
now on I shall oppose you in all my capacities.' And he enumerated - member
of the Royal Society and so forth. There were quite a few. Then he left. Banged
the door" (Reid, 1982, p. 126).
Fisher withdrew his support for Neyman's election to the International
Statistical Institute. It was this sort of intense personal animosity that led to
confusion, indeed despair, as onlookers attempted to grasp the developments in
statistics. The protagonists introduced ambivalence and contradiction as they
defended their positions. Debate and discussion, in any rational sense, never
took place. As if this was not enough, Neyman and Egon Pearson were
beginning to draw apart. Neyman certainly did not feel committed to University
College. Karl Pearson died in April 1936, and very shortly thereafter Egon
Pearson began work on a survey of his father's contribution (E. S. Pearson,
1938a). Neyman was working on the development of a theory of estimation
using confidence intervals. In a move that seems utterly astonishing, Pearson,
who had inherited the editorship of Biometrika, after a certain amount of
equivocation, rejected the resulting paper. Pearson thought that the paper was
too long and too mathematical. It subsequently appeared in the Philosophical
Transactions of the Royal Society (Neyman, 1937). Joint work between the two
friends had almost ceased. Neyman visited the United States in 1937 and made
a very good impression. His visit to Berkeley resulted in his being offered a
post there in 1938, an offer that he accepted.
Neyman's move to the University of California, the growth in the
development and applications of statistics at the State Agricultural College at Ames,
Iowa, and the important influence of the University of Michigan, where Annals
of Mathematical Statistics, edited by Harry C. Carver, had been founded in
1930, moved the vanguard of mathematical statistics to America, where it has
remained. The tribulations of actual and devastating warfare were on Britain's
horizon. There was no one left on the statistical battlefield who wanted to fight
with Fisher. He was disliked by many but his close collaborators, he was often
avoided, and he was misunderstood.
Fisher enjoyed a considerable reputation as a geneticist, carrying out
experiments and publishing work that had considerable impact. His study of natural
selection from a mathematical point of view led to a reconciliation of the
Mendelian and the biometric approaches to evolution. In 1943 he accepted the
post of Arthur Balfour Professor of Genetics at Cambridge University, an
appointment for which Egon Pearson must have given thanks. Not that Pearson
suffered too much from the lash of Fisher's tongue, for Fisher regarded him as
a lightweight and, whenever he could, coldly ignored him.
Fisher was always willing to promote the application of his work to experimentation
in a wide variety of fields. He was unable to accept any criticism of
his view of its mathematical foundations. Perhaps the most unfortunate episode
took place at the end of Karl Pearson's life. Taking as his reason an attack
Pearson (1936) had made, in a work published very shortly after his death, on
a paper written by an Indian statistician R. S. Koshal (1933), Fisher (1937)
undertook "to examine frankly the status of the Pearsonian methods" (p. 303).
Nearly 20 years later, in an author's note accompanying a republication of
"Professor Karl Pearson and the Method of Moments," Fisher really exceeds
the bounds of academic propriety:
If peevish intolerance of free opinion in others is a sign of senility, it is one which
he had developed at an early age. Unscrupulous manipulation of factual material is
also a striking feature of the whole corpus of Pearsonian writings, and in this matter
some blame does seem to attach to Pearson's contemporaries for not exposing his
arrogant pretensions. (Fisher, 1950, p. 29.302a in Bennett, 1971)
Of course Karl Pearson was guilty of the same sort of invective. Personal
criticism in the defense of their mathematical and statistical stances is a feature
of the style of both men. No academic writer expects to escape criticism -
indeed, lack of criticism generally indicates lack of interest - but this sort of
polemic can only damage the discipline. Ordinary, and even some extraordinary,
men and women run for cover. These conflicts may well be responsible
for the rather uncritical acceptance of the statistical tools that we use today, a
point that is discussed further.
FISHER versusNEYMAN AND PEARSON
In an author's note preceding a republication of a 1939 paper, Fisher reiterates
his opposition to the Neyman-Pearson approach, referring specifically to the
Cambridge paper:
The principles brought to light [in the following paper] seem to the author essential
to the theory of tests of significance in general, and to have been most unwarrantably
ignored in at least one pretentious work on "Testing statistical hypotheses." Practical
experimenters have not been seriously influenced by this work, but in mathematical
departments, at a time when these were beginning to appreciate the part they might
play as guides in the theoretical aspects of experimentation, its influence has been
somewhat retrograde. (Fisher, 1950, p. 35.173a in Bennett, 1971).
Fisher set out his objections to Neyman and Pearson's views in 1955. Statistical
Methods and Scientific Induction sets out to examine the differences in logic in
the two approaches. He acknowledges that Barnard had observed that:
Neyman, thinking that he was correcting and improving my own early work on tests
of significance, as a means to the "improvement of natural knowledge," in fact
reinterpreted them in terms of that technological and commercial apparatus which is
known as an acceptance procedure. (Fisher, 1955, p. 69)
Fisher acknowledges the importance of acceptance procedures in industrial
settings, noting that whenever he travels by air he gives thanks for their
reliability. He objects, however, to the translation of this model to the physical
and biological sciences. Whereas in a factory the population that is the product
has an objective reality, such is not the case for the population from which the
psychologist's or the biologist's sample has been drawn. In the latter case, he
argues there is:
a multiplicity of populations to each of which we can legitimately regard our sample
as belonging; so that the phrase "repeated sampling from the same population" does
not enable us to determine which population is to be used to define the probability
level, for no one of them has objective reality, all being products of the statistician's
imagination. (Fisher, 1955, p. 71)
Fisher maintains that significance testing in experimental science depends
only on the properties of the unique sample that has been observed and that this
sample should be compared only with other possibilities, that is to say, "to a
population of samples in all relevant respects like that observed, neither more
precise nor less precise, and which therefore we think it appropriate to select in
specifying the precision of the estimate" (Fisher, 1955, p. 72).
Fisher was never able to come to terms with the critical contribution of
Neyman and Pearson, namely, the notion of alternative hypotheses and errors
of the second kind. Once again his objections are rooted in what he considers to
be the adoption of the misleading quality control model. He says, "The phrase
'Errors of the second kind,' although apparently only a harmless piece of
technical jargon, is useful as indicating the type of mental confusion in which
it was coined" (Fisher, 1955, p. 73).
Fisher agrees that the frequency of wrongly rejecting the null hypothesis can
be controlled but disagrees that any specification of the rate of errors of the
second kind is possible. Nor is such a specification necessary or helpful. The
crux of his argument rests on his objection to using hypothesis testing as a
decision process:
The fashion of speaking of a null hypothesis as "accepted when false," whenever a
test of significance gives us no strong reason for rejecting it, and when in fact it is
in some way imperfect, shows real ignorance of the research worker's attitude, by
suggesting that in such a case he has come to an irreversible decision. (Fisher, 1955,
p. 73)
What really should happen when the null hypothesis is accepted? The
researcher concludes that the deviation from truth of the working hypothesis is
not sufficient to warrant modification. Or, says Fisher, perhaps, that the deviation
being in the expected direction, to an extent confirms the researcher's
suspicion, but the data available are not sufficient to demonstrate its reality. The
implication is clear. Experimental science is an ongoing process of evaluation
and re-evaluation of evidence. Every conclusion is a provisional conclusion:
Acceptance is irreversible, whether the evidence for it was strong or weak. It is the
result of applying mechanically rules laid down in advance; no thought is given to
the particular case, and the tester's state of mind, or his capacity for learning, is
inoperative. (Fisher, 1955, pp. 73-74)
Finally, Fisher launches an attack, from the same basic position, on Neyman's
use of the term inductive behavior to replace the phrase inductive
reasoning. It is clear that Neyman was looking for a statistical system that
would provide rules. The 1933 paper shows that Neyman and Pearson believed
that they were on the same track as Fisher:
In dealing with the problem of statistical estimation, R. A. Fisher has shown how,
under certain conditions, what may be described as rules of behaviour can be
employed which will lead to results independent of those probabilities [here Neyman
is referring to probabilities a priori]; in this connection he has discussed the important
conception of what he terms fiducial limits. (Neyman & Pearson, 1933, p. 492,
emphasis added)
It appears that the users of statistical methods have implicitly accepted the
notion that the Neyman-Pearson approach was a natural, almost an inevitable,
progression from the work of Fisher. It is, however, little wonder, even leaving
the personalities and polemics aside, that Fisher objected to the mechanization
of the scientific endeavor. It was just not the way he looked at science. The 1955
paper almost goes out of its way to attack Abraham Wald's (1950) book on
Statistical Decision Functions, objecting specifically to its characterization as a
book about experimental design. Neyman (1956) rebutted the criticisms of
Fisher and effectively restated his position, but it is apparent that the two are
really arguing at cross-purposes. Responding to Neyman's rebuttal, Fisher
(1957) begins, "If Professor Neyman were in the habit of learning from others
he might profit from the quotation he gives from Yates" (Fisher, 1957, p. 179).
Of interest here is Fisher's continuing determination to defend the concept
of fiducial probability at all costs. He raises the concept again and again in his
writings on inference and never admits that it presents difficulties.
The expressions "fiducial probability" and "fiducial argument" are Fisher's. Nobody
knows just what they mean, because Fisher repudiated his most explicit, but
definitely faulty, definition and ultimately replaced it with only a few examples.
(Savage, 1976, p. 466)
The Bayesian argument lends itself to decision analysis. Fisher objected to
both. The fiducial argument has been characterized as an elliptical attempt to
arrive at Bayesian-type posterior probabilities without invoking the Bayesian-type
priors. What muddied the water was Fisher's insistence that fiducial
probabilities are verifiable in the same way, for example, as the probabilities in
games of chance are verifiable. What makes the situation even more vexing is
that the perceived relationship between Neyman's confidence intervals and
Fisher's fiducial theory would make the latter easier to grasp. But Neyman's
theory of estimation by confidence sets grows out of the same conceptual roots
as Neyman-Pearson hypothesis testing. No rational compromise was possible.
The null hypothesis (it has been noted that the term is Fisher's and that it
was not used by Neyman and Pearson) is, in Fisherian terms, the hypothesis that
is tested and it implies a particular value (most often zero) for the population
parameter. When a statistical analysis produces a significant outcome, this is
taken to be evidence against the null hypothesis, evidence against the stated
value for the parameter, evidence against the assertion that nothing has happened,
evidence against a conclusion that the experimental manipulation has
had no effect. The point is that a significant outcome does not appear to be
evidence for anything. The Neyman-Pearson position is that hypothesis testing
demands a research hypothesis for which we can find support. Suppose that the
statistical hypothesis states that in the population the correlation is zero, and
indeed it is; then an obtained sample value of +0.8, given a reasonable n, would
lead to a Type I error. If, on the other hand, the statistical hypothesis states that
the population value is +0.8, but again it is really zero, then an obtained value
of 0.8 will lead to a Type II error. It will immediately be argued that no one
would set the population parameter under the null hypothesis as +0.8 unless
there were evidence that it was of this order. However, this implies that
experimenters take into account other evidence than that immediately to hand.
No matter how formal the rules, it is plain that in real life the rules do not
inevitably guide behavior and decisions, or conclusions about outcomes. As
was noted earlier, my holding a winning lottery ticket that was drawn from a
million tickets - an extremely improbable event - does not necessarily lead to
my rejecting the hypothesis of chance.
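The two cases in the correlation example above can be sketched numerically. The
following is only an illustration and not part of the original argument: it assumes
a sample size of n = 30, a two-sided .05 criterion, and the Fisher z transformation
as the test applied to the sample correlation, none of which is specified in the
text. The function name is hypothetical.

```python
import math

def fisher_z_test(r, rho0, n, alpha=0.05):
    """Two-sided test of H0: rho = rho0 using the Fisher z transformation.

    Returns (z, p, reject). The values n = 30 and alpha = .05 used below
    are illustrative assumptions, not values taken from the text.
    """
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3)
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    # Two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, p < alpha

# Case 1: H0 states rho = 0 and this is true, yet the sample gives r = +0.8.
z1, p1, reject1 = fisher_z_test(0.8, 0.0, 30)
print(f"H0: rho = 0.0 -> z = {z1:.2f}, reject = {reject1}")  # a true H0 is rejected: Type I

# Case 2: H0 states rho = +0.8, the truth is rho = 0, and the sample gives r = 0.8.
z2, p2, reject2 = fisher_z_test(0.8, 0.8, 30)
print(f"H0: rho = 0.8 -> z = {z2:.2f}, reject = {reject2}")  # a false H0 is retained: Type II
```

The same observed value of 0.8 produces either kind of error, depending only on
which hypothesis was stated in advance.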
There are, then, difficult problems in the assessment of the rules of statistical
procedures. It is small wonder that they are not widely appreciated because the
founding fathers themselves were not clear on the issues or, at any rate, in public
and in their writings they were not always clear on them. Some few
examples will suffice to illustrate the point.
In 1935 Karl Pearson asserted that "tests are used to ascertain whether a
reasonable graduation curve has been achieved, not to assert whether one or
another hypothesis is true or false" (K. Pearson, 1935, p. 296). All he is doing
here is arguing with Fisher. He himself used his own χ² test for hypothesis
testing (e.g., K. Pearson, 1909).
In his famous discussion of the "tea-tasting" investigation, Fisher discusses
the "sensitiveness" of an experiment. He notes that by increasing the size of the
experiment:
we can render it more sensitive, meaning by this that it will allow of the detection of
a lower degree of sensory discrimination, or, in other words, of a quantitatively
smaller departure from the null hypothesis. Since in every case the experiment is
capable of disproving, but never of proving this hypothesis, we may say that the
value of the experiment is increased whenever it permits the null hypothesis to be
more readily disproved.
The same result could be achieved by repeating the experiment, as originally
designed, upon a number of different occasions. (Fisher, 1935/1966, 8th ed., p. 22)
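The "sensitiveness" described in this passage can be illustrated with a small
calculation. This is a sketch under stated assumptions, not Fisher's own
presentation: it assumes a design in which half of the cups are prepared each way
and the subject knows this, so that under the null hypothesis every selection of
cups is equally likely, and it compares an 8-cup with a 12-cup experiment. The
function name is hypothetical.

```python
from math import comb

def p_at_least(correct, cups_per_kind):
    """Chance probability of picking at least `correct` of the milk-first
    cups when `cups_per_kind` cups of each kind are presented and the
    subject selects exactly `cups_per_kind` cups as milk-first."""
    m = cups_per_kind
    total = comb(2 * m, m)  # equally likely selections under the null hypothesis
    ways = sum(comb(m, k) * comb(m, m - k) for k in range(correct, m + 1))
    return ways / total

# 8 cups (4 + 4): only a perfect score falls below the .05 level
print(p_at_least(4, 4))  # 1/70
print(p_at_least(3, 4))  # 17/70

# 12 cups (6 + 6): one misplaced pair still falls below the .05 level
print(p_at_least(6, 6))  # 1/924
print(p_at_least(5, 6))  # 37/924
```

With the larger experiment a less than perfect discrimination can still be
detected, which is the sense in which the experiment is more sensitive.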
Here Fisher appears to be alluding to what Neyman and Pearson formalized
as the power of the test - a notion that he never would accept in their context.
He is also using the words "proved" and "disproved" rather loosely, and,
although he might be forgiven, the slip flies in the face of the routine statements
about the essentially probabilistic nature of statistical outcomes. Elsewhere
Fisher presents the view that each experiment is, as it were, self-contained in
the context of significance testing:
On the whole the ideas (a) that a test of significance must be regarded as one of a
series of similar tests applied to a succession of similar bodies of data, and (b) that
the purpose of the test is to discriminate or "decide" between two or more hypotheses,
have greatly obscured their understanding, when taken not as contingent possibilities but
as elements essential to their logic. (Fisher, 1956/1973, 3rd ed., pp. 45-46)
To practicing researchers, the notion that tests are not used to "decide" is
incomprehensible. What else are they for? Fisher's supporters would say that
what he means is that they are not used to decide finally and irreversibly, and
scientists would surely agree.
In his 1955 paper, where he objects to the Neyman-Pearson specification of
the two types of error and gives his views (mentioned earlier) of what the
researcher will conclude when a result fails to reach statistical significance,
Fisher says:
These examples show how badly the word "error" is used in describing such a
situation. Moreover, it is a fallacy, so well known as to be a standard example, to
conclude from a test of significance that the null hypothesis is thereby established;
at most it may be said to be confirmed or strengthened. (Fisher, 1955, p. 73,
emphasis added)
It might be argued that a careful reading of Fisher's voluminous works would
make his position clear, and that these quotations are selective. The point is
taken, but the point may also be made that it is hardly surprising that the later
interpretations of, and commentaries on, his influential contribution contain
difficulties and contradictions. Had the protagonists been more concerned with
rational debate than with heated argument, statistics would have had a quite
different history.
PRACTICAL STATISTICS
A striking feature of the general statistics texts produced for courses in the
psychological sciences is the anonymity of the prescriptions that are described.
A startlingly high number of incoming graduate students in the author's department
were unaware that the F ratio was named for a man called Fisher, but a
cursory glance through the indexes of introductory texts reveals why. Pearson's
name is sometimes mentioned as a convenient label for a correlation coefficient,
distinguishing it from the rank-order method of Spearman. Neyman and Pearson
Jr. are hardly ever acknowledged. "Power" is treated with caution. Controversial
issues are never discussed.
Gigerenzer (1987) presents an interesting discussion of the situation: "The
confusion in the statistical texts presented to the poor frustrated students was
caused in part by the attempt to sell this hybrid as the sine qua non of scientific
inference" (p. 20).
Gigerenzer's thesis addresses what he sees as experimental psychology's
fight against subjectivity. Probabilistic models and statistical methods provided
the discipline with a mechanical process that seemed to allow for objective
assessment independent of the experimenter. Parallel constructs of individual
differences as error and uncertainty as ignorance further promoted psychology's
view of itself as an objective science.
It is Gigerenzer's view that the illusion was more or less deliberately created
by the early textbooks and has been perpetuated. The general neglect of
alternative theories and methods of inference, anonymous presentation, the
silence on the controversies, and "institutionalization," all conspired to provide
psychology with its need for objectivity. Gigerenzer makes telling and
important points. But a more mundane explanation can be advanced.
Surely social scientists can be forgiven for not being mathematicians or
logicians, and those who took a peek at the literature of the 1920s and 1930s
must have been dismayed at the bitterness of the controversies. At the same
time, the methods were being popularized by such people as Snedecor - and
they seemed to be methods that worked. It is absolutely clear that psychologists
wanted to construct a research methodology that would be accepted by traditional
science. The controversies were ignored because experimentalists in the
social sciences believed that they were sifting the methodological wheat from
the polemical chaff. The principals in the arguments were ignored because of
an eagerness to get on with the practical job, and in this they were supported by
the master, Fisher. Moreover, the textbook writers, interpreting Fisher, can be
forgiven for their lack of acknowledgment of the founding fathers - they were,
after all, following the master who rarely gave credit to anybody!
What is, perhaps, more contentious is the effect that all this has had on the
discipline. If it is to be admitted that the logical foundations of psychology's
most widespread method of data assessment are shaky, what are we to make of
the "findings" of experimental psychology? Is the whole edifice of data and
theory to be compared with the buildings in the towns of the "Wild West" - a
gaudy false front, and little of substance behind? This is an unreasonable
conclusion, and it is not a conclusion that is borne out by even a cursory
examination of the successful predictions of behavior and the confident applications
of psychology, in areas stretching from market research to clinical
practice, that have a utility that is indisputable. The plain fact of the matter is
that psychology is using a set of tools that leaves much to be desired. Some
parts of the kit perhaps should be discarded; some of them, like blunt chisels,
will let us down and we might be injured. But they seem to have been doing a
job. Psychology is a success.
Now some would qualify that success. An agonizing reappraisal of the
discipline by Sigmund Koch followed his struggles with his editorship of
Psychology: A Study of a Science (1959-1963). Koch moved from his statement at the
beginning of that enterprise that psychology was a "disorderly matrix" to his
plea (1980), made 10 years after Miller's urging in his Presidential Address to the
American Psychological Association (1969) that psychology should be "given
away" (in the sense of pressing for more social relevance), that it
ought to be "taken back." Although the drift of the criticism is inescapable, to
throw in the towel is, as it were, both unenlightening and unproductive.
There is no doubt that the disparate nature of its subject matter, and the
sometimes conflicting pronouncements issuing from the research journals, have
led to a fragmentation of the discipline, even to denials that it is, in fact, a
coherent discipline. However, two themes are apparent and common to all
branches of psychology. One is the classical statistical method used by the vast
majority of psychological researchers and the other is its history, and, more
particularly, its historic questions, the mind-body dichotomy, the mechanisms
of perception and cognition, the nature-nurture issue, the individual and society,
the mysteries of maturation and change, and more. This book is an attempt to
promote an interest in both statistics and history.
References
Acree, M. C. (1978). Theories of statistical inference in psychological research: A
historico-critical study. Unpublished doctoral dissertation, Clark University, Worcester, MA.
Adams, W. J. (1974). The life and times of the central limit theorem. New York: Kaedmon.
Airy, G. B. (1861). On the algebraical and numerical theory of errors of observations and
the combination of observations (2nd ed.). London: Macmillan.
Anderson, R. L., & Bancroft, T. A. (1952). Statistical theory in research. New York:
McGraw-Hill.
Anonymous (1926). Review of Fisher's Statistical methods for research-workers. British
Medical Journal, 1, 578-579.
Arbuthnot, J. (1692). Of the laws of chance: Or a method of calculation of the hazards of
gaming. London: Printed by B. Motte and sold by Randall Taylor.
Arbuthnott, J. (1710). An argument for divine providence, taken from the constant regularity
observ'd in the births of both sexes. Philosophical Transactions of the Royal Society, 27,
186-190.
Barnard, G. A. (1958). Thomas Bayes - A biographical note. Biometrika, 45, 293-295.
Bartlett, M. S. (1965). R. A. Fisher and the last fifty years of statistical methodology. Journal
of the American Statistical Association, 60, 395-409.
Bartlett, M. S. (1966). Review of Logic of Statistical Inference, by Ian Hacking. Biometrika,
53, 631-633.
Baxter, B. (1940). The application of a factorial design to a psychological problem.
Psychological Review, 47, 494-500.
Baxter, B. (1942). A study of reaction time using factorial design. Journal of Experimental
Psychology, 31, 430-437.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances.
Philosophical Transactions of the Royal Society, 53, 370-418. [Reprinted with a
biographical note by G. A. Barnard in Biometrika, 1958, 45, 293-315.]
Beloff, H. (Ed.). (1980). A balance sheet on Burt. Supplement to the Bulletin of the British
Psychological Society, 33.
Beloff, J. (1993). Parapsychology: A concise history. London: Athlone Press.
Bennett, J. H. (1971). Collected papers of R. A. Fisher. Adelaide: University of Adelaide.
Berkson, J. (1938). Some difficulties of interpretation encountered in the application of the
chi-square test. Journal of the American Statistical Association, 33, 526-536.
Bernard, C. (1927). An introduction to the study of experimental medicine (H. C. Greene,
Trans.). New York: Macmillan. (Original work published 1865)
Bernoulli, D. (1966). Part four of the Art of Conjecturing showing the use and application
of the preceding treatise in civil, moral and economic affairs (Bing Sung, Trans.).
Cambridge, MA: Harvard University, Department of Statistics, Tech. Report No. 2.
(Original work published 1713)
Bilodeau, E. A. (1952). Statistical versus intuitive confidence. American Journal of
Psychology, 65, 211-271.
Boring, E. G. (1920). The logic of the normal law of error in mental measurement. American
Journal of Psychology, 31, 1-33.
Boring, E. G. (1950). A history of experimental psychology. New York:
Appleton-Century-Crofts.
Boring, E. G. (1957). When is human behavior predetermined? Scientific Monthly, 84,
189-196.
Bowley, A. L. (1902). Elements of statistics. London: P. S. King. (6th ed. 1937)
Bowley, A. L. (1906). Presidential address to the economic science and statistics section of
the British Association for the Advancement of Science, York. Journal of the Royal
Statistical Society, 69, 540-558.
Bowley, A. L. (1926). Measurement of the precision obtained in sampling. Bulletin of the
International Statistical Institute, 22, 1-62.
Bowley, A. L. (1928). F. Y. Edgeworth's contributions to mathematical statistics. London:
Royal Statistical Society.
Bowley, A. L. (1936). The application of sampling to economic and social problems. Journal
of the American Statistical Association, 31, 474-480.
Box, G. E. P. (1984). The importance of practice in the development of statistics.
Technometrics, 26, 1-8.
Bravais, A. (1846). Sur les probabilités des erreurs de situation d'un point [On the probability
of errors in the position of a point]. Mémoires de l'Académie Royale des Sciences de
l'Institut de France, 9, 255-332.
Brennan, J. F. (1994). History and systems of psychology (4th ed.). Englewood Cliffs, NJ:
Prentice-Hall.
Brown, W., & Stephenson, W. (1933). A test of the theory of two factors. British Journal of
Psychology, 23, 352-370.
Buck, P. (1977). Seventeenth-century political arithmetic: Civil strife and vital statistics. Isis,
68, 67-84.
Burke, C. J. (1953). A brief note on one-tailed tests. Psychological Bulletin, 50, 384-387.
Burt, C. (1909). Experimental tests of general intelligence. British Journal of Psychology, 3,
94-177.
Burt, C. (1939). The factorial analysis of human ability. III. Lines of possible reconcilement.
British Journal of Psychology, 30, 84-93.
Burt, C. (1940). The factors of the mind. London: University of London Press.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for
research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching (pp.
171-246). Chicago: Rand McNally.
Carrington, W. (1934). The quantitative study of trance personalities. 1. Proceedings of the
Society for Psychical Research, 42, 173-240.
Carrington, W. (1936). The quantitative study of trance personalities. 2. Proceedings of the
Society for Psychical Research, 43, 319-361.
Carrington, W. (1937). The quantitative study of trance personalities. 3. Proceedings of the
Society for Psychical Research, 44, 189-222.
Carroll, J. B. (1953). An analytical solution for approximating simple structure in factor
analysis. Psychometrika, 18, 23-38.
Cattell, R. B. (1965a). The scientific analysis of personality. Baltimore, MD: Penguin Books.
Cattell, R. B. (Ed.) (1965b). Handbook of multivariate experimental psychology. Chicago:
Rand McNally.
Chang, W.-C. (1973). A history of the chi-square goodness-of-fit test. Unpublished doctoral
dissertation, University of Toronto, Toronto, Ontario.
Chang, W.-C. (1976). Sampling theories and sampling practice. In D. B. Owen (Ed.), On the
history of statistics and probability (pp. 299-315). New York: Marcel Dekker.
Clark, R. W. (1971). Einstein, the life and times. New York: World.
Cochrane, W. G. (1980). Fisher and the analysis of variance. In S. E. Fienberg & D. V.
Hinckley (Eds.), R. A. Fisher: An Appreciation (pp. 17-34). New York: Springer-Verlag.
Cowan, R. S. (1972). Francis Galton's statistical ideas: The influence of eugenics. Isis, 63,
509-528.
Cornfield, J., & Tukey, J. W. (1956). Average values of mean squares in factorials. Annals
of Mathematical Statistics, 27, 907-949.
Cowan, R. S. (1977). Nature and nurture: the interplay of biology and politics in the work
of Francis Galton. In W. Coleman & C. Limoges (Eds.), Studies in the history of biology
(Vol. 1, pp. 138-208). Baltimore, MD: Johns Hopkins University Press.
Cowles, M., & Davis, C. (1982a). On the origins of the .05 level of statistical significance.
American Psychologist, 37, 553-558.
Cowles, M., & Davis, C. (1982b). Is the .05 level subjectively reasonable? Canadian
Journal of Behavioural Science, 14, 248-252.
Cowles, M., & Davis, C. (1987). The subject matter of psychology: volunteers. British
Journal of Social Psychology, 26, 97-102.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist,
12, 671-684.
Crum, L. S. (1931). On analytical interpretations of straw vote samples. Journal of the
American Statistical Association, 26, 243-261.
Crump, S. L. (1946). The estimation of variance components in analysis of variance.
Biometric Bulletin, 2, 7-11.
Crump, S. L. (1951). The present status of variance component analysis. Biometrics, 7, 1-16.
Crutchfield, R. S. (1938). Efficient factorial design and analysis of variance illustrated in
psychological experimentation. Journal of Psychology, 5, 339-346.
Daniels, H. E. (1939). The estimation of components of variance. Royal Society Journal,
Supplement 6, 186-197.
Darwin, C. (1958). The origin of species. New York: New American Library. (Original work
published 1859)
David, F. N. (1949). Probability theory for statistical methods. Cambridge, England:
Cambridge University Press.
David, F. N. (1962). Games, gods and gambling. London: Charles Griffin.
Davis, C., & Gaito, J. (1984). Multiple comparison procedures within experimental research.
Canadian Psychology, 25, 1-12.
Daw, R. H., & Pearson, E. S. (1979). Abraham De Moivre's 1733 derivation of the normal
curve: A bibliographical note. In M. Kendall & R. L. Plackett (Eds.), Studies in the
history of statistics and probability (Vol. II, pp. 63-66). New York: Macmillan.
(Reprinted from Biometrika, 59, 677-680, 1972)
Dawson, M. M. (1914). The development of insurance mathematics. In L. W. Zartman (Ed.),
Yale readings in insurance (W. H. Price, Revisor, pp. 95-119). New Haven, CT: Yale
University Press. (Original work published in 1901).
Dempster, A. P. (1964). On the difficulties inherent in Fisher's fiducial argument. Journal
of the American Statistical Association, 59, 56-66.
De Moivre, A. (1967). The doctrine of chances: or, A method of calculating the probabilities
of events in play (3rd ed.). New York: Chelsea. Includes a biographical note on De Moivre
by Helen M. Walker. (Original work published in 1756)
De Morgan, A. (1838). An essay on probabilities and on their application to life contingencies
and insurance offices. In D. Lardner (Ed.), Cabinet Cyclopaedia (pp. 1-306). London:
Longman, Orme, Brown, Green & Longman, & John Taylor.
Dodd, S. C. (1928). The theory of factors. Psychological Review, 35, I, 211-234; II, 261-279.
Duncan, D. B. (1951). A significance test for differences between ranked treatments in an
analysis of variance. Virginia Journal of Science, 2, 171-189.
Duncan, D. B. (1955). Multiple range and multiple F tests. Biometrics, 11, 1-42.
Eden, T., & Fisher, R. A. (1927). Studies in crop variation. IV. The experimental
determination of the value of top dressings with cereals. Journal of Agricultural Science,
17, 548-562.
Edgeworth, F. Y. (1887). Observations and statistics: An essay on the theory of errors of
observation and the first principles of statistics. Transactions of the Cambridge
Philosophical Society, 14, 138-169.
Edgeworth, F. Y. (1892). Correlated averages. Philosophical Magazine, 34, 190-204.
Edgeworth, F. Y. (1893). Note on the calculation of correlation between organs.
Philosophical Magazine, 36, 350-351.
Edwards, A. E. (1950). Experimental design in psychological research. New York: Holt,
Rinehart and Winston.
Eisenhart, C. (1947). The assumptions underlying the analysis of variance. Biometrics, 3,
1-21.
Eisenhart, C. (1970). On the transition from "Student's" z to "Student's" t. American
Statistician, 33, 6-10.
Ellis, B. (1968). Basic concepts of measurement. Cambridge, England: Cambridge
University Press.
Eysenck, H. J. (1947). The dimensions of personality. London: Routledge & Kegan Paul.
Eysenck, H. J., & Eysenck, M. W. (1985). Personality and individual differences: A natural
science approach. New York: Plenum Press.
Fechner, G. (1966). Elements of psychophysics (H. Adler, Trans.). New York: Holt, Rinehart
and Winston. (Original work published 1860)
Feigl, H. (1959). Philosophical embarrassments of psychology. American Psychologist, 14,
115-128.
Ferguson, G. A. (1959). Statistical analysis in psychology and education. New York:
McGraw-Hill.
Finn, R. W. (1973). Domesday book: A guide. Chichester, England: Phillimore.
Fisher, R. A. (1915). Frequency distribution of values of the correlation coefficient in samples
from an indefinitely large population. Biometrika, 10, 507-521.
Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian
inheritance. Transactions of the Royal Society of Edinburgh, 52, 399-433.
Fisher, R. A. (1919). The genesis of twins. Genetics, 4, 489-499.
Fisher, R. A. (1921a). On the "probable error" of a coefficient of correlation deduced from
a small sample. Metron, 1, 3-32.
Fisher, R. A. (1921b). Studies in crop variation. I. An examination of the yield of dressed
grain from Broadbalk. Journal of Agricultural Science, 11, 107-135.
Fisher, R. A. (1922a). On the interpretation of χ² from contingency tables and the calculation
of P. Journal of the Royal Statistical Society, 85, 87-94.
Fisher, R. A. (1922b). The goodness of fit of regression formulae and the distribution of
regression coefficients. Journal of the Royal Statistical Society, 85, 597-612.
Fisher, R. A. (1923). Statistical tests of agreement between observation and hypothesis.
Economica, 3, 139-147.
Fisher, R. A. (1924a). The conditions under which χ² measures the discrepancy between
observation and hypothesis. Journal of the Royal Statistical Society, 87, 442-450.
Fisher, R. A. (1924b). On a distribution yielding the error functions of several well known
statistics. Proceedings of the International Congress of Mathematics, Toronto, 2,
805-813.
Fisher, R. A. (1925a). Applications of "Student's" distribution. Metron, 5, 90-104.
Fisher, R. A. (1925b). Theory of statistical estimation. Proceedings of the Cambridge
Philosophical Society, 22, 700-725.
Fisher, R. A. (1926a). Bayes' theorem and the fourfold table. Eugenics Review, 18, 32-33.
Fisher, R. A. (1926b). The arrangement of field experiments. Journal of the Ministry of
Agriculture of Great Britain, 33, 505-513.
Fisher, R. A. (1930). Inverse probability. Proceedings of the Cambridge Philosophical
Society, 26, 528-535.
Fisher, R. A. (1934). Discussion of Dr Wishart's paper. Journal of the Royal Statistical
Society, Supplement 1, 51-53.
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical
Society, 98, 39-54.
Fisher, R. A. (1937). Professor Karl Pearson and the method of moments. Annals of
Eugenics, 7, 303-318.
Fisher, R. A. (1952). Statistical methods in genetics. Heredity, 6, 1-12.
Fisher, R. A. (1955). Statistical methods and scientific induction. Journal of the Royal
Statistical Society, 17, 69-78.
Fisher, R. A. (1956). Statistical methods and scientific inference. New York: Hafner Press
(3rd ed., 1973).
Fisher, R. A. (1957). Comment on the notes by Neyman, Bartlett, and Welch in this Journal
(Vol. 18, No. 2, 1956). Journal of the Royal Statistical Society, B, 19, 179.
Fisher, R. A. (1966). The design of experiments (8th ed.). Edinburgh: Oliver and Boyd.
(Original work published 1935)
Fisher, R. A. (1970). Statistical methods for research workers (14th ed.). Edinburgh:
Oliver and Boyd. (Original work published 1925)
Fisher, R. A. (1971). "Author's Note" preceding, Professor Karl Pearson and the method of
moments. In J. H. Bennett (Ed.), Collected papers of R. A. Fisher (Vol. 4, p. 302).
Adelaide: University of Adelaide. (Reprinted from Annals of Eugenics, 1939, 7,
303-318)
Fisher, R. A. (1971). "Author's Note" preceding, The comparison of samples with possibly
unequal variances. In J. H. Bennett (Ed.), Collected papers of R. A. Fisher (Vol. 4, pp.
190-198). Adelaide: University of Adelaide. (Reprinted from Annals of Eugenics, 1939,
9, 174-180)
Fisher, R. A., & MacKenzie, W. A. (1923). Studies in crop variation. II. The manurial
response of different potato varieties. Journal of Agricultural Science, 13, 311-320.
Fisher Box, J. (1978). R. A. Fisher, the life of a scientist. New York: Wiley.
Fisher Box, J. (1981). Gosset, Fisher, and the t distribution. American Statistician, 35, 61-66.
Fletcher, R. (1991). Science, ideology and the media: The Cyril Burt scandal. New Brunswick,
NJ: Transaction Books.
Forrest, D. W. (1974). Francis Galton: The life and work of a Victorian genius. London:
Elek.
Fox, J. (1984). Linear statistical models and related methods. New York: Wiley.
Funkhouser, H. G. (1936). A note on a 10th century graph. Osiris, 1, 260-262.
Funkhouser, H. G. (1937). Historical development of the graphical representation of data.
Osiris, 3, 269-404.
Gaito, J. (1960). Expected mean squares in analysis of variance techniques. Psychological
Reports, 7, 3-10.
Gaito, J. (1980). Measurement scales and statistics: Resurgence of an old misconception.
Psychological Bulletin, 87, 564-567.
Galbraith, V. H. (1961). The making of Domesday book. Oxford: University Press.
Galton, F. (1877). Typical laws of heredity. Proceedings of the Royal Institution of Great
Britain, 8, 282-301. [Also in Nature, 15, 492-495].
Galton, F. (1884). On the anthropometric laboratory at the late International Health
Exhibition. Journal of the Anthropological Institute of Great Britain and Ireland, 14, 205-221.
Galton, F. (1885a). Regression towards mediocrity in hereditary stature. Journal of the
Anthropological Institute of Great Britain and Ireland, 15, 246-263.
Galton, F. (1885b). Opening address to the Anthropological Section of the British
Association by the President of the Section. Nature, 32, 507-510.
Galton, F. (1886). Family likeness in stature. Proceedings of the Royal Society of London,
40, 42-73 (includes an appendix by J. D. Hamilton Dickson).
Galton, F. (1888a). Co-relations and their measurement, chiefly from anthropometric data.
Proceedings of the Royal Society of London, 45, 135-145.
Galton, F. (1888b). Personal identification and description. Proceedings of the Royal
Institution, 12, 346-360.
Galton, F. (1889). Natural inheritance. London: Macmillan.
Galton, F. (1907). Inquiries into human faculty and its development (3rd ed.). London: J. M.
Dent. (Original work published 1883)
Galton, F. (1908). Memories of my life. London: Methuen.
Galton, F. (1962). Hereditary genius (second edition reprinted). London: Collins and New
York: World. (Original work published 1869; 2nd ed. 1892)
Galton, F. (1970). English men of science. London: Frank Cass. (Original work published
1874)
Garnett, J. C. M. (1920). The single general factor in dissimilar mental measurements.
British Journal of Psychology, 10, 242-258.
Garrett, H. E., & Zubin, J. (1943). The analysis of variance in psychological research.
Psychological Bulletin, 40, 233-267.
Gauss, C. F. (1963). Theoria motus corporum coelestium. (Theory of the Motion of Heavenly
Bodies). New York: Dover. (Original work published 1809)
Gigerenzer, G. (1987). Probabilistic thinking and the fight against subjectivity. In L. Krüger,
G. Gigerenzer & M. Morgan (Eds.), The probabilistic revolution: Ideas in the sciences
(Vol. 2, pp. 7-33). Cambridge, MA: MIT Press.
Gini, C. (1928). Une application de la méthode représentative aux matériaux du dernier
recensement de la population italienne (1er décembre 1921) [An application of the
representative method to material from the last census of the Italian population (1
December 1921)]. Bulletin of the International Statistical Institute, 23, 198-215.
Ginzburg, B. (1936). The scientific value of the Copernican induction. Osiris, 1, 303-313.
Glaisher, J. W. L. (1872). On the law of facility of errors of observation and the method of
least squares. Royal Astronomical Society Memoirs, 39, 75-124.
Gould, S. J. (1981). The mismeasure of man. Markham: Penguin Books Canada.
Grant, D. A. (1944). On 'The analysis of variance in psychological research.' Psychological
Bulletin, 41, 158-166.
Graunt, J. (1975). Natural and political observations mentioned in a following index and
made upon the bills of mortality. New York: Arno Press. (Original work published 1662)
Grunbaum, A. (1952). Causality and the science of human behavior. American Scientist, 40,
665-676.
Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.
Guilford, J. P. (1950). Fundamental statistics in psychology and education. New York:
McGraw-Hill.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Guthrie, S. C. (1946). A history of medicine. Philadelphia: J. B. Lippincott.
Hacking, I. (1965). The logic of statistical inference. Cambridge, England: Cambridge
University Press.
Hacking, I. (1971). Jacques Bernoulli's Art of Conjecturing. British Journal for the
Philosophy of Science, 22, 209-229.
Hacking, I. (1975). The emergence of probability. Cambridge, England: Cambridge University
Press.
Haggard, E. A. (1958). Intraclass correlation and the analysis of variance. New York: Dryden
Press.
Halley, E. (1693). An estimate of the mortality of mankind, drawn from curious tables of the
births and funerals at the city of Breslaw; with an attempt to ascertain the price of
annuities on lives. Philosophical Transactions of the Royal Society, 17, 596-610.
Harman, H. H. (1976). Modern factor analysis. Chicago: University of Chicago Press.
Harris, J. A. (1913). On the calculation of intra-class and inter-class coefficients of correlation
from class moments when the number of possible combinations is large. Biometrika, 12,
22-27.
Hartley, D. (1966). Observations on man, his frame, his duty and his expectations.
Gainesville, FL: Scholars' Facsimiles and Reprints. (Original work published 1749)
Hays, W. L. (1963). Statistics. New York: Holt, Rinehart and Winston.
Hays, W. L. (1973). Statistics for the social sciences. New York: Holt, Rinehart and Winston.
Hearnshaw, L. S. (1964). A short history of British psychology. New York: Barnes & Noble.
Hearnshaw, L. S. (1979). Cyril Burt, psychologist. London: Hodder and Stoughton.
Hearnshaw, L. S. (1987). The shaping of modern psychology. London: Routledge and Kegan
Paul.
Heisenberg, W. (1927). The physical principles of the quantum theory. Chicago: University
of Chicago Press.
Henkel, R. E., & Morrison, D. E. (1970). The significance test controversy. London:
Butterworths.
Heron, D. (1911). The danger of certain formulae suggested as substitutes for the correlation
coefficient. Biometrika, 8, 109-122.
Hick, W. E. (1952). A note on one-tailed and two-tailed tests. Psychological Review, 59,
316-318.
Hilbert, D. (1902). Mathematical problems. Bulletin of the American Mathematical Society,
8, 437-445; 478-79.
Hilgard, E. (1951). Methods and procedures in the study of learning. In S. S. Stevens (Ed.),
Handbook of experimental psychology (pp. 517-567). New York: John Wiley.
Hogben, L. (1957). Statistical theory. London: Allen and Unwin.
Holschuh, N. (1980). Randomization and design: I. In S. E. Fienberg & D. V. Hinkley (Eds.),
R. A. Fisher: An appreciation (pp. 35-45). New York: Springer-Verlag.
Holzinger, K. J., & Harman, H. H. (1941). Factor analysis: a synthesis of factorial methods.
Chicago: University of Chicago Press.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components.
Journal of Educational Psychology, 24, 417-441, 498-520.
Hotelling, H. (1935). The most predictable criterion. Journal of Educational Psychology, 26,
139-142.
Hume, D. (1951). An enquiry concerning human understanding. In D. C. Yalden-Thomson
(Ed.). (1951). Hume, Theory of Knowledge (pp. 3-176). Edinburgh: Thomas Nelson.
(Original work published 1748)
Hunter, J. E. (1997). Needed: a ban on the significance test. Psychological Science, 8, 3-7.
Huxley, J. (1949). Half a century of genetics. (London) Sunday Times, 10th July.
Huygens, C. (1970). On reasoning in games of dice. In J. Bernoulli, The art of conjecture
(F. Maseres, Ed. and Trans.). New York: Redex Microprint. (Original work published
1657)
Jacobs, J. (1885). Review of Ebbinghaus's Ueber das Gedächtnis. Mind, 10, 454-459.
Jeffreys, H. (1939). Random and systematic arrangements. Biometrika, 31, 1-8.
Jensen, A. R. (1992). Scientific fraud or false accusations? The case of Cyril Burt. In D. J.
Miller & M. Hersen (Eds.), Research fraud in the behavioral and biomedical sciences
(pp. 97-124). New York: John Wiley & Sons.
Jevons, W. (1874). The principles of science. London: Macmillan.
Jones, L. V. (1952). Tests of hypotheses: One-sided vs. two-sided alternatives. Psychological
Bulletin, 49, 43-46.
Jones, L. V. (1954). A rejoinder on one-tailed tests. Psychological Bulletin, 51, 585-586.
Joynson, R. B. (1989). The Burt affair. London: Routledge.
Kelley, T. L. (1923). Statistical method. New York: Macmillan.
Kelley, T. L. (1928). Crossroads in the mind of man: a study of differentiable mental abilities.
Stanford, CA: Stanford University Press.
Kempthorne, O. (1976). The analysis of variance and factorial design. In D. B. Owen (Ed.),
On the history of statistics and probability (pp. 29-54). New York: Marcel Dekker.
Kempthorne, O. (1983). A review of R. A. Fisher: An Appreciation. Journal of the American
Statistical Association, 78, 482-490.
Kempthorne, O., & Folks, L. (1971). Probability, statistics, and data analysis. Ames:
Iowa State University Press.
Kendall, M. G. (1952). George Udny Yule, 1871-1951. Journal of the Royal Statistical
Society, 115A, 156-161.
Kendall, M. G. (1959). Hiawatha designs an experiment. American Statistician, 13, 23-24.
Kendall, M. G. (1961). Daniel Bernoulli on maximum likelihood. Biometrika, 48, 1-18. [This
paper is followed by a translation by C. G. Allen of D. Bernoulli's The most probable
choice between several discrepant observations and the formation therefrom of the most
likely induction 1777(?), together with a commentary by Leonard Euler (1707-1783).]
Kendall, M. G. (1963). Ronald Aylmer Fisher, 1890-1962. Biometrika, 50, 1-15.
Kendall, M. G. (1968). Studies in the history of probability and statistics, XIX. Francis Ysidro
Edgeworth (1845-1926). Biometrika, 55, 269-275.
Kendall, M. G. (1972). The history and future of statistics. In T. A. Bancroft (Ed.), Statistical
papers in honor of George W. Snedecor. Ames: Iowa State University Press.
Kendall, M. G., & Babington Smith, B. (1938). Randomness and random sampling numbers.
Journal of the Royal Statistical Society, 101, 147-166.
Kenna, J. C. (1973). Galton's solution to the correlation problem: A false memory? Bulletin
of the British Psychological Society, 26, 229-30.
Keuls, M. (1952). The use of studentized range in connection with an analysis of variance.
Euphytica, 1, 112-122.
Kevles, D. (1985). In the name of eugenics. New York: Alfred A. Knopf.
Kiaer, A. N. (1899). Sur les méthodes représentatives ou typologiques appliquées à la statistique
(On representative methods or typologies applied to statistics). Bulletin of the
International Statistical Institute, 11, 180-185.
Kiaer, A. N. (1903). Sur les méthodes représentatives ou typologiques (On representative
methods or typologies). Bulletin of the International Statistical Institute, 13, 66-78.
Kimmel, H. D. (1957). Three criteria for the use of one-tailed tests. Psychological Bulletin,
54, 351-353.
Kneale, W. (1949). Probability and induction. Oxford: Oxford University Press.
Koch, S. (Ed.). (1959-1963). Psychology: A study of a science (6 Vols.). New York:
McGraw-Hill.
Koch, S. (1980). Psychology and its human clientele: Beneficiaries or victims? In R. A.
Kasschau and F. S. Kessel (Eds.), Psychology and society: In search of symbiosis (pp.
30-60). New York: Holt, Rinehart and Winston.
Kolmogorov, A. N. (1956). Foundations of the theory of probability. (N. Morrison, Trans.).
New York: Chelsea Publishing Co. (Original work published in 1933)
Koren, J. (Ed.). (1918). The history of statistics. New York: Macmillan.
Koshal, R. S. (1933). Application of the method of maximum likelihood to the derivation of
efficient statistics for fitting frequency curves. Journal of the Royal Statistical Society,
96, 303-313.
Kruskal, W., & Mosteller, F. (1979). Representative sampling, IV: The history of the concept
in statistics, 1895-1939. International Statistical Review, 47, 169-195.
Lancaster, H. O. (1966). Forerunners of the Pearson χ². Australian Journal of Statistics, 8,
117-126.
Lancaster, H. O. (1969). The chi-squared distribution. New York: Wiley.
Laplace, P. S. de (1810). Mémoire sur les approximations des formules de très grands
nombres, et sur leur application aux probabilités (Memoir on approximations of formulae
which are functions of very large numbers, and their application to probabilities).
Mémoires de la classe des sciences mathématiques et physiques de l'Institut de France,
Année 1809, 353-415; suppl. 559-565.
Laplace, P. S. de (1812). Théorie analytique des probabilités. Paris: Courcier.
Laplace, P. S. de (1951). A philosophical essay on probabilities. (F. W. Truscott & F. L.
Emory, Trans.). New York: Dover. (Original work published 1820)
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.).
London: Butterworth.
Le Cam, L., & Lehmann, E. L. (1974). J. Neyman. On the occasion of his 80th birthday.
Annals of Statistics, 2, vii-xi.
Legendre, A. M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes
[New methods for determining the orbits of comets]. Paris: Courcier.
Lehmann, E. L. (1959). Testing statistical hypotheses. New York: Wiley.
Lindquist, E. F. (1940). Statistical analysis in educational research. Boston: Houghton
Mifflin.
Lord, F. M. (1953). On the statistical treatment of football numbers. American Psychologist,
5, 750-751.
Lovie, A. D. (1979). The analysis of variance in experimental psychology: 1934-1945.
British Journal of Mathematical and Statistical Psychology, 32, 151-178.
Lovie, A. D. (1983). Images of man in early factor analysis - psychological and philosophical
aspects. In S. M. Bem, H. Van Rappard & W. Van Hoorn (Eds.), Studies in the history
of psychology and the social sciences, Proceedings of the first European meeting of
CHEIRON, 1 (pp. 235-247). Leiden: Leiden University.
Lovie, P., & Lovie, A. D. (1993). Charles Spearman, Cyril Burt, and the origins of factor
analysis. Journal of the History of the Behavioral Sciences, 29, 308-321.
Lovie, P., & Lovie, A. D. (1995). The cold equations: Spearman and Wilson on factor
indeterminacy. British Journal of Mathematical and Statistical Psychology, 48,
237-253.
Lupton, S. (1898). Notes on observations. London: Macmillan.
Lush, J. L. (1972). Early statistics at Iowa State College. In T. A. Bancroft (Ed.), Statistical
papers in honor of George W. Snedecor (pp. 211-226). Ames: Iowa State University
Press.
Macdonell, W. R. (1901). On criminal anthropometry and the identification of criminals.
Biometrika, 1, 177-227.
MacKenzie, D. A. (1981). Statistics in Britain 1865-1930. Edinburgh: Edinburgh University
Press.
Magnello, M. E. (1998). Karl Pearson's mathematization of inheritance: From ancestral
heredity to Mendelian genetics (1895-1909). Annals of Science, 55, 35-94.
Magnello, M. E. (1999). The non-correlation of biometrics and eugenics: rival forms of
laboratory work in Karl Pearson's career at University College London. History of
Science, 37, Part 1, 79-106; Part 2, 123-150.
Marks, M. R. (1951). Two kinds of experiment distinguished in terms of statistical operations.
Psychological Review, 58, 179-184.
Marks, M. R. (1953). One- and two-tailed tests. Psychological Review, 60, 207-208.
Maunder, W. F. (1972). Sir Arthur Lyon Bowley. An inaugural lecture delivered in the
University of Exeter. In M. G. Kendall & R. L. Plackett (Eds.), Studies in the history of
statistics and probability (1977, Vol. II, pp. 459-480). New York: Macmillan.
Maxwell, A. E. (1977). Multivariate analysis in behavioural research. London: Chapman
and Hall.
McMullen, L. (1), & Pearson, E. S. (2). (1939). William Sealy Gosset, 1876-1937 (1).
"Student" as a man; (2). "Student" as a statistician. Biometrika, 30, 205-250.
McNemar, Q. (1940a). Sampling in psychological research. Psychological Bulletin, 37,
331-365.
McNemar, Q. (1940b). Review of E. F. Lindquist, Statistical analysis in educational
research. Psychological Bulletin, 37, 746-748.
McNemar, Q. (1949). Psychological statistics. New York: Wiley.
Mercer, W. B., & Hall, A. D. (1911). The experimental error of field trials. Journal of
Agricultural Science, 4, 107-132.
Merriman, M. (1877). A list of writings relating to the method of least squares, with historical
and critical notes. Transactions of the Connecticut Academy of Arts and Sciences, 4,
151-232.
Merriman, M. (1884). A textbook on the method of least squares (8th ed.). New York: Wiley.
Michéa, R. (1938). Les variations de la raison au XVIIe siècle; essai sur la valeur du langage
employé en histoire littéraire [Differences in meaning in the 17th century; essay on the
value of language employed in literary history]. Revue Philosophique de la France et de
l'Étranger, 126, m-2Ql.
Mill, J. S. (1973). A system of logic ratiocinative and inductive. Being a connected view of
the principles of evidence and the methods of scientific investigation (8th ed., 1872,
J. M. Robson, Ed., Toronto: University of Toronto Press. Original work published 1843).
Miller, A. G. (Ed.). (1972). The social psychology of the psychological experiment. New
York: Free Press.
Miller, G. A. (1963). Ronald A. Fisher: 1890-1962. American Journal of Psychology, 76,
157-158.
Miller, G. A. (1969). Psychology as a means of promoting human welfare. American
Psychologist, 24, 1063-1075.
Nagel, E. (1936). The meaning of probability. Journal of the American Statistical
Association, 31, 10-30. [Reprinted with a commentary in J. R. Newman (Ed.), The world of
mathematics, 1956, Vol. II, pp. 1398-1414. New York: Simon and Schuster].
Newman, D. (1939). The distribution of the range in samples from a normal population
expressed in terms of an independent estimate of standard deviation. Biometrika, 31,
20-30.
Newman, J. R. (Ed.). (1956). The world of mathematics (Vols. 1-4). New York: Simon and
Schuster.
Neyman, J. (1934). On the two different aspects of the representative method: The method
of stratified sampling and the method of purposive selection. Journal of the Royal
Statistical Society, 97, 558-625.
Neyman, J. (1935). Statistical problems in agricultural experimentation. Journal of the Royal
Statistical Society, Supplement, 2, 107-154.
Neyman, J. (1937). Outline of a theory of estimation based on the classical theory of
probability. Philosophical Transactions of the Royal Society, A, 236, 333-380.
Neyman, J. (1941). Fiducial argument and the theory of confidence intervals. Biometrika,
32, 128-150.
Neyman, J. (1956). Note on an article by Sir Ronald Fisher. Journal of the Royal Statistical
Society, B, 18, 288-294.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for
purposes of statistical inference. Biometrika, 20a, Part I, 175-240, Part II, 263-294.
Neyman, J., & Pearson, E. S. (1933). The testing of statistical hypotheses in relation to
probabilities a priori. Proceedings of the Cambridge Philosophical Society, 29,
492-510.
Norton, B., & Pearson, E. S. (1976). A note on the background to, and refereeing of, R. A.
Fisher's 1918 paper 'On the correlation between relatives on the supposition of
Mendelian inheritance.' Notes and Records of the Royal Society, 31, 151-162.
Ore, O. (1953). Cardano: The gambling scholar. Princeton, NJ: Princeton University Press.
Osgood, C. E. (1953). Method and theory in experimental psychology. New York: Oxford
University Press.
Parten, M. (1966). Surveys, polls, and samples: Practical procedures. New York: Cooper
Square Publishers.
Pearl, R. (1917). The probable error of a Mendelian class frequency. American Naturalist,
51, 144-156.
Pearson, E. S. (1938a). Karl Pearson: An appreciation of some aspects of his life and work.
Cambridge, England: Cambridge University Press.
Pearson, E. S. (1938b). Some aspects of the problem of randomization. II. An illustration of
'Student's' inquiry into the effect of balancing in agricultural experiments. Biometrika,
30, 159-171.
Pearson, E. S. (1939a). William Sealy Gosset, 1876-1937 (2). "Student" as a statistician.
Biometrika, 30, 205-250.
Pearson, E. S. (1939b). Note on the inverse and direct methods of estimation in R. D.
Gordon's problem. Biometrika, 31, 181-186.
Pearson, E. S. (1965). Some incidents in the early history of biometry and statistics, 1890-94.
Biometrika, 52, 3-18.
Pearson, E. S. (1966). The Neyman-Pearson story: 1926-34. Historical sidelights on an
episode in Anglo-Polish collaboration. In F. N. David (Ed.), Festschrift for J. Neyman
(pp. 1-23). London: Wiley.
Pearson, E. S. (1968). Some early correspondence between W. S. Gosset, R. A. Fisher and
Karl Pearson, with notes and comments. Biometrika, 55, 445-457.
Pearson, E. S. (1970). Karl Pearson's lectures on the history of statistics in the 17th and 18th
centuries. Appendix to E. S. Pearson & M. G. Kendall (Eds.), Studies in the history of
statistics and probability (pp. 479-481). London: Griffin.
Pearson, K. (1892). The grammar of science. London: Scott.
Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philosophical
Transactions of the Royal Society, A, 185, 71-110.
Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew
variations in homogeneous material. Philosophical Transactions of the Royal Society, A,
186, 343-414.
Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression,
heredity, and panmixia. Philosophical Transactions of the Royal Society, A, 187,
253-318.
Pearson, K. (1900a). On the criterion that a given system of deviations from the probable in
the case of a correlated system of variables is such that it can be reasonably supposed to
have arisen from random sampling. Philosophical Magazine, 50, 157-175.
Pearson, K. (1900b). Mathematical contributions to the theory of evolution. VII. On the
correlation of characters not quantitatively measurable. Philosophical Transactions of
the Royal Society, A, 195, 1-47.
Pearson, K. (1901). On lines and points of closest fit to systems of points in space.
Philosophical Magazine, 2-6, 559-572.
Pearson, K. (1903). On the laws of inheritance in man, I. Biometrika, 2, 357-462.
Pearson, K. (1904a). On the laws of inheritance in man, II. Biometrika, 3, 131-190.
Pearson, K. (1904b). Mathematical contributions to the theory of evolution. XIII. On the
theory of contingency and its relation to association and normal correlation. Draper's
Company Research Memoirs, Biometric Series 1. 35 pages.
Pearson, K. (1906). Walter Frank Raphael Weldon, 1860-1906. Biometrika, 5, 1-52.
Pearson, K. (1907). Reply to certain criticisms of Mr G. U. Yule. Biometrika, 5, 470-476.
Pearson, K. (1909). On the test of goodness of fit of observation to theory in Mendelian
experiments. Biometrika, 9, 309-314.
Pearson, K. (1911). On the probability that two independent distributions of frequency are
really samples from the same population. Biometrika, 8, 250-254.
Pearson, K. (1916a). On a brief proof of the fundamental formula for testing the goodness
of fit of frequency distributions and of the probable error of "P." Philosophical Magazine,
31, 369-378.
Pearson, K. (1916b). Mathematical contributions to the theory of evolution. XIX. Second
supplement to a memoir on skew variation. Philosophical Transactions of the Royal
Society, A, 216, 429-457.
Pearson, K. (1917). The probable error of a Mendelian class frequency. Biometrika, 11,
429-432.
Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13, 25-45.
Pearson, K. (1922). On the χ² test of goodness of fit. Biometrika, 14, 186-191.
Pearson, K. (1924a). Historical note on the origin of the normal curve of errors. Biometrika,
16, 402-404.
Pearson, K. (1924b). On the difference and the doublet tests for ascertaining whether two
samples have been drawn from the same population. Biometrika, 16, 249-252.
Pearson, K. (1926). Abraham de Moivre. Reply to Professor Archibald. Nature, 117,
551-552.
Pearson, K. (1927). The mathematics of intelligence. Review of C. Spearman, The abilities
of man, their nature and measurement. Nature, 120, 181-183.
Pearson, K. (1914-1930). The life, letters and labours of Francis Galton. Cambridge,
England: Cambridge University Press. (Vol. 1, 1914; Vol. 2, 1924; Vol. 3a, 3b, 1930)
Pearson, K. (1935). Statistical tests. Nature, 136, 296-297.
Pearson, K. (1936). Method of moments and method of maximum likelihood. Biometrika,
28, 34-59.
Pearson, K. (1978). The history of statistics in the 17th and 18th centuries against the
changing background of intellectual, scientific and religious thought. (E. S. Pearson,
Ed.). London: Charles Griffin.
Pearson, K., & Heron, D. (1913). On theories of association. Biometrika, 9, 159-315.
Pearson, K., & Moul, M. (1927). The mathematics of intelligence. Biometrika, 19, 246-291.
Peirce, B. (1852). Criterion for the rejection of doubtful observations. The Astronomical
Journal, 2, 161-163.
Peters, C. C. (1943). Misuses of the Fisher statistics. Journal of Educational Research, 36,
546-549.
Petrinovich, L. F., & Hardyck, C. D. (1969). Error rates for multiple comparison methods:
Some evidence concerning the frequency of erroneous conclusions. Psychological
Bulletin, 71, 43-54.
Petty, W. (1690). Political arithmetick. London: Printed for Robert Clavel and Henry
Mortlock.
Phillips, L. D. (1973). Bayesian statistics for social scientists. London: Nelson.
Picard, R. (1980). Randomization and design: II. In S. E. Fienberg & D. V. Hinkley (Eds.),
R. A. Fisher: An appreciation (pp. 46-58). New York: Springer-Verlag.
Playfair, W. (1801a). Commercial and political atlas (3rd ed.). London: Printed by T. Burton.
Playfair, W. (1801b). Statistical breviary; shewing on a principle entirely new, the resources
of every state and kingdom in Europe. London: Printed by T. Bensley for J. Wallis,
Egerton, Vernor and Hood, Black and Parry, and Tibbet and Didier.
Poisson, S.-D. (1837). Recherches sur la probabilité des jugements en matière criminelle et
en matière civile, précédées des règles générales du calcul des probabilités (Research on
the probability of judgments in criminal and civil matters, preceded by general rules for
the calculation of probabilities). Paris: Bachelier.
Popper, K. R. (1959). The logic of scientific discovery. London: Hutchinson.
Popper, K. R. (1962). Conjectures and refutations: The growth of scientific knowledge. New
York: Basic Books.
Porter, T. (1986). The rise of statistical thinking. Princeton, NJ: Princeton University Press.
Quetelet, L. A. (1849). Letters addressed to H.R.H. the Grand Duke of Saxe Coburg and
Gotha on the Theory of Probabilities as applied to the moral and political sciences. (O.
G. Downes, Trans.). London: Charles & Edwin Leyton. Authorized facsimile of the
original book produced in 1975 by Xerox University Microfilms, Ann Arbor, MI.
(Original work published 1835)
RAND Corporation (1965). A million random digits with 100,000 normal deviates. New
York: Free Press.
Reichenbach, H. (1938). Experience and prediction. An analysis of the foundations and
structure of knowledge. Chicago: University of Chicago Press.
Reid, C. (1982). Neyman-from life. New York: Springer-Verlag.
Reitz, W. (1934). Statistical techniques for the study of institutional differences. Journal of
Experimental Education, 3, 11-24.
Rhodes, E. G. (1924). On the problem whether two given samples can be supposed to have
been drawn from the same population. Biometrika, 16, 239-248.
Robinson, C. (1932). Straw votes. New York: Columbia University Press.
Robinson, C. (1937). Recent developments in the straw-poll field. Public Opinion Quarterly,
1 (3), 45-56, and 1 (4), 42-52.
Rosenthal, R., & Rosnow, R. L. (Eds.). (1969). Artifact in behavioral research. New York:
Academic Press.
Rowe, F. B. (1983). Whatever became of poor Kinnebrook? American Psychologist, 38,
851-852.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological
Bulletin, 57, 416-428.
Rucci, A. J., & Tweney, R. D. (1980). Analysis of variance and the "Second Discipline" of
scientific psychology: A historical account. Psychological Bulletin, 87, 166-184.
Russell, B. (1931). The scientific outlook. London: Allen and Unwin.
Russell, B. (1961). History of Western philosophy and its connection with political and social
circumstances from the earliest times to the present day. London: Allen and Unwin.
(Original work published 1946)
Russell, Sir John. (1926). Field experiments: How they are made and what they are. Journal
of the Ministry of Agriculture of Great Britain, 32, 989-1001.
Rutherford, E., & Geiger, H. (1910). The probability variations in the distribution of α
particles. Philosophical Magazine, 20, 698-707.
Ryan, T. A. (1959). Multiple comparisons in psychological research. Psychological Bulletin,
56, 26-47.
Savage, L. J. (1962). The foundations of statistical inference. New York: Wiley.
Savage, L. J. (1976). On rereading R. A. Fisher. Annals of Statistics, 4, 441-500.
Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika,
40, 87-104.
Scheffé, H. (1956a). Alternative models for the analysis of variance. Annals of Mathematical
Statistics, 27, 23-36.
Scheffé, H. (1956b). A "mixed model" for the analysis of variance. Annals of Mathematical
Statistics, 27, 251-271.
Scheffé, H. (1959). The analysis of variance. New York: Wiley.
Seal, H. L. (1967). The historical development of the Gauss linear model. Biometrika, 54,
1-24.
Seng, Y. P. (1951). Historical survey of the development of sampling theory and practice.
Journal of the Royal Statistical Society, 114, 214-231.
Sheynin, O. B. (1966). Origin of the theory of error. Nature, 211, 1003-1004.
Sheynin, O. B. (1978). S.-D. Poisson's work in probability. Archives for History of Exact
Sciences, 18, 245-300.
Sidman, M. (1960). Tactics of scientific research. New York: Basic Books.
Simpson, T. (1755). A letter to the Right Honourable George Earl of Macclesfield, President
of the Royal Society, on the advantage arising in taking the mean of a number of
observations, in practical astronomy. Philosophical Transactions of the Royal Society,
49, 82-93.
Simpson, T. (1757). On the advantage arising from taking the mean of a number of
observations, in practical astronomy, wherein the odds that the result in this way is more
exact than from one single observation is evinced and the utility of the method in practise
clearly made appear. In Miscellaneous tracts on some curious and very interesting
subjects in mechanics, physical astronomy and speculative mathematics (pp. 64-75).
London: Printed for J. Nourse.
Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
Smith, D. E. (1929). A source book in mathematics. New York: McGraw-Hill.
Smith, K. (1916). On the "best" values of the constants in frequency distributions.
Biometrika, 11, 262-276.
Snedecor, G. W. (1934). Calculation and interpretation of analysis of variance and
covariance. Ames, IA: Collegiate Press.
Snedecor, G. W. (1937). Statistical methods. Ames, IA: Collegiate Press.
Soper, H. E. (1913). On the probable error of the correlation coefficient to a second
approximation. Biometrika, 9, 91-115.
Soper, H. E., Young, A. W., Cave, B. M., Lee, A., & Pearson, K. (1917). On the distribution
of the correlation coefficient in small samples. A co-operative study. Biometrika, 11,
328-413.
Spearman, C. (1904a). General intelligence, objectively determined and measured.
American Journal of Psychology, 15, 201-293.
Spearman, C. (1904b). The proof and measurement of association between two things.
American Journal of Psychology, 15, 72-101.
Spearman, C. (1927). The abilities of man. New York: Macmillan.
Spearman, C. (1933). The factor theory and its troubles. Journal of Educational Psychology,
24, 521-524.
Spearman, C. (1939). The factorial analysis of human ability. II. Determination of factors.
British Journal of Psychology, 30, 78-83.
St. Cyres, Viscount. (1909). Pascal. London: Smith, Elder and Co.
Stanley, J. C. (1966). The influence of Fisher's "The design of experiments" on educational
research thirty years later. American Educational Research Journal, 3, 223-229.
Stephan, F. F. (1948). History of the use of modern sampling procedures. Journal of the
American Statistical Association, 43, 12-39.
Stephen, L. (1876). History of English thought in the eighteenth century (Vols. 1-2). London:
Smith, Elder and Co.
Stephenson, W. (1939). The factorial analysis of human ability. IV. Abilities defined as
non-fractional factors. British Journal of Psychology, 30, 94-104.
Stevens, S. S. (1951). Mathematics, measurement and psychophysics. In S. S. Stevens (Ed.),
Handbook of experimental psychology (pp. 1-49). New York: Wiley.
Stigler, S. M. (1977). Eight centuries of sampling inspection: The trial of the Pyx. Journal
of the American Statistical Association, 72, 493-500.
Struik, D. J. (1954). A concise history of mathematics. London: Bell.
"Student" (1907). On the error of counting with a haemacytometer. Biometrika, 5, 351-360.
"Student" (1908a). The probable error of a mean. Biometrika, 6, 1-25.
"Student" (1908b). Probable error of a correlation coefficient. Biometrika, 6, 302-310.
"Student" (1925). New tables for testing the significance of observations. Metron, 5, 25-32.
"Student" (1927). Errors of routine analysis. Biometrika, 19, 151-164.
Tedin, O. (1931). The influence of systematic plot arrangements upon the estimate of error
in field experiments. Journal of Agricultural Science, 21, 191-208.
Thomson, G. H. (1939a). The factorial analysis of human ability. I. The present position and
the problems confronting us. British Journal of Psychology, 30, 71-77.
Thomson, G. H. (1939b). The factorial analysis of human ability. Agreement and
disagreement in factor analysis: A summing up. British Journal of Psychology, 30, 105-108.
Thomson, G. H. (1946). The factorial analysis of human ability. London: University of
London Press. (First edition 1939)
Thomson, G. H. (1969). The education of an Englishman. Edinburgh: Moray House
Publications.
Thomson, W. (Lord Kelvin). (1891). Popular lectures and addresses. London: Macmillan.
Thorndike, E. L. (1905). Measurements of twins. Archives of Philosophy, Psychology, and
Scientific Methods, No. 1. New York: Science Press.
Thorndike, E. L., & Woodworth, R. S. (1901). The influence of improvement in one mental
function upon the efficiency of other functions. Psychological Review, 8, 247-261.
Thurstone, L. L. (1935). The vectors of the mind. Chicago: University of Chicago Press.
Thurstone, L. L. (1940). Current issues in factor analysis. Psychological Bulletin, 37,
189-236.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Tippett, L. H. C. (1925). On the extreme individuals and the range of samples taken from a
normal population. Biometrika, 17, 364-387.
Todhunter, I. (1865). A history of the mathematical theory of probability from the time of
Pascal to that of Laplace. Cambridge and London: Macmillan. [Reprinted in 1965 by
the Chelsea Publishing Co., New York].
Traxler, R. H. (1976). A snag in the history of factorial experiments. In D. B. Owen (Ed.),
On the history of probability and statistics (pp. 283-295). New York: Marcel Dekker.
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics,
5, 99-114.
Tukey, J. W. (1953). The problem of multiple comparisons. Unpublished manuscript,
Princeton University, Princeton, NJ.
Turner, F. M. (1978). The Victorian conflict between science and religion: A professional
dimension. Isis, 69, 356-376.
Underwood, B. J. (1949). Experimental psychology. New York: Appleton-Century-Crofts.
Venn, J. (1888). The logic of chance (3rd ed.). London: Macmillan. (Original work
published 1866).
Venn, J. (1891). On the nature and uses of averages. Journal of the Royal Statistical Society,
54, 429-448.
von Mises, R. (1919). Grundlagen der Wahrscheinlichkeitsrechnung (Principles of
probability theory). Mathematische Zeitschrift, 5, 52-99.
von Mises, R. (1957). Probability, statistics and truth (2nd rev. English ed. prepared by
H. Geiringer). London: Allen and Unwin.
Wald, A. (1950). Statistical decision functions. New York: Wiley.
Walker, E. L. (1970). Psychology as a natural and social science. Belmont, CA:
Brooks/Cole.
Walker, H. M. (1929). Studies in the history of statistical method. Baltimore, MD: Williams
and Wilkins.
Wallis, W. A., & Roberts, H. V. (1956). Statistics: A new approach. New York: Free Press.
Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20,
158-177.
Weaver, W. (1977). Lady Luck: The theory of probability. Harmondsworth: Penguin Books.
(Original work published in 1963).
Welch, B. L. (1939). On confidence limits and sufficiency with particular reference to
parameters of location. Annals of Mathematical Statistics, 10, 58-69.
Welch, B. L. (1958). "Student" and small sample theory. Journal of the American Statistical
Association, 53, 777-788.
Weldon, W. F. R. (1890). The variations occurring in certain Decapod Crustacea. I.
Crangon vulgaris. Proceedings of the Royal Society of London, 47, 445-453.
Weldon, W. F. R. (1892). Certain correlated variations in Crangon vulgaris. Proceedings of
the Royal Society of London, 51, 2-21.
Weldon, W. F. R. (1893). On certain correlated variations in Carcinus moenas. Proceedings
of the Royal Society of London, 54, 318-329.
Westergaard, H. (1932). Contributions to the history of statistics. London: P. S. King.
Whitney, C. A. (1984, October). Generating and testing pseudorandom numbers. Byte,
p. 128.
Wilson, E. B. (1928). Review of The abilities of man, their nature and measurement by
Charles Spearman. Science, 67, 244-248.
Wilson, E. B. (1929a). Review of Crossroads in the mind of man by T. L. Kelley. Journal
of General Psychology, 2, 153-169.
Wilson, E. B. (1929b). Comment on Professor Spearman's note. Journal of Educational
Psychology, 20, 217-223.
Wishart, J. (1934). Statistics in agricultural research. Journal of the Royal Statistical Society
Supplement, 1, 26-51.
Wolfle, D. (1940). Factor analysis to 1940. Chicago: University of Chicago Press.
Wood, T. B., & Stratton, F. J. M. (1910). The interpretation of experimental results. Journal
of Agricultural Science, 3, 417-440.
Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology. New York: Holt,
Rinehart & Winston.
Yates, F. (1935). Complex experimentation. Journal of the Royal Statistical Society
Supplement, 2, 181-247.
Yates, F. (1939). The comparative advantages of systematic and randomized arrangements
in the design of agricultural and biological experiments. Biometrika, 30, 440-466.
Yates, F. (1951). The influence of Statistical Methods for Research Workers on the
development of the science of statistics. Journal of the American Statistical Association,
46, 19-34.
Yates, F. (1964). Sir Ronald Fisher and the design of experiments. Biometrics, 20, 307-321.
Yule, G. U. (1897a). Notes on the teaching of the theory of statistics at University College.
Journal of the Royal Statistical Society, 60, 456-458.
Yule, G. U. (1897b). On the theory of correlation. Journal of the Royal Statistical Society,
60, 812-854.
Yule, G. U. (1900). On the association of attributes in statistics. Philosophical Transactions
of the Royal Society, A, 194, 257-319.
Yule, G. U. (1906). On a property which holds good for all groupings of a normal distribution
of frequency for two variables. Proceedings of the Royal Society, A, 77, 324-336.
Yule, G. U. (1912). On the methods of measuring association between two attributes.
Journal of the Royal Statistical Society, 75, 579-642.
Yule, G. U. (1921). Review of W. Brown & G. H. Thomson, The essentials of mental
measurement. British Journal of Psychology, 2, 100-107.
Yule, G. U. (1936). Karl Pearson, 1857-1936. Obituary Notices of Fellows of the Royal
Society of London, 2, 73-104.
Yule, G. U. (1938a). Notes of Karl Pearson's lectures on Theory of Statistics. Biometrika,
30, 198-203.
Yule, G. U. (1938b). A test of Tippett's random sampling numbers. Journal of the Royal
Statistical Society, 101, 167-172.
Yule, G. U. (1939). [Review of] Karl Pearson: An appreciation of some aspects of his life
and work. By E. S. Pearson. Cambridge: University Press, 1938. Nature, 143, 220-222.
Yule, G. U., & Kendall, M. G. (1950). An introduction to the theory of statistics (14th ed.).
London: Charles Griffin. (1st ed., Yule, 1911)
Author Index
A
Acree, M. C., 57, 61, 236
Adams, W. J., 125-126, 236
Airy, G. B., 107, 115, 236
Anderson, R. L., 213, 236
Arbuthnot, J., 51-52, 235
B
Babington Smith, B., 86-88, 243
Bancroft, T. A., 213, 235
Barnard, G. A., 77, 235
Bartlett, M. S., 83, 177, 235
Baxter, B., 208, 236
Bayes, T., 77-78, 235
Beloff, H., 163, 235
Beloff, J., 209, 235
Bennett, J. H., 113, 235
Berkson, J., 83, 236
Bernard, C., 18, 236
Bernoulli, J., 9, 52, 59, 125, 236
Bilodeau, E. A., 22, 236
Boring, E. G., 13, 24, 64, 65, 237
Bowley, A. L., 95-96, 98-99, 108, 114,
237
Box, G. E. P., 7, 237
Bravais, A., 146-147, 237
Brennan, J. F., 207, 237
Buck, P., 50, 237
Burke, C. J., 203, 237
Burt, C., 5, 156, 162-164, 167, 237
C
Campbell, D. T., 206, 237
Carrington, W., 209, 237
Carroll, J. B., 169, 237
Cattell, R. B., 170, 237
Cave, B. M., 118, 250
Chang, W.-C., 95, 99, 107, 238
Clark, R. W., 23-24, 238
Cochrane, W. G., 188, 238
Cornfield, J., 214, 238
Cowan, R. S., 4, 130, 238
Cowles, M., 17, 21, 200, 221, 238
Cronbach, L. J., 34, 189, 208, 215, 238
Crum, L. S., 94, 238
Crump, S. L., 213, 238
D
Daniels, H. E., 213, 238
Darwin, C., 1, 3, 129-131, 238
David, F. N., 56, 58, 65, 238
Davis, C., 17, 21, 196-197, 200, 221, 238
Daw, R. H., 64, 238
Dawson, M. M., 51, 53, 238
Dempster, A. P., 83, 238
De Moivre, A., 10, 14, 20, 52-53, 63-65,
69, 72-76, 239
De Morgan, A., 12, 61, 239
Dodd, S. C., 34, 239
Duncan, D. B., 197, 239
E
Eden, T., 178, 239
Edgeworth, F. Y., 90, 98, 110, 117,
145-146, 155, 239
Edwards, A. E., 213, 239
Eisenhart, C., 117, 195, 213-214, 239
Ellis, B., 40-41, 239
Eysenck, H. J., 170, 239
F
Fechner, G., 33, 239
Feigl, H., 25, 239
Ferguson, G. A., 213
Finn, R. W., 48, 239
Fisher, R. A., 4-6, 38, 62, 81-82, 99, 102,
110-111, 113-115, 118-122, 171, 177-181,
183-184, 186-200, 202-203, 205-206,
210, 212-213, 216, 225, 210, 228-230,
232-233, 239, 240
Fisher Box, J., 112, 114, 117, 119-121,
179, 181, 183, 200, 224-226, 240
Fletcher, R., 163, 240
Folks, L., 204, 243
Forrest, D. W., 130, 137, 241
Fox, J., 175, 199, 241
Funkhouser, H. G., 54, 241
G
Gaito, J., 44, 196, 197, 214, 241
Galbraith, V. H., 48, 241
Galton, F., 2, 4-5, 13-14, 46, 128-142,
145-146, 155, 159, 241
Garnett, J. C. M., 161, 241
Garrett, H. E., 209-210, 241
Gauss, C. F., 64-65, 91-92, 126,
181-182, 241
Geiger, H., 71, 249
Gigerenzer, G., 233, 241
Gini, C., 99, 241
Ginzburg, B., 28, 241
Glaisher, J. W. L., 92, 241
Gould, S. J., 155, 242
Grant, D. A., 210-211, 242
Graunt, J., 8, 48-52, 242
Grünbaum, A., 25, 242
Guilford, J. P., 169, 213, 242
Guthrie, S. C., 172, 242
H
Hacking, I., 57, 59, 63, 80, 83, 125, 242
Haggard, E. A., 193, 242
Hall, A. D., 101, 245
Halley, E., 51-53, 242
Hardyck, C. D., 196, 245
Harman, H. H., 170, 242
Harris, J. A., 190, 242
Hartley, D., 176, 242
Hays, W. L., 195, 242
Hearnshaw, L. S., 28, 156, 162-164, 210,
242
Heisenberg, W., 23, 46, 242
Henkel, R. E., 83, 199, 242
Heron, D., 149-151, 242, 245
Hick, W. E., 203, 242
Hilbert, D., 66, 242
Hilgard, E., 206, 242
Hogben, L., 82, 242
Holschuh, N., 19, 183, 243
Holzinger, K. J., 160, 170, 243
Hotelling, H., 155, 213, 243
Hume, D., 30, 243
Hunter, J. E., 83, 243
Huxley, J., 18, 129, 243
Huygens, C., 19, 51, 59, 63, 243
J
Jacobs, J., 37, 243
Jeffreys, H., 103, 243
Jensen, A. R., 163, 243
Jevons, W., 54, 243
Jones, L. V., 203, 243
Joynson, R. B., 163, 243
K
Kelley, T. L., 164-165, 243
Kempthorne, O., 184, 189, 204, 226, 243
Kendall, M. G., 6, 37-38, 64, 81-82, 86-88,
110, 138, 185, 195, 213, 216, 243
Kenna, J. C., 5, 244
Keuls, M., 197, 244
Kevles, D., 15, 244
Kiaer, A. N., 96-97, 244
Kimmel, H. D., 203, 244
Kneale, W., 60, 244
Koch, S., 234, 244
Kolmogorov, A. N., 66-67, 244
Koren, J., 48, 244
Koren, R. S., 228, 244
Kruskal, W., 96-97, 244
L
Lancaster, H. O., 107, 244
Laplace, P. S. de, 10, 22, 61, 64-65, 79,
92-93, 95, 126, 244
Lawley, D. N., 154, 156, 244
Le Cam, L., 222, 244
Lee, A., 118, 189, 250
Legendre, A. M., 91, 244
Lehmann, E. L., 219, 221-222, 244
Lindquist, E. F., 210-211, 213, 244
Lord, F. M., 44, 244
Lovie, A. D., 34, 161-165, 194, 198,
208-210, 212, 245
Lovie, P., 161-163, 165, 245
Lupton, S., 115, 245
Lush, J. L., 194, 198, 245
M
Macdonell, W. R., 117, 155, 245
MacKenzie, D. A., 4, 15, 105, 130-131,
144, 150, 152, 245
MacKenzie, W. A., 121, 187, 240
Magnello, M. E., 15, 245
Marks, M. R., 203, 245
Maunder, W. F., 96
Maxwell, A. E., 154, 156, 244
McMullen, L., 115, 245
McNemar, Q., 104, 213, 245
Mercer, W. B., 101, 245
Merriman, M., 91, 115, 245
Michea, R., 58, 245
Mill, J. S., 173-174, 245
Miller, A. G., 19, 46, 245
Miller, G. A., 19, 234, 245
Morrison, D. E., 83, 199, 242
Mosteller, F., 96-97, 206, 244
Moul, M., 161, 248
N
Nagel, E., 61-63, 246
Newman, D., 196, 246
Newman, J. R., 5, 59-60, 184, 246
Neyman, J., 85, 98-100, 198,
201-203, 219-225, 227, 230, 246
Norton, B., 112, 246
O
Ore, O., 8, 57-58, 246
Osgood, C. E., 205, 246
P
Parten, M., 94, 246
Pearl, R., 111, 246
Pearson, E. S., 16, 64, 103, 108, 112-113,
115-118, 202, 217-218, 220-224, 227,
230, 238, 246, 247
Pearson, K., 4, 16, 18, 44, 49, 51, 53, 64,
75-76, 102, 106, 108-114, 117-118,
127, 137, 141, 143-151, 153, 155, 157-158,
161, 182, 189, 200, 206, 217, 222,
232, 248
Peirce, B., 62, 107, 248
Peters, C. C., 194, 248
Petrinovich, L. F., 196, 248
Petty, W., 48-50, 248
Phillips, L. D., 79, 248
Picard, R., 103, 248
Playfair, W., 53-54, 248
Poisson, S.-D., 58-59, 70-72, 248
Popper, K. R., 30, 32, 248
Porter, T., 15, 248
Q
Quetelet, L. A., 46, 55, 91, 249
R
RAND Corporation, 88, 249
Reichenbach, H., 32, 249
Reid, C., 217, 219, 221-222, 225-227,
249
Reitz, W., 207, 249
Rhodes, E. G., 217, 249
Roberts, H. V., 138, 252
Robinson, C., 94, 249
Rosenthal, R., 46, 249
Rosnow, R. L., 46, 249
Rowe, F. B., 45, 249
Rozeboom, W. W., 202, 249
Rucci, A. J., 207-208, 211-213, 249
Russell, B., 23, 28, 154, 249
Russell, J., 179, 249
Rutherford, E., 71, 249
Ryan, T. A., 196-197, 249
S
Savage, L. J., 102, 231, 249
Scheffé, H., 197, 213, 249
Schlosberg, H., 206, 252
Seal, H. L., 181-182, 249
Seng, Y. P., 96-97, 250
Sheynin, O. B., 72, 107, 250
Sidman, M., 207, 250
Simpson, T., 53, 89, 125, 250
Skinner, B. F., 24, 250
Smith, D. E., 59, 250
Smith, K., 112, 250
Snedecor, G. W., 123, 193, 194, 213, 234,
250
Soper, H. E., 110, 118, 250
Spearman, C., 5, 156-164, 166, 250
St. Cyres, Viscount, 58, 250
Stanley, J. C., 206, 212, 237
Stephan, F. F., 93-94, 250
Stephen, L., 130, 250
Stephenson, W., 167, 250
Stevens, S. S., 36, 40-44, 206, 250
Stigler, S. M., 86, 87, 251
Stratton, F. J. M., 200, 252
Struik, D. J., 9, 251
"Student", 17, 102-103, 115-118, 121,
122, 196, 200, 251
T
Tedin, O., 103, 251
Thomson, G. H., 5, 156, 165-168, 251
Thomson, W. (Lord Kelvin), 36, 251
Thorndike, E. L., 119, 172, 251
Thurstone, L. L., 5, 34, 163, 166-169, 251
Tippett, L. H. C., 88, 196, 251
Todhunter, I., 64-65, 251
Traxler, R. H., 226, 251
Tukey, J. W., 196-197, 251
Turner, F. M., 130, 251
Tweney, R. D., 207-208, 211-213, 249
U
Underwood, B. J., 206, 251
V
Venn, J., 12, 62, 81, 89-90, 251
von Mises, R., 62, 81-82, 107, 251, 252
W
Wald, A., 230, 252
Walker, E. L., 24, 252
Walker, H. M., 12, 147, 252
Wallis, W. A., 138, 252
Watson, J. B., 22, 207, 252
Weaver, W., 58, 252
Welch, B. L., 117, 202, 252
Weldon, W. F. R., 14, 141-143, 146, 252
Westergaard, H., 7, 93, 252
Whitney, C. A., 88, 252
Wilson, E. B., 161-162, 165, 252
Wishart, J., 210, 252
Wolfle, D., 34, 160-161, 163, 252
Wood, T. B., 200, 252
Woodworth, R. S., 172, 206, 252
Y
Yates, F., 84, 104, 189, 194, 216, 226, 252
Young, A. W., 118, 250
Yule, G. U., 5, 37-38, 88, 108, 110, 138,
146, 148-151, 213, 253
Z
Zubin, J., 209, 210, 241
Subject Index
A
Actuarial science, 51
Analysis of variance, 5, 35, 102-104, 177-181, 187-195
Anthropometric laboratory, 132
Arithmetic mean, 18, 89-91
Average, 18, 88
B
Bayes' theorem, 77-80, 83, 231
Behaviorism, 22
Bills of mortality, 8, 49-52
Binomial distribution, 10-11, 64-65, 68-70
Biometrics, 4, 14-18
C
Causality, see also Determinism, 30
Censuses, 47, 93
Central limit theorem, 95, 107, 124-126
Centroid method, 168
Chi square,
    controversy, 110-114
    distribution, 105-114
    test, 109
Communalities, 168
Confidence intervals, 100, 199-203
Control, 20, 31, 38, 171-178
    Mill's methods of enquiry and, 173-176
    statistical, 176-181
Correlation, 2, 35, 129, 138-146, 174
    coefficient of, 5, 116, 141-146, 158
    controversies, 146-153
    distribution of coefficient, 118
    intraclass, 5, 119, 123, 189-193
    nominal variables, 148-151
    probable error, 117
D
Data organization, 47-55
Degrees of freedom, 110-114
De Mere's problems, 9, 58-59
Determinism, 21-26, 118
    in psychology, 33-35
E
Error estimation, 10, 63, 90, 101-103
Estimation, theory of, 98-101
Eugenics, 4, 15, 130-131, 152-153, 187
Evolution, 1-3, 17, 129, 143
    natural selection, 1, 129, 130, 143
Expected mean squares, 213-215
Experimental texts, 205-207
Experiments,
    design of, 31, 101, 171-185
    tea-tasting, 182-184
F
F ratio, 123, 193, 233
    distribution, 121-123
Factor analysis, 154-170
Factor rotation, 169
Free will, 24
Frequency distributions, 68
G
Gaming, 8, 56-59
General linear model, 27, 93, 181-183
Genetics, 15, 17-18
Gower Street, 151, 225
Graphical methods, 53-55
H
Heredity, 131
Hypotheses,
    alternative, 217-224, 229-233
    null, 57, 184, 217-219
    testing, 7, 222-224, 228-233
I
Induction, 27-30, 175, 230
Inference, 29, 31-33
    early, 47-50
    Fisherian, 81-83
    practical, 77-84
    statistical, 7, 31, 33, 217
Insurance and annuities, 50-53, 63
Intelligence, 158-161
Inventories, 6, 47-48
L
Law of error, 10, 12, 89
Law of large numbers, 59-60
Least squares, 89, 91-93, 148, 182
Likelihood criterion, 82, 218-220
Logical positivism, 45-46
M
Mean, see arithmetic mean
Measurement, 2, 36-38, 40
    error, 38, 44
    definition, 40
    interval and ratio scales, 42
    nominal scale, 41, 148-150
    ordinal scale, 41
Multiple comparisons, 195-199
N
Naturalism, 130
Neyman-Pearson theory, 217-224
    controversy with Fisher, 228-233
Normal Distribution, 10, 12-14, 64-66, 72-76, 107-108, 125, 128
O
One-tail and two-tail tests, 203-204
Operationalism, 45-46
P
Parameters, 7, 95
Pascal's triangle, 9, 59, 69
Personal equation, 45
Poisson distribution, 70-72
Political arithmetic, 48-50
Polling, 94-95
Population, 7, 82, 85, 87, 95-101, 105
Power, 222-224, 232-233
Prediction, 30
Primary mental abilities, 169
Principal component analysis, 155
Probabilistic and deterministic models, 26
Probability, 7-9, 32, 56
    and error estimation, 63
    and weight, 32
    beginnings of the concept, 56-60
    distributions, 68-76
    fiducial, 81-83, 99, 202, 231
    inverse, 65, 77-80, 119, 221
    mathematical theory, 66
    meaning of, 60-63
    subjective, 61, 221
Probable error, 127, 139, 200
R
Randomness, 85-88, 172
    random numbers, 85-88
    random sampling, 87
    randomization, 94, 101-104, 177-179, 188
Regression, 3, 17, 129-138, 135
Representative sampling, 93, 96
Rothamsted Experimental Station, 6, 101, 120, 176, 188
Royal Society, 8, 17, 49-51
S
Sampling,
    distributions, 16, 105-126
    in practice, 95-98
    random, 98
    representation and bias, 93-95
Science, 21-26, 36-37, 45
Significance, 17, 19, 83, 183-184, 199-200, 218
Standard deviation, 73-75, 116, 127
Standard error, 33, 141
Standard scores, 4, 75, 127-129, 137
Statistics,
    arguments, 224-228
    criticism of, 18-20, 233-235
    definition of, 6
    Fisher v. Neyman and Pearson, 228-233
    Fisherian, 186-187, 224
    in journals and papers, 207-212
    in practice, 233
    in psychology, 33-35
    textbooks, 212-213
    parapsychology, 209
    vital, 8, 49-51
T
t ratio, 196, 218
    distribution, 114-121
Tetrad difference, 160
Trial of the Pyx, 86
Type I error, 95, 196-198, 201, 223, 231
Type II error, 223, 229, 231
U
Uncertainty principle, 23, 46
V
Variation, 1, 4, 132, 187
Z
z scores, see standard scores