
The SAGE
Handbook of
Spatial Analysis

Edited by
A. Stewart Fotheringham and
Peter A. Rogerson

Los Angeles London New Delhi Singapore


© Editorial arrangement and Chapter 1, A. Stewart Fotheringham and Peter A. Rogerson 2009
© Chapter 2, Robert Haining 2009
© Chapter 3, David Martin 2009
© Chapter 4, Urška Demšar 2009
© Chapter 5, Shashi Shekhar, Vijay Gandhi, Pusheng Zhang, Ranga Raju Vatsavai 2009
© Chapter 6, Marie-Josée Fortin, Mark R.T. Dale 2009
© Chapter 7, David Wong 2009
© Chapter 8, Robin Dubin 2009
© Chapter 9, Peter M. Atkinson, Christopher D. Lloyd 2009
© Chapter 10, Eric Delmelle 2009
© Chapter 11, Chris Brunsdon 2009
© Chapter 12, Vincent B. Robinson 2009
© Chapter 13, A. Stewart Fotheringham 2009
© Chapter 14, Luc Anselin 2009
© Chapter 15, D. Ballas, G. P. Clarke 2009
© Chapter 16, Lance Waller 2009
© Chapter 17, Andrew B. Lawson, Sudipto Banerjee 2009
© Chapter 18, Peter A. Rogerson 2009
© Chapter 19, Geoffrey M. Jacquez, Jaymie R. Meliker 2009
© Chapter 20, Manfred M. Fischer 2009
© Chapter 21, Harvey J. Miller 2009
© Chapter 22, Morton E. O'Kelly 2009
© Chapter 23, Atsuyuki Okabe, Toshiaki Satoh 2009
© Chapter 24, Michael F. Goodchild 2009
© Chapter 25, Reginald G. Golledge 2009

First published 2009

Apart from any fair dealing for the purposes of research or


private study, or criticism or review, as permitted under the
Copyright, Designs and Patents Act, 1988, this publication may
be reproduced, stored or transmitted in any form, or by any
means, only with the prior permission in writing of the publishers,
or in the case of reprographic reproduction, in accordance with the
terms of licences issued by the Copyright Licensing Agency.
Enquiries concerning reproduction outside those terms should be
sent to the publishers.

SAGE Publications Ltd


1 Oliver's Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.


2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd


B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, Post Bag 7
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd


33 Pekin Street #02-01
Far East Square
Singapore 048763

Library of Congress Control Number: 2008921399

British Library Cataloguing in Publication data

A catalogue record for this book is available from the British Library

ISBN 978-1-4129-1082-8

Typeset by CEPHA Imaging Pvt. Ltd., Bangalore, India


Printed in Great Britain by The Cromwell Press Ltd, Trowbridge, Wiltshire
Printed on paper from sustainable resources

Acknowledgement: Research presented in Chapter 13 was supported by a grant to the National Centre for
Geocomputation by Science Foundation Ireland (03/RP1/1382) and by a Strategic Research Cluster grant
(07/SRC1/1168) from Science Foundation Ireland under the National Development Plan. The author gratefully
acknowledges this support.
Contents

Notes on Contributors vii

1. Introduction 1
A. Stewart Fotheringham and Peter A. Rogerson

2. The Special Nature of Spatial Data 5


Robert Haining

3. The Role of GIS 25


David Martin

4. Geovisualization and Geovisual Analytics 41


Urška Demšar

5. Availability of Spatial Data Mining Techniques 63


Shashi Shekhar, Vijay Gandhi, Pusheng Zhang and Ranga Raju Vatsavai

6. Spatial Autocorrelation 89
Marie-Josée Fortin and Mark R.T. Dale

7. The Modifiable Areal Unit Problem (MAUP) 105


David Wong

8. Spatial Weights 125


Robin Dubin

9. Geostatistics and Spatial Interpolation 159


Peter M. Atkinson and Christopher D. Lloyd

10. Spatial Sampling 183


Eric Delmelle

11. Statistical Inference for Geographical Processes 207


Chris Brunsdon

12. Fuzzy Sets in Spatial Analysis 225


Vincent B. Robinson

13. Geographically Weighted Regression 243


A. Stewart Fotheringham

14. Spatial Regression 255


Luc Anselin

15. Spatial Microsimulation 277


D. Ballas and G. P. Clarke

16. Detection of Clustering in Spatial Data 299


Lance A. Waller

17. Bayesian Spatial Analysis 321


Andrew B. Lawson and Sudipto Banerjee

18. Monitoring Changes in Spatial Patterns 343


Peter A. Rogerson

19. Case-Control Clustering for Mobile Populations 355


Geoffrey M. Jacquez and Jaymie R. Meliker

20. Neural Networks for Spatial Data Analysis 375


Manfred M. Fischer

21. Geocomputation 397


Harvey J. Miller

22. Applied Retail Location Models Using Spatial Interaction Tools 419
Morton E. O'Kelly

23. Spatial Analysis on a Network 443


Atsuyuki Okabe and Toshiaki Satoh

24. Challenges in Spatial Analysis 465


Michael F. Goodchild

25. The Future for Spatial Analysis 481


Reginald G. Golledge

Index 487
Notes on Contributors

Andrew B. Lawson is Professor in the Department of Biostatistics, Bioinformatics and


Epidemiology at the Medical University of South Carolina. His research interests focus
on the development of statistical methods in spatial and environmental epidemiology,
disease surveillance, health data mining and space–time problems, directional statistics and
environmetrics. Professor Lawson is a WHO advisor in Disease Mapping and Risk Assessment.
He has a wide range of publications in the area of epidemiology, including 8 books and over
20 book chapters, and he gave numerous invited courses on spatial epidemiology in the UK,
Sweden, Belgium, New Zealand, Australia, Canada and the USA.

Atsuyuki Okabe received his PhD from the University of Pennsylvania and the degree
of Doctor of Engineering from the University of Tokyo. He is currently Professor of the
Department of Urban Engineering at the University of Tokyo where he served as Director of
the Center for Spatial Information Science (1998–2005). Professor Okabe's research interests
include GIS, spatial analysis, and spatial optimization, and he has published many papers
in journals, books and conference proceedings on these topics. He is a co-author of Spatial
Tessellations: Concepts and Applications of Voronoi Diagrams (John Wiley, 2000), the editor
of GIS-based Studies in the Humanities and Social Sciences (Taylor & Francis, 2005). He
serves on the Editorial Boards of seven international journals including International Journal
of Geographical Information Science.

Chris Brunsdon is Professor of geographic information at the Department of Geography,


Leicester University. His research interests include the methodologies underlying spatial
statistical analysis and GIS. In particular he is interested in the analysis of crime patterns,
house prices and health-related data. Professor Brunsdon has played a role in the development
of Geographically Weighted Regression, a technique of analysis that models geographical
variations in the relationships between variables. He is a member of the Editorial Board of Computers,
Environment and Urban Systems.

Christopher D. Lloyd is Lecturer in the School of Geography, Archaeology and Palaeoecology


at Queen's University Belfast. His research interests focus on the analysis of spatial data
(in both social and environmental contexts), geostatistics, spatial analysis, remote sensing and
archaeology. A key concern of his research has been the use and development of local models
for spatial analysis. Dr Lloyd is author of Local Models for Spatial Analysis (Boca Raton: CRC

Press, 2006) and co-author of The Atlas of the Island of Ireland: Mapping Social and Economic
Change (Maynooth: AIRO/ICLRD, 2008).

David Martin is Professor in the School of Geography, University of Southampton. He is


director of the ESRC Census Programme and a co-director of the ESRC National Centre for
Research Methods. His research interests include socioeconomic application of GIS, census
population modeling, census geography design and geography of health. He is co-editor of
GIS and Geocomputation (Taylor and Francis, 2000), The Census Data System (Wiley, 2002)
and Methods in Human Geography: a Guide for Students Doing a Research Project, Second
Edition (Pearson, 2005). Professor Martin is a member of the editorial advisory boards of
Transactions in GIS, Transactions of the Institute of British Geographers and associate editor
of the Journal of the Royal Statistical Society Series A: Statistics in Society.

David W. Wong is Professor in the Department of Geography and Geoinformation Science, at


George Mason University. His research interests fall into three main categories: investigating
the effects of the modifiable areal unit problem (MAUP); spatial dimensions of segregation and
ethnic diversity; and GIS applications of spatial analytical techniques. He has co-authored two
books: Statistical Analysis with ArcView (Wiley & Sons, 2001) and Statistical Analysis and
Modeling of Geographic Information (Wiley & Sons, 2005). He has served as reviewer for
various journals, funding agencies and organizations. He is on the editorial board of several
journals: Computers, Environment and Urban Systems, Geographical Analysis, and Journal of
Geographic Information Sciences.

Dimitris Ballas is a Senior Lecturer in the Department of Geography at the University of


Sheffield. He received his PhD in Geography from the University of Leeds in 2001. His research
interests include economic geography; spatial dimensions of socio-economic polarisation and
income and wealth inequalities; socio-economic applications of GIS; geographical impact of
area-based and national social policies; basic income policies; social justice; and the geography of
happiness and well-being. He is the lead author of the book Geography matters: simulating
the impacts of national social policies and a co-author of the books Post-Suburban Europe:
Planning and Politics at the Margins of Europes Capital Cities and Poverty, wealth and
place in Britain, 1968 to 2005. He has fifteen papers in peer-reviewed international academic
journals, eight peer-reviewed edited book chapters and over fifty conference papers.

Eric Delmelle is Assistant Professor in the Geography and Earth Sciences Department at the
University of North Carolina (Charlotte) where he teaches GIS, geovisualization and spatial
optimization. He received his PhD in geography from the State University of New York at
Buffalo. His research interests focus on spatial sampling optimization and geostatistics, non-
linear allocation problems, geovisualization and GIS.

Geoffrey M. Jacquez is President of BioMedware Incorporated, and Adjunct Associate


Professor of Environmental Health Sciences at the University of Michigan. He received
his PhD from the Department of Ecology and Evolution at State University of New York
at Stony Brook. Dr. Jacquez develops and applies spatial statistics to elucidate underlying
space–time processes in the environmental, biological and health sciences. His research
includes applications in disease clustering, epidemiology, environmental monitoring and

population genetics. Dr. Jacquez is currently Principal Investigator on three grants from the
National Cancer Institute to develop spatial statistical methods and software. He also publishes
extensively in the fields of spatial statistics, GIS and epidemiology.

Graham Clarke is Professor in the School of Geography, Faculty of Environment at the


University of Leeds. His research interests include GIS, urban services, retail and business
geography, urban modelling and geography of crime, income and welfare. Dr Clarke is
co-author of Geography Matters (Joseph Rowntree Foundation, 2005), and Retail Geography
and Intelligent Network Planning (Wiley, Chichester, 2002). He is a committee member of
The Academy of Learned Societies for the Social Sciences, Executive Director of Regional
Science Association International and serves on the Editorial Board of European Journal of
Geography.

Harvey J. Miller is Professor and Chair of the Department of Geography at the University of
Utah. His research and teaching interests include GIS, spatial analysis and geocomputational
techniques applied to understanding how transportation and communication technologies shape
individual lives and urban morphology. Since 1989, he has published approximately 50 papers
in journals, books and conference proceedings on these topics. He is co-author of Geographic
Information Systems for Transportation: Principles and Applications (Oxford University Press,
2001) and co-editor of Geographic Data Mining and Knowledge Discovery (Taylor and Francis,
2001) and Societies and Cities in the Age of Instant Access (Springer, 2007). Harvey serves on
the editorial boards of several scientific journals and in 2005–2011 he is serving as co-Chair
of the Transportation Research Board, Committee on Spatial Data and Information Science of
the U.S. National Academies.

Jaymie R. Meliker is Assistant Professor of Preventive Medicine in the Medical Center at State
University of New York at Stony Brook. He received his PhD in 2006 from the Department of
Environmental Health Sciences, University of Michigan School of Public Health. Dr. Meliker's
research contributes to the fields of exposure science, GIScience, health geography, and envi-
ronmental epidemiology by developing methodologies for integrating sources of spatial, tempo-
ral, and spatio-temporal variability in environmental health applications. Prior to joining Stony
Brook, he worked as a Research Scientist at BioMedware, Inc., pioneering the development
of spatio-temporal software and statistical algorithms for addressing public health concerns.

Luc Anselin is Faculty Excellence Professor and Director of the Spatial Analysis Laboratory
in the Department of Geography at the University of Illinois, Urbana-Champaign. He is also a
Senior Research Associate at the National Center for Supercomputing Applications at UIUC.
Dr. Anselin's research deals with various aspects of spatial data analysis and geographic
information science, ranging from exploratory spatial data analysis to geocomputation, spatial
statistics and spatial econometrics. He has published widely on topics dealing with spatial and
regional analysis, including a much cited book on Spatial Econometrics (Kluwer, 1988); over
a hundred refereed journal articles and book chapters, as well as a large number of reports and
technical publications.

Lance A. Waller is Professor in the Department of Biostatistics at the Rollins School of


Public Health, Emory University. His research involves the development of statistical methods

to analyze spatial and spatio-temporal patterns. His recent areas of interest include spatial
point process methods in alcohol epidemiology and conservation biology (sea turtle nesting
patterns), and hierarchical models in disease ecology. Dr Waller is Chair of the American Statistical
Association Section on Statistics and the Environment (2008). He is also President-Elect of the
Eastern North American Region of the International Biometric Society (2008), and serves as
Associate Editor of Biometrics and of Bayesian Analysis.

Manfred M. Fischer is Professor of Economic Geography and Director of Institute for


Economic Geography and GIScience at Vienna University of Economics and Business. His
research spans a broad array of fields including regional and urban economics, housing and
labor market research, transportation systems analysis, innovation economics, spatial behavior
and decision processes, spatial analysis and spatial statistics, and GIS. He is one of the leading
scholars in the field of GeoComputation. On the basis of this expertise and scientific impact,
Dr. Fischer has gained a high reputation both nationally and internationally. In 1995 he
was elected as a member of the International Eurasian Academy of Sciences, in 1996 as a
corresponding member of the Austrian Academy of Sciences and in 1999 as a foreign member
of the Royal Netherlands Academy of Arts and Sciences. Dr. Fischer has published over 250
scientific publications, including 28 monographs and edited books.

Marie-Josée Fortin is Professor in the Department of Ecology and Evolutionary Biology


and Head of the Landscape Ecology Laboratory at the University of Toronto, Canada. She
has four main research areas: spatial ecology, forest ecology, landscape ecology and spatial
statistics. She is co-author of Spatial Analysis: A Guide for Ecologists (Cambridge University
Press, 2005), and has published over 150 research papers in peer-reviewed journals, book
chapters, conferences, and invited lectures. Professor Fortin is assistant editor for Theoretical
Ecology, subject editor for Ecology and Ecology Monographs, and Editorial Board member of
Ecosystems.

Mark Dale is Professor in the Department of Biological Science and Dean in the Faculty of
Graduate Studies and Research at the University of Alberta, Canada. He received his PhD from
Dalhousie University, Canada. His current research interests involve methods for detecting and
analyzing the spatial relationships of plants in populations and communities and spatial analysis
and spatial statistics with applications in ecological systems. Professor Dale is co-author of
Spatial Analysis: A Guide for Ecologists (Cambridge University Press, 2005), and he served
as an associate editor for Canadian Journal of Botany.

Michael F. Goodchild is Professor of Geography at the University of California, Santa Barbara.


He also serves as Chair of the Executive Committee for the National Center for Geographic
Information and Analysis (NCGIA), and Director of NCGIA's Center for Spatially Integrated
Social Science. His current research interests center on GIS, spatial analysis, the future of
the library, and uncertainty in geographic data. His major publications include Geographical
Information Systems: Principles and Applications (1991); Environmental Modeling with GIS
(1993); Accuracy of Spatial Databases (1989); GIS and Environmental Modeling: Progress and
Research Issues (1996); Scale in Remote Sensing and GIS (1997); Interoperating Geographic
Information Systems (1999); and Geographical Information Systems: Principles, Techniques,
Management and Applications (1999); in addition he is author of some 300 scientific papers.

He is the recipient of numerous awards including the Educator of the Year Award from
the University Consortium for Geographic Information Science, a Lifetime Achievement
Award from Environmental Systems Research Institute, Inc., the American Society of
Photogrammetry and Remote Sensing Intergraph Award and the Horwood Critique Prize of
the Urban and Regional Information Systems Association.

Morton E. O'Kelly is Professor and Chair of the Department of Geography at the Ohio
State University. His research interests include location theory, transportation, network design
and optimization, spatial analysis and GIS. Dr. O'Kelly co-authored two books: Geography of
Transportation, 2nd edition (Prentice Hall, 1996) and Spatial Interaction Models: Formulations
and Applications (Kluwer Academic: Amsterdam, 1989), as well as over 75 research papers
in peer-reviewed journals, book chapters and conference proceedings.

Peter M. Atkinson is Professor and Head of the School of Geography and Director of the
University Centre for Geographical Health Research at the University of Southampton.
His research interests focus on geostatistics, spatial statistics, remote sensing, and spatially
distributed dynamic modelling applications for environmental problems and hazards.
He is co-editor of International Journal of Remote Sensing Letters and associate editor
of International Journal of Applied Earth Observation and Geoinformation. Professor
Atkinson is also Fellow of the Royal Geographical Society and Fellow of the Royal
Statistical Society.

Peter A. Rogerson is Professor of Geography and Biostatistics at the University at Buffalo.


His research interests are in the area of demography and population change, epidemiology,
spatial statistics and spatial analysis. His current work is focused upon the development
of new methods for the quick detection of newly emergent clusters in geographic data.
Professor Rogerson has authored and co-authored four books, the most recent being Statistical
Detection and Monitoring of Geographic Clusters (Chapman and Hall/CRC, 2008), and over
85 research papers in peer-reviewed journals. He also developed GeoSurveillance 1.1: Software
for Monitoring Change in Geographic Patterns.

Pusheng Zhang is currently with the Microsoft Virtual Earth team. He received his PhD
in Computer Science from the University of Minnesota. His research interests include local
search engine design, spatial and temporal databases, data mining and geographic information
retrieval. Dr Zhang is a member of the IEEE Computer Society.

Ranga Raju Vatsavai received his PhD in Computer Science from the University of Minnesota
where he also worked as a Research Fellow in the Remote Sensing Laboratory. Currently Dr Vatsavai
is employed at the Oak Ridge National Laboratory. His broad research interests are centered
on spatial and spatio-temporal databases and data mining.

Reginald G. Golledge is a Professor of Geography at the University of California, Santa


Barbara. His research interests include behavioral geography, spatial cognition, cognitive
mapping, individual decision-making, household activity patterns and the acquisition and use of
spatial knowledge across the life-span. Professor Golledge has written or edited 14 books, more
than 50 chapters in books, and over 120 papers in academic journals, and 80 miscellaneous

publications including technical reports, book reviews, and published research notes. He has
presented more than 100 papers at local, regional, national, and international conferences
in geography, regional science, planning, psychology, and statistics. Professor Golledge
received an Association of American Geographers (AAG) Honors Award in 1981. He is an
Honorary Life-Time Member of the Institute of Australian Geographers and a Fellow of the
American Association for the Advancement of Science. He received an International Geography
Gold Medal Award from the LAG in 1999. In 1998 he was elected Vice President of the AAG;
in 1999–2000 he was elected AAG President.

Robin A. Dubin is Professor of Economics in the Weatherhead School of Management at Case


Western Reserve University. Her research interests involve urban and regional economics, real
estate, and spatial econometrics. Professor Dubin has published numerous research papers in
peer-reviewed journals, book chapters, conferences, and invited lectures.

Robert Haining is Professor of Human Geography at Cambridge University. Between


2002 and 2007 he was Head of the Department of Geography at Cambridge University. Professor
Haining has published extensively in the field of spatial data analysis, with particular
reference to applications in the areas of economic geography, medical geography and
the geography of crime. His interests also include the integration of spatial data analysis
with GIS and he developed SAGE, a software system for analysing area health data. His
previous book, Spatial Data Analysis in the Social and Environmental Sciences (Cambridge
University Press, 1993) was well received and cited internationally. Professor Haining is
a member of the editorial board of Journal of Geographical Systems and Computational
Statistics.

Shashi Shekhar is McKnight Distinguished University Professor at the University of


Minnesota, Minneapolis, MN, USA. His research interests include spatial databases, spatial
data mining, GIS, and intelligent transportation systems. He is a co-author of a textbook on
Spatial Databases (Prentice Hall, 2003), co-edited the Encyclopedia of GIS, (Springer, 2008)
and has published over 200 research papers in peer-reviewed journals, books, conferences,
and workshops. He is co-Editor-in-Chief of Geo-Informatica: An International Journal on
Advances in Computer Science for GIS and has served on the editorial board of Transactions
on Knowledge and Data Engineering.

A. Stewart Fotheringham is Science Foundation Ireland Research Professor and Director of


the National Centre for Geocomputation at the National University of Ireland in Maynooth.
His research interests include: the integration of spatial analysis and GIS; spatial statistics,
exploratory spatial data analysis and spatial modelling. He is one of the originators
of Geographically Weighted Regression. Professor Fotheringham is a founding editor of
Transactions in GIS and is on a number of editorial boards, including Geographical Analysis,
Annals of the Association of American Geographers and Geographical Systems. He has
published six books, over 20 book chapters and over 100 journal articles.

Sudipto Banerjee is Associate Professor in the Division of Biostatistics at the University


of Minnesota. He received his PhD in Statistics from the University of Connecticut, Storrs.
Dr Banerjee's research focuses upon the analysis of data arising from spatial processes,

Bayesian interpolation and prediction methods, and smoothness of spatial processes. He is


also interested in the application of Bayesian methodology in biostatistics. Dr. Banerjee has
co-authored a textbook on spatial statistics, Hierarchical Modeling and Analysis for Spatial
Data (CRC Press/Chapman and Hall, 2004), was a field editor for the Encyclopedia of GIS
(Springer, 2008) and serves as Associate Editor for Journal of the Royal Statistical Society
Series C (Applied Statistics), Statistics in Medicine and Bayesian Analysis.

Urška Demšar is a Lecturer at the National Centre for Geocomputation at the National
University of Ireland, Maynooth. She has a PhD in Geoinformatics from the Royal Institute
of Technology, Stockholm, Sweden. Her research interests include Geovisual Analytics and
Geovisualisation. She is combining computational and statistical methods with geovisualisation
for knowledge discovery from spatial data. Additionally, she is interested in spatial analysis
and mathematical modelling of spatial phenomena. She has an established cooperation with
researchers at the Helsinki University of Technology with whom she is working on spatial
analysis of networks for crisis management.

Vijay Gandhi is a Master's student in Computer Science at the University of Minnesota, Twin
Cities. After graduating in Computer Science and Engineering from Madras University he
worked in the field of business intelligence and data warehousing. Currently he is involved in
research on spatial databases and spatial data mining.

Vincent B. Robinson is Associate Professor in the Department of Geography and Planning at


the University of Toronto at Mississauga. His research interests involve intelligent geographic
information systems, geographical modelling, remote sensing, land use/cover change,
biogeography and landscape ecology. Professor Robinson has published extensively on issues and
challenges of incorporating fuzzy set techniques in ecological modeling. He is also an Executive
Committee Member of Project Open Source, Open Access at the University of Toronto.

Toshiaki Satoh is currently a researcher at the Research & Development Center of PASCO


Corporation, a surveying and GIS consulting company. He received a Bachelor's degree from
Tohoku University in 1992, a Master's degree from Tokyo Institute of Technology in 1994 and
a PhD in Engineering from the University of Tokyo in 2007. His main research interests
are network spatial analysis and computer visualization.
1
Introduction
A. Stewart Fotheringham
and Peter A. Rogerson

1.1. WHAT IS SPATIAL ANALYSIS?

Spatial data contain locational information as well as attribute information. That is, they are data for which some attribute is recorded at different locations and these locations are coded as part of the data. Spatial analysis is a general term to describe a technique that uses this locational information in order to better understand the processes generating the observed attribute values.

Spatial analysis is important because it is increasingly recognized that most data are spatial. Examples of common types of spatial data include census data, traffic counts, patient records, the incidence of a disease, the location of facilities and services, the addresses of school pupils, customer databases, and the distributions of animal, insect or plant species. Along with various attributes collected by hand or by different types of sensors, location is also being captured by an increasing variety of technologies such as GPS, WiMAX, LiDAR, and radio frequency identity (RFID) tags, as well as by more traditional means such as surveys and censuses. Some of the resulting data sets can be extremely large. Satellites, for example, regularly record terabytes of spatial data; LiDAR scanners can capture millions of geocoded data points in minutes; and GPS-encoded spatial video generally produces 24 frames per second, each of which may have around a million geocoded pixels. The world is rapidly becoming one large spatial sensor, with RFID tags, CCTV cameras and GPS-linked devices recording the location of objects, animals and people.

Spatial data and the processes generating such data have several properties that distinguish them from their aspatial

equivalents. Firstly, the data are typically not independent of each other. Attribute values in nearby places tend to be more similar than are attribute values drawn from locations far away from each other. This is a useful property when it comes to predicting unknown values because we can use the information that an unknown attribute value is likely to be similar to neighbouring, known values. The subfield of geostatistics has grown up based on this premise. However, if data values do exhibit spatial autocorrelation, this causes problems for statistical techniques that assume data are drawn from independent random samples. Special statistical methods, such as spatial regression models, have been developed to overcome this problem. Equally, it is often hard to defend the assumption of stationarity in spatial processes. That is, it is often assumed that the process generating the observed data is the same everywhere. Spatial non-stationarity exists where the process varies across space. Again, special statistics, such as Geographically Weighted Regression, have been developed to handle this problem.

1.2. TYPES OF SPATIAL ANALYSIS

While there are many different techniques of spatial analysis, they can be categorized into four main types:

1 Those spatial analytical techniques aimed at reducing large data sets to a smaller amount of more meaningful information. Summary statistics, various means of visualizing data and a wide body of data reduction techniques are often needed to make sense of what can be extremely large, multidimensional data sets.

2 Those techniques collectively known as exploratory data analysis, which consist of methods to explore data (and also model outputs) in order to suggest hypotheses or to examine the presence of unusual values in the data set. Often, exploratory data analysis involves the visual display of spatial data generally linked to a map.

3 Those techniques that examine the role of randomness in generating observed spatial patterns of data and testing hypotheses about such patterns. These include the vast majority of statistical models used to infer the process or processes generating the data and also to provide quantitative information on the likelihood that our inferences are incorrect.

4 Those techniques that involve the mathematical modelling and prediction of spatial processes.

This book will cover examples of all four types of spatial analysis.

1.3. SPATIAL ANALYSIS IN PERSPECTIVE

It is difficult to say exactly when spatial analysis began in earnest but the beginnings are generally cited in the late 1950s and early 1960s, although much earlier examples of individual pioneering work can be found (e.g., Spottiswoode, 1861). Certainly, the decades of the 1960s and 1970s were periods when quantitative methodologies diffused rapidly within disciplines such as geography and regional science and when much pioneering and fundamental research was carried out. There then followed a period of relative decline, for various reasons outlined by Fotheringham (2006), when many of the newer paradigms in human geography were starkly anti-quantitative. Perhaps also many of the early examples

of spatial analysis were overly concerned with form rather than with process and were rightly criticized for this focus. In addition, it is possible that expectations for quantitative methods may have initially been too high. For example, many believed that spatial modelling, when coupled with adequate data and rapidly increasing computing power, would lead society to solve many of the pressing issues in urban and regional areas.

Significant advances in spatial analysis during the past two decades have brought about a new era of interest in the field. The period of relative decline has now been replaced by one of great enthusiasm for the potential of spatial analysis. This potential has been recognized and embraced by researchers from many fields, ranging from public health and criminal justice, to ecology and environmental studies, as evidenced by various contributions to this volume.

It is now widely recognized in a broad range of disciplines that spatial analysis has an important role to play in making sense of the large volumes of spatial data we now have available and the demand for spatial analysis has never been stronger. It thus is an important time to produce this Handbook of Spatial Analysis describing many of the major areas of spatial analysis.

1.4. OVERVIEW OF THE HANDBOOK

The book is designed to capture the state-of-the-art in a broad spectrum of spatial analytical techniques that can be applied to spatial data across a very wide range of disciplinary areas.

Our intent has been to provide a retrospective and prospective view of spatial analysis that covers:

• the reasons why the analysis of spatial data needs separate treatment;

• the main areas of spatial analysis;

• the key debates within spatial analysis;

• examples of the application of various spatial analytical techniques;

• problems in spatial analysis; and

• areas for future research.

Although there is inevitable (and desirable) variability in the structure and nature of the individual chapters, in a broad sense the contributions have the following aims:

• describe the current situation within the field, highlighting the main advances that have taken place, as well as current debates;

• describe the problems that still exist, indicating where future research may be best directed;

• indicate key works in the field and provide an extensive bibliography for the area;

• describe the use of the technique in several disciplines; and

• maintain a balance between concepts, theories and methods.

Rapid improvements in the development and availability of high-quality datasets, along with the power and features of geographic information systems that now increasingly provide capabilities for advanced forms of spatial analysis, have propelled the field forward. Consequently, the field of spatial analysis is currently in the midst of an exciting growth period,

where many new tools and methods for analyzing spatial data are being developed, and where applications are being made in an increasing number of fields. This Handbook represents a summary of these developments and applications, as well as a sense of the intense interest that the field now enjoys.

REFERENCES

Spottiswoode, W. (1861). On typical mountain ranges: an application of the calculus of probabilities to physical geography. Journal of the Royal Geographical Society of London, 31: 149–154.
2
The Special Nature of
Spatial Data
Robert Haining

This chapter describes some of the special or distinguishing features of spatial data, opening the way to methodological issues that will be treated in more depth in later chapters. The use of the term 'special' should not be taken to imply that no other types of data possess these features. Spatial data analysis is a sub-branch of the more general field of quantitative data analysis and has sometimes suffered from not paying sufficient attention to that fact. Many of the data properties that will be encountered are found in other types of (non-spatial) data but, when found in spatial data, may possess a particular structure or properties may arise in particular combinations.

The chapter will first define what is meant by spatial data and then identify properties. It will be helpful, in order to put structure on this discussion, to distinguish fundamental properties of spatial data from properties that are due to the chosen representation of geographical space and from properties that are a consequence of measurement processes by which data are collected for the purpose of storage in the spatial data matrix (SDM). The SDM is what the analyst works with. We conclude by considering the implications of these properties for the methodology of spatial data analysis.

Geographic Information Science (GISc) is the generic label that is frequently used, particularly by geographers, to define the area of science that involves the analysis of spatially referenced data, that is, data where each case has some form of locational co-ordinate attached to it. Data is the lynch pin in the process of doing science and it is essential that methodologies for spatial data analysis are tuned to the properties of spatial data.

The science undertaken with spatial data is usually observational rather than experimental. This is important. Much spatial data are not collected under controlled situations. We often cannot choose the values of independent variables in order to generate a satisfactory experimental design. There is no replication (in order, for example, to assess the effects of measurement error) and the analyst must take the world as he or she finds it. There may be further problems in specifying what the appropriate locational co-ordinate is when studying certain types of processes and outcomes. All this has implications for the quality of spatial data and for the methodologies that can be employed. We worry not only about the quality of our data but exactly what it is we are observing in any given situation. A consequence of this is that much of the data collected may be used to build a model of the situation under study which can then be used to estimate parameters and test hypotheses. We shall see that some of the fundamental properties of spatial data raise major problems in this regard.

2.1. SPATIAL DATA AND THEIR PROPERTIES

A spatial datum comprises a triple of measurements. One or more attributes (X) are measured at a set of locations (i) at time t, where t may be a point or interval of time. So, if k attributes are measured at n locations at time t, we can present the spatial data in the form:

{x_j(i; t); j = 1, ..., k; i = 1, ..., n}.    (2.1)

Equation (2.1) expresses in shorthand much of the content of the SDM. The record of when the observation was taken (t) may be suppressed if analysis is concerned with only a single time period but may be retained if there are to be a series of comparative studies through time or if different attributes were recorded at different times and the analyst needs to be aware of this. Such data may come from a variety of different sources including national censuses; public or private agency records (e.g., national health services, police force areas, consumer surveys); satellite imagery; environmental surveys; and primary surveys. The data may be collected from a census or from a sampling process. For the purposes of analysis data from different sources may be required. Studies in environmental epidemiology utilise health, demographic, socio-economic and environmental data. These data may come with differing degrees of quality and may not all be collected on the same areal framework (Brindley et al., 2005).

To understand the properties of spatial data we need to understand the relationship between equation (2.1) and the real world from which the data are taken. In order to undertake data analysis the complexity of the real world must be captured in finite form through the processes of conceptualization and representation (Goodchild, 1989; Guptill and Morrison, 1995; Longley et al., 2001). We shall focus here only on the issues associated with capturing spatial variation, but the reader should note that there are conceptualization and representation issues associated with the way attributes and time are captured as well.

The first step in this process, which ultimately leads to the construction of the SDM, involves conceptualizing the geography of the real world. There are two views of the geographical world in GISc: the field and the object views. The field view conceptualizes space as covered by surfaces with the attribute varying continuously across the space. This is particularly appropriate for many types of environmental and physical attributes. The object view conceptualizes space as populated by well-defined indivisible objects, a view that is particularly appropriate for many types of social, economic and other types of data that refer to populations. Objects are conceptualized as points, lines or polygons.

These two views constitute models of the real world. In order to reduce a field to a finite number of bits of data the surface may be represented using a finite number of sample points at which the attribute is recorded, or it may be represented using a raster grid. Pixels are laid down independently of the underlying field and its surface variation. Alternatively, the surface may be represented by polygons that partition the space into areas with uniform characteristics (e.g., vegetation zones). How well any field is captured by these different representations will depend on the density of the points or the size of the raster in relation to surface variability. There is a large theoretical and empirical literature on the efficiencies of different spatial sampling designs, for example the properties of random, systematic and stratified random sampling given the nature of variation in the surface to be sampled (see, e.g., Cressie, 1991; Ripley, 1981). The process of discretizing in this way involves a loss of information on surface variability.

This loss of detail on variability also arises when selecting a representation based on the object view. A city may comprise many households (points) but for confidentiality reasons information about households is aggregated into spatially defined groups (polygons): output areas in the case of the 2001 UK census, enumeration districts prior to 2001 (Martin, 1998). Again aggregation into polygons involves a loss of information. There may be a further loss of information in capturing the polygon itself in the database. It may be captured using a representative point (such as its centroid) and its spatial relationship to other polygons captured using a neighbourhood weights matrix.

The conceptualization of a geographic space as a field or as an object is largely dictated by the attribute. However, representation, the process by which information about the geography of the real world is made finite using geometric constructs, involves making choices (Martin, 1999). These choices include the size and configuration of polygons, and the location and density of sample points.
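As a concrete illustration of one such representation choice (a sketch only: the centroid co-ordinates and distance threshold below are invented, and this is not a procedure prescribed by the chapter), the following Python fragment derives a simple binary neighbourhood weights matrix from polygon centroids, treating two polygons as neighbours when their representative points lie within a chosen threshold distance, and then row-standardizes it.

```python
import numpy as np

# Invented polygon centroids (easting, northing) and an arbitrary threshold distance.
centroids = np.array([[0.0, 0.0],
                      [1.0, 0.2],
                      [2.5, 0.1],
                      [0.3, 1.1]])
threshold = 1.5

# Pairwise Euclidean distances between representative points (centroids).
diff = centroids[:, None, :] - centroids[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Binary weights: w_ij = 1 if polygons i and j are treated as neighbours, else 0
# (a polygon is not its own neighbour).
W = ((dist <= threshold) & (dist > 0)).astype(float)

# Row-standardize so each non-empty row sums to 1 (a common, but not the only, convention).
row_sums = W.sum(axis=1, keepdims=True)
W_std = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

print("Binary weights:\n", W)
print("Row-standardized weights:\n", W_std)
```

A polygon with no neighbours under the chosen threshold simply receives a row of zeros, one small example of how the chosen representation (here the threshold) shapes the properties of the resulting data structure.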

2.1.1. Fundamental properties

Fundamental properties are inherent to the nature of attributes as they are distributed across the earth's surface. There is a fundamental continuity (structure) to attributes in space that derives from the underlying processes that shape the human and physical geographical world. We shall discuss examples of these processes in section 2.2.2. The geographical world would be a strange place if levels of attributes changed suddenly and randomly as we moved from one point in space to another close by. Continuity is also a fundamental property of attributes observed in time. If we know the level of an attribute at one position in space (time) we can make an informed estimate of its level at adjacent locations (points in time). The information that is carried in a piece of data about an attribute at a given location provides information on what the level of the attribute is at nearby locations. However, as distance increases the similarity of attribute values weakens, and in the GISc literature this is often referred to as Tobler's First Law of Geography ('... near things are more related than distant things'). Although Tobler's First Law is clearly an oversimplification, and in relation to some types of spatial variation just plain wrong, it is nonetheless a useful aphorism.

Testing for spatial autocorrelation was one of the high-profile research agendas in geography during the quantitative revolution. Geographers adapted spatial autocorrelation statistics based on the join-count statistic, the cross product statistic and the squared difference statistic that had been developed for quantifying spatial structure on regular areal frameworks (grids). These statistics were developed to test for statistically significant spatial autocorrelation on irregular areal frameworks (Cliff and Ord, 1973). The null hypothesis (no spatial autocorrelation) was assessed against a non-specific alternative hypothesis (spatial autocorrelation is present). We shall see how this argument was developed in later years with the introduction and use by geographers of models for spatial variation.

In the earth sciences, dealing principally with point data from surfaces, the quantification of structure was based on the use of the empirical semi-variogram, which uses a squared difference statistic (Isaaks and Srivastava, 1989). The advantage of the latter route was that it led naturally to model specification and model fitting using theoretical semi-variograms. Of course these quantitative measures and tests of hypothesis depend on the scale of analysis. That is, they depend on the size of the polygons in terms of which data are reported, or the inter-point distance between samples on a continuous surface. Thus the chosen representation has an important influence on the quantification of this fundamental property and hence its presence within any spatial dataset. If samples are taken at sufficient distances apart the level of spatial autocorrelation is likely to be much reduced relative to the case where samples are taken close together.
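A minimal sketch of the squared-difference statistic underlying the empirical semi-variogram is given below (illustrative only: the sample locations and attribute surface are simulated, and the lag bins are arbitrary). Because the simulated surface is spatially continuous, the estimated semivariance rises with separation distance, reflecting the weakening similarity of values as distance increases.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated sample points on a surface: random locations with a smooth trend plus noise.
pts = rng.uniform(0.0, 10.0, size=(200, 2))
z = np.sin(pts[:, 0] / 3.0) + 0.1 * rng.standard_normal(200)

# All pairwise separation distances and squared attribute differences (each pair once).
d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
sq = (z[:, None] - z[None, :]) ** 2
iu = np.triu_indices(len(z), k=1)
d, sq = d[iu], sq[iu]

# Empirical semivariance per distance bin: gamma(h) = half the mean squared difference.
bins = np.linspace(0.0, 5.0, 11)
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (d >= lo) & (d < hi)
    if in_bin.any():
        gamma = 0.5 * sq[in_bin].mean()
        print(f"lag {lo:.1f}-{hi:.1f}: gamma = {gamma:.3f} (pairs = {in_bin.sum()})")
```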

Autocorrelation statistics are also used to capture temporal structure in attribute values but there are important differences with the spatial situation. Time has a natural uni-directional flow (from past to present) whereas space has no such order. The two-dimensional nature of space means that dependency structures might vary not just with distance but direction too, giving rise to anisotropic dependency structures with structure along the north–south axis differing from that along the east–west axis. The presence of spatial autocorrelation, that is, that attribute values are not statistically independent, has fundamental implications for the conduct of spatial analysis.

Spatial autocorrelation, in statistical terms, is a second order property of an attribute distributed in geographic space. In addition there may be a mean or first-order component of variation represented by a linear, quadratic, cubic (etc.) trend. We can think of these as two different scales of spatial variation although the distinction may be hard to make and quantify in practice. As Cressie (1991) remarks: 'What is one person's (spatial) covariance may be another person's mean structure' (p. 25). It has often been remarked that spatial variation is heterogeneous. This type of decomposition (plus a white noise element to capture highly localized heterogeneity) is one way of formally capturing that heterogeneity using what are termed 'global' models. Another approach is to analyze only spatial subsets, that is, allow model structure to vary locally.

2.1.2. Properties due to the chosen representation

We have already noted that the extent to which our data retain fundamental properties depends on the chosen representation. We now turn to look at other properties that stem directly or indirectly from the chosen representation.

Representing spatial variation using polygons is employed in many branches of science that handle spatially referenced data. Two of the generic consequences of working with data aggregates are: intra-areal unit heterogeneity and inter-areal unit heteroscedasticity.

Whether the data refer to a continuously varying phenomenon (field view) or aggregations of individuals like households (object view), bundling data into spatial aggregates has the effect of smoothing variation. In the case of environmental data and the use of pixels the degree of smoothing will clearly depend on the size of the pixels. The larger the pixels the greater the degree of smoothing. A non-intrinsic partition, where the polygons are defined in terms of attribute variability with the aim of maximizing within-unit homogeneity and maximizing between-unit heterogeneity, will not produce this effect to the same extent. This second process shares common ground with the process of regionalization, to which it is sometimes compared.

Intra-unit heterogeneity is a particular problem for many types of social science data, particularly in those cases where area boundaries are chosen arbitrarily, as was the case with the UK census, for example, prior to 2001. Attributes reported for an area may represent percentages or means of attribute values associated with the individuals (people or households) that have been aggregated, and the analyst may have no information on the variability around the mean. If an ecological or contextual attribute is calculated for an area (social capital say, or area deprivation) again the calculation is conditional on the chosen representation and the scale of the partition.

One of the conclusions that might be drawn from this is that it is better to have small areal aggregates rather than large ones. Assuming spatial structure, a reasonable supposition given the discussion in section 2.1.1, then smaller areas should be more homogeneous than larger areas and their mean values should be more representative of their areas' populations. But such spatial precision comes at the cost of statistical precision. Data errors or small random fluctuations in numbers of events (household burglaries; disease outcomes) will have a big effect on the calculation of rates when populations are small. Take the case of a standardized mortality ratio. If the expected count is small, for example 2.0, then the ratio itself (observed count divided by the expected count) rises or falls by 0.5 with each addition or subtraction of a single case. This will have implications for determining the statistical significance of counts: whether there are significantly more cases than would be expected on the basis of chance alone. It will also have implications for determining the statistical significance of differences in counts between areas, which in turn raises problems for the detection of significant crime hotspots or disease clusters.
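The sensitivity of such ratios to small expected counts is easy to verify numerically; the short sketch below uses invented figures only.

```python
# Standardized ratio = observed count / expected count, so each additional
# observed case shifts the ratio by 1 / expected. Figures are invented.
for expected in (2.0, 20.0, 200.0):
    step = 1.0 / expected
    ratios_near_e = [round((expected + k) / expected, 3) for k in (-1, 0, 1, 2)]
    print(f"E = {expected:5.1f}: one extra case shifts the ratio by {step:.3f}; "
          f"ratios for O = E-1, E, E+1, E+2 -> {ratios_near_e}")
```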

In summary, there is a trade-off that is linked to the number of individual elements in a polygon. A polygon containing few individuals will tend to be more homogeneous, but statistical quantities, such as rates and ratios, tend to be unreliable in the sense that small errors and random fluctuations can impact severely on the calculated values. Polygons containing many individuals will generate robust rates and ratios but often conceal much higher levels of internal heterogeneity.

In practice an area is sometimes partitioned into polygons of varying size and this can yield a secondary effect on data properties. A rate calculated for a polygon where the denominator attribute is small has a larger variance than a rate computed for a polygon where the denominator attribute is large. Moreover there is a mean-variance dependence in the rate statistics. Take the case where the denominator is the number of households (n(i)). Rates are observed counts of some attribute (number of burglaries) in polygon i (O(i)) divided by the number of households. It follows from the binomial model for O(i) that:

E[O(i)/n(i)] = (1/n(i)) E[O(i)] = p(i);

Var[O(i)/n(i)] = (1/n(i))^2 Var[O(i)] = p(i)(1 - p(i))/n(i)    (2.2)

where E[·] and Var[·] denote mean and variance and p(i) is the probability that any individual in area i (e.g., a household) has the characteristic (e.g., been burgled) that is being counted. The mean and the variance in equation (2.2) are clearly not independent. It also follows from equation (2.2) that the standard error of the estimate of the rate p(i), which is:

[p(i)(1 - p(i))/n(i)]^(1/2)

is inversely related to the number of households. It follows that any real spatial variation in rates could be confounded by variation in n(i) (the number of households), or alternatively spatial variation in rates could be an artifact of any spatial structure in n(i) (see Gelman and Price, 1999, who give examples from disease mapping in the USA).

Standardized ratios provide an estimate of the true but unknown area-specific relative risk of the selected disease under the assumption of an independent Poisson model for the observed counts. It follows from the properties of the Poisson distribution that the standard error of the standardized ratio is O(i)^(1/2)/E(i). Using a normal approximation for the sampling distribution of the standardized ratio, SR(i), approximate 95% confidence intervals can be computed:

SR(i) ± 1.96 [O(i)^(1/2)/E(i)].

However there are problems here when making comparisons. The standard error tends to be large for areas with small populations and small for areas with large populations because of the effect of population size on E(i). So extreme ratios tend to be associated with small populations, but ratios that are significantly different from 1.0 tend to be associated with areas with large populations (Mollié, 1996).
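The quantities in equation (2.2) and the approximate interval above can be computed directly. The following sketch (with invented counts) contrasts a small and a large denominator, and a small and a large expected count, showing how the standard error and the width of the 95% interval shrink as n(i) and E(i) grow.

```python
import math

def rate_and_se(observed, n):
    """Estimated rate p(i) = O(i)/n(i) and its binomial standard error from equation (2.2)."""
    p = observed / n
    return p, math.sqrt(p * (1.0 - p) / n)

def sr_with_ci(observed, expected):
    """Standardized ratio with the approximate 95% interval SR(i) +/- 1.96 * sqrt(O(i))/E(i)."""
    sr = observed / expected
    half_width = 1.96 * math.sqrt(observed) / expected
    return sr, sr - half_width, sr + half_width

# Same underlying rate (0.1), very different denominators (invented counts).
for o, n in [(5, 50), (500, 5000)]:
    p, se = rate_and_se(o, n)
    print(f"n(i) = {n:5d}: rate = {p:.3f}, standard error = {se:.4f}")

# Same standardized ratio (1.5), very different expected counts (invented counts).
# The interval for the small expected count even dips below zero, a sign that the
# normal approximation is breaking down for such small counts.
for o, e in [(3, 2.0), (300, 200.0)]:
    sr, lo, hi = sr_with_ci(o, e)
    print(f"E(i) = {e:6.1f}: SR = {sr:.2f}, approx. 95% CI = ({lo:.2f}, {hi:.2f})")
```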

These examples are intended to illustrate the way in which data properties can be induced by the chosen representation. In certain circumstances the geographical structure of the representation (for example the geography of which areas have large and which have small denominator values) could induce a geographical structure on the statistics which, when mapped, could then give rise to a misleading impression about trends or patterns in the data.

2.1.3. Properties due to measurement processes

The final step in the creation of the SDM involves obtaining measurements on the attributes of interest given the chosen representation.

Data quality can be assessed in terms of four characteristics: accuracy, completeness, consistency and resolution. As noted above, a spatial datum comprises a triple of measurements: the attributes, location and time. Thus the quality of each of these three measurements needs to be assessed against the four characteristics. What is of interest here, however, is how measurement problems might introduce certain properties into the data (Guptill and Morrison, 1995).

A common assumption in error analysis is that attribute errors are independent. This is likely to hold less often in the case of spatial data. Location error may lead to overcounts in one area and undercounts in adjacent areas because the source of the overcount is the set of nearby areas that have lost cases as a result of the location error. So, count errors in adjacent areas may be negatively correlated (Haining, 2003, pp. 67–70). Location error can be introduced into a spatial data set as a result of having to put data, collected on different spatial frameworks, onto a common spatial framework. Areal interpolation methods are used but these are based on assumptions about how attributes are distributed within areal units and these assumptions often cannot be tested. The consequence is that further levels and patterns of error are introduced into the database (Cockings et al., 1997).

In the case of remotely sensed data, the values recorded for any pixel are not in one-to-one relationship with an area of land on the ground because of the effects of light scattering. The form of this error depends on the type and age of the hardware and natural conditions such as sun angle, geographic location and season. The point spread function quantifies how adjacent pixel values record overlapping segments of the ground, so that the errors in adjacent pixel values will be positively correlated (Forster, 1980). The form of the error is analogous to a weak spatial filter passed over the surface, so that the structure of surface variation, in relation to the size of the pixel unit, will influence the spatial structure of error correlation. Linear error structures also arise in remotely sensed data (Short, 1999). Finally, we note that the effects of error propagation may further complicate error properties when arithmetic or cartographic operations are carried out on the data and source errors are compounded and transformed via these operations (Haining and Arbia, 1993).
12 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Figure 2.1 Processes involved in constructing the spatial data matrix and the data
properties that are present or introduced at each stage.

Finally, in the case of some imagery, some We divide this section into situations where
areas of the image may be obscured because spatial properties can be exploited to help
of cloud cover. A distinction should be drawn solve problems and situations where spatial
between data that are missing at random properties introduce complications for the
from data that are missing because of some conduct of data analysis.
reason linked to the nature of the population
or the area. Weather stations temporarily
out of action because of equipment failure 2.2.1. Taking advantage of spatial
produce data missing at random. On the data properties to tackle
other hand, mountainous areas will tend problems
to suffer from cloud cover more than
adjacent plains and there will be systematic Consider the following problems:
differences in land use between such areas.
This distinction has implications for how Samples of attribute values have been taken
successfully missing values can be estimated across an area. The analyst would like to
and whether the results of data analysis will construct a map to describe surface variation
be biased because some component of spatial using the information contained in the sample.
variation is unobservable. Perhaps instead the analyst just wishes to
Figure 2.1 provides a summary of the estimate the surface at a point, or set of points,
points raised in this section. where no sample has been taken and estimate
the prediction error.

A spatial database has been assembled but


2.2. IMPLICATIONS OF DATA the database contains data that are missing at
PROPERTIES FOR THE random in the sense that there are no underlying
reasons (such as suppression or condentiality)
ANALYSIS OF SPATIAL DATA
why the particular values are missing. The analyst
wants to estimate these missing values.
In this section we turn to a consideration of
the implications of the properties of spatial
data for the conduct of spatial analysis. Again In both these cases we might expect to
we shall simply introduce ideas which will exploit some formalized version of the notion
be taken up in more detail in later chapters. that data points near together in space carry
THE SPECIAL NATURE OF SPATIAL DATA 13

information about each other. Both of these


2.2.2. Where spatial data
examples constitute a form of the spatial
properties introduce
interpolation problem and solutions such as
complications for
kriging exploit the spatial structure inherent
data analysis
in the surface as well as the configuration of
the sample points to provide an estimate of Spatial analysis is often called upon to
surface values together with an estimate of address scientific questions relating to out-
the prediction error (Isaak and Srivastava, comes (numbers of cases of a disease, dis-
1989). It is intuitive that any solution that tribution of house prices, regional economic
did not use the information contained in the growth rates) that are a consequence of
location co-ordinates of sample data values processes that by their nature are spatial.
would be considered an inefficient solution. Haining (2003) identifies four generic groups
Consider another group of problems: of spatial processes. A diffusion process is
one where some attribute is taken up by
a population so that at any point in time
Aggregated data are obtained on race
some individuals have the attribute (e.g., an
(black/white) and voting behaviour (did vote/did
infectious disease) and some do not. If the
not vote). Counts in the 2 2 table are known
diffusion process operates in ways that are
but the real interest lies in the voting behaviour
at the constituency level.
constrained by distance then there is likely
to be spatial structure in the geography of
Unemployment estimates have been obtained those who do and those who do not have
from a survey for each of a number of small the attribute in question. An exchange and
areas in a region. The small area estimators transfer or mixing process is one where
are unbiased but, because of small sample places become similar in attribute values
sizes have low precision. Conversely the region (per capita income; employment) as a result
wide estimator has high precision, but as an of flows of goods or services that bind
estimate for any of the small area levels of their economic fortunes together or where
unemployment is biased. A similar situation
patterns of movement and mixing perhaps at
arises when estimating relative risk levels across
different scales introduce a measure of spatial
the small areas of a larger region using the
standardized mortality ratio.
homogeneity into structures. A third type of
spatial process is an interaction process in
which outcomes at one location (e.g., the
In both these cases there is again an oppor- price of a commodity) are observed and as
tunity to exploit some formalized version of a result of the competition effect influence
the notion that data points nearby in space outcomes (prices) at another location. Finally,
carry information about each other. One there is a dispersal process in which
solution is to borrow information or borrow individuals spread across space (such as the
strength so that the low precision of small dispersal of seeds around a parent plant)
area estimates are raised by using data from so that counts reflect the geography of the
nearby areas (Mollie, 1996; King, 1997). dispersal mechanism.
These nearby areas provide additional data These generic spatial processes processes
(helping to improve precision) and because that operate in geographic space generate
they are nearby should reflect an underlying data where spatial structure emerges as a fun-
situation that is close to the small area in damental property of the data. Process shapes
question so will not introduce a serious level or at least influences attribute variation and
of bias. the resulting data that are collected possess
14 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

dependency structures that reflect the way the or interesting features in data including pos-
process plays out across geographic space. sible data errors and formulating hypotheses.
Not all processes of interest are spatial Exploratory spatial data analysis (ESDA)
in the sense described above. Many of the undertakes these activities with respect to
processes of interest to geographers play spatial data so that cases can be located on
out across geographic space in response to a map and the spatial relationships between
the place-based characteristics of areas (the cases assumes importance because they carry
particular mix of attributes they possess) information that is likely to be relevant to
and the spatial relationships between those the analysis (Cressie, 1984; Haining et al.,
areas. Outcomes in places (whether for 1998; Fotheringham and Charlton, 1994). It
example economic, social, epidemiological is important to be able to answer questions
or criminological) are not necessarily merely such as: where does that subset of cases on
the consequence of the properties of those the scatterplot or that subset of cases on the
places as places but may also be the boxplot, occur on the map? What are the
consequence of relational and contextual spatial patterns and spatial associations in this
influences. The distance between places; geographically defined subset of the map? In
the difference between adjacent places in the case of regression modelling do the large
terms of relevant attributes; the overall positive residuals, for example, cluster in one
configuration of places across a region, are area of the map?
all facets of relation and context that may ESDA and the software that supports
impact on outcomes and modify the role of ESDA needs to be able to handle the spa-
place in influencing outcomes. Two places tial index and be able to handle the special
may be identical in terms of their place- queries that arise because of the spatial refer-
based characteristics but differ significantly encing of the data. Thus the map becomes an
in terms of their relational and contextual essential visualization tool (Dorling, 1992).
attributes with neighbouring areas and these The linkage between a map window and
differences may explain why (for exam- other graphics windows, so that cases can
ple) two similarly affluent neighbourhoods be simultaneously highlighted in more than
experience quite different levels of assault one window, becomes an essential part of the
and robbery; why two similarly deprived conduct of ESDA (Andrienko and Andrienko,
neighbourhoods experience quite different 1999; Monmonier, 1989).
levels of health outcomes. Visualizing spatial data raises particular
We now examine briefly how these fea- problems, in part because of some of the
tures of how attribute values are generated properties discussed in earlier parts of this
impact on the choice of methodology for chapter. We highlight two here. First, it has
the purpose of data analysis. We distinguish been noted that data values, particularly rates
between exploratory spatial data analysis and and ratios, may not be strictly comparable
model-based forms of analysis that allow because standard errors are population size
hypothesis testing and parameter estimation. dependent. So if areas vary substantially
in terms of population counts (used as
the denominators for a rate) then extreme
Exploratory spatial data analysis values and even patterns detected by visual
Exploratory data analysis (EDA) comprises a inspection might be associated with that
collection of visual and numerically resistant effect rather than real differences between
techniques for summarizing data properties, areas. Second, areas that partition a region
detecting patterns in data, identifying unusual might be very different in physical size.
THE SPECIAL NATURE OF SPATIAL DATA 15

This may mean that the viewer of a map normal distribution. Pearsons product
has their attention drawn to certain areas of moment correlation coefficient (r) is the
the cartographic display (those areas with statistic used to measure the association
physically large spatial units) whilst other between X and Y . If the observations on the
areas are ignored. This may be particularly two variables are independent (there is no
important if in fact it is the small areas spatial autocorrelation in either X or Y ), then
that have the larger populations so that it if the null hypothesis is of no association
is their rates and ratios (rather than the between X and Y then a test statistic
rates and ratios associated with the physically is given by:
larger but less densely populated areas) that
are the more robust. One solution to this  1/2
problem is to use cartograms so that areas are (n 2)1/2 |r| 1 r 2 (2.3)
transformed in physical extent to reflect some
underlying attribute such as population size
(Dorling, 1994). This comes at a cost because which is t distributed with (n 2) degrees of
the individual areas in the resulting cartogram freedom.
may be hard for the analyst to place. There These distributional results do not hold if
may be a need for a second, conventional, X and Y are spatially correlated. The problem
map linked to the cartogram, so the analyst is that when spatial autocorrelation is present
can highlight areas on the cartogram and see the variance of the sampling distribution of r,
where they are on the conventional map. which is a function of the number of pairs
Conventional visualization technology is of observations n, is underestimated by the
often based on the assumption that all conventional formula which treats the pairs
data values are of equal status so that of observations as if they were independent.
the viewer can extract information from The effect of spatial autocorrelation on tests
visual displays without worrying about the of significance have been extensively studied
statistical comparability of the data values (for reviews see Haining, 1990, 2003) and
that are displayed. This assumption may shown to be very severe when both X and Y
break down when dealing with spatially have high levels of spatial autocorrelation.
aggregated data (Haining, 2003). Clifford and Richardson (1985) obtain an
adjusted value for n (n ) which they call the
effective sample size. This value, n , can
Model tting and hypothesis testing be interpreted as measuring the equivalent
If n data values are spatially autocorrelated number of independent observations so that
then one of the consequences of this for the the solution to the problem lies in choosing
application of standard statistical inference the conventional null distribution based on n
procedures is that the information content rather than n. An approximate expression for
of the data set is less than would be the this quantity is:
case if the n values were independent. This
means that the degrees of freedom available
  1
for testing hypotheses is not a simple function n = 1 + n2 trace Rx Ry (2.4)
of n. We shall take the example of testing for
significant bivariate correlation between two
variables to illustrate this point. where Rx and Ry are the estimated spatial
Suppose n pairs of observations, correlation matrices for X and Y respectively.
{(x(i), y(i))}i are drawn from a bivariate (For a discussion of estimators see Haining,
16 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

1990, pp.118120.) The null hypothesis of no property and thus might be thought of as the
association between X and Y is rejected if: simplest departure from spatial independence
can be written as follows (Besag, 1974;
 1/2 Cressie, 1991, p. 407):
 1/2
n 2 |r| 1 r 2 (2.5)
 

E X(i) = x(i)  X( j) = x( j) jN(i)
exceeds the critical value of the t distribution
=+ w(i, j) [X( j) ] ,
with (n 2) degrees of freedom.
jN(i)
This illustrates a general problem. Since
the n observations are positively spatially i = 1, . . ., n (2.6)
autocorrelated, the information content of
the sample is over-estimated if n is used it
and:
needs to be deflated. The sampling variance
of statistics are underestimated leading the  


analyst to reject the null hypothesis when no Var X(i) = x(i)  X( j) = x( j) jN(i) = 2 ,
such conclusion is warranted at the chosen
significance level. For the effects of spatial i = 1, . . ., n
dependency on the analysis of contingency
tables see, for example, Upton and Fingleton
(1989) and Cerioli (1997). where E[ . . . | .] and Var[ . . . | .] denote
To make further progress in understanding conditional expectation and variance respec-
the importance of spatial data properties and tively, is a first-order parameter and
the complications they introduce we need is the spatial interaction parameter. The
to introduce models for spatial variation Markov property means observations are
or data generators for spatial variation. conditionally independent given the values
Such models are important. By specifying a at neighbouring sites. {w(i, j)} denotes
model to represent the variation in the data the neighbourhood structure of the system
(including the spatial variation), the analyst of areas and w(i, j) = 1 if i and j are
is able to construct tests of hypothesis with neighbours ( j N(i)) and w(i, i) = 0 for
greater statistical power than is possible if all i. W is the n n matrix of {w(i, j)}
testing is against a non-specific alternative. and is sometimes called the connectivity
There are a number of possible formal matrix. It is a requirement that lies between
models for spatial variation of which the (1/min ) and (1/max ) where min and max
simultaneous spatial autoregressive (SAR), are the smallest and largest eigenvalues
the conditional spatial autoregressive (CAR) of W. For a fuller introduction to the
and the moving average (MA) models are Markov property for spatial data including
probably the best known. We will briefly look how to construct higher-order spatial Markov
at the first two but the interested reader will models see, for example, Haining (2003,
need to follow up the literature to gain a pp. 297299). This approach allows the
fuller understanding of these models and their construction of a hierarchy of models of
properties (Whittle, 1954; Besag, 1974, 1975, increasing complexity. As noted in Haining
1978; Ripley, 1981; Cressie, 1991; Haining, (2003), however, the Markov property does
1978, 1990, 2003). not have the natural appeal it has in the case
A multivariate normal CAR model which of time series, because space has no natural
satisfies the first order (spatial) Markov ordering. So the neighbourhood structure can
THE SPECIAL NATURE OF SPATIAL DATA 17

often seem rather arbitrary especially in the compared with methods that take account
case of the non-regular areal frameworks of the spatial autocorrelation in the errors.
used to report Census and other social and Second, if the usual least squares formula for
economic data. the sampling variances of these regression
If the analyst of regional data does not estimates is applied, the variances will be
attach importance to satisfying a Markov seriously underestimated. The formulae are
property another option is available called no longer valid and conventional F and t
the SAR model specification. A form of this tests of hypothesis are also not valid. We shall
model was first introduced into statistics by take a very simple example to illustrate these
Whittle (1954). Let e be independent normal points, where the parameter to be estimated
IN(0, 2 I) where I is the identity matrix and tests of hypothesis relate to a constant
and e(i) is the variable associated with site mean .
i (i = 1, . . ., n). Define the expression: Suppose n independent observations {x(i)}
are drawn from a N (, 2 ) distribution. The
sample mean, x, is an unbiased estimator for
X (i) = + w (i, j) [X( j) ] , and the variance of the sample mean is:
jN(i)

+ e(i), i = 1, . . ., n. (2.7)
Var (x) = 2/n. (2.8)

where is a parameter. The bounds on are If 2 is unknown then it is estimated by:


set by the largest and smallest eigenvalues
of W just as in the case of the CAR model.
This is the model most often seen in the s2 = (1/ (n 1)) (x(i) x)2 (2.9)
i=1, ..., n
spatial analysis and regional science literature
although the reason for its hegemony is far
from clear and seems to be largely based so that:
on a combination of historical accident (in
the sense that time series modelling preceded
Var (x) = (1/n (n 1)) (x(i) x)2 .
spatial data modelling and methods were
i=1, ..., n
transferred across) and subsequent lock-in.
(2.10)
These models can be embedded into,
for example, regression models either as
additional covariates (as in the case of equa- If the n observations are not independent
tion (2.7)) or as models for the error structure then although the sample mean is still
where the errors (in practice the residuals) unbiased as an estimator of , assuming each
are tested and found to show evidence of x(i) has the same variance ( 2 ), the variance
spatial autocorrelation (Anselin, 1988; Ord, of the sample mean is (see, for example,
1975). It is well known that fitting regression Haining, 1988, p. 575):
models by ordinary least squares when errors
are spatially (positively) autocorrelated gives  
rise to some damaging consequences. First, Var (x) = 2/n + 2/n2
although we shall obtain consistent estimates
of the regression parameters (there may Cov (x(i), x( j))
be some small sample bias), the sampling i j(i<j)
variance of these estimates may be inflated (2.11)
18 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

where Cov(x(i), x( j)) denotes the spatial The estimator (2.14) is the best linear
autocovariance between x(i) and x( j). So, if unbiased estimator (BLUE) of . Note that
there is positive spatial dependence and 2 in the case of independence V = I (the
is known then 2/n underestimates the true identity matrix with 1s down the diagonal
sampling variance of the sample mean. If 2 and zeros elsewhere) and equation (2.14)
is unknown and is estimated by equation (2.9) reduces to the sample mean. In the case
then if there is positive spatial dependence V  = I two modifications to the sample
the expected value of s2 is (see, for example, mean are occurring. First, the denominator
Haining, 1988, p. 579): for positive spatial dependence will be less
than n. Second, the presence of V1 in the
  numerator of equation (2.14) downweights
E s2 = 2 [(2/n(n 1)) the contribution of any attribute x(i) which is
highly correlated with other attribute values
Cov (x(i), x( j))]
{x( j)} that is, where x(i) is part of a cluster
i j(i< j)
of observations.
(2.12)
The variance of is:

so that equation (2.9) is a downward biased


] = 2 (1T V1 1)1
Var[ (2.15)
estimate of 2 . This further compounds the
underestimation of the sampling variance.
Modified methods to take account which reduces to 2/n if V = I.
of spatial dependence are often based Since the sample mean is an unbiased
on the following argument (see, for estimator of , one modification is to replace
example, Haining, 1988). Assume the data equation (2.8) with equation (2.15). The term
xT = (x(1), . . ., x(n)), where T denotes the (1T V1 1) is proportional to Fishers infor-
transpose, are drawn from a multivariate mation measure (Haining, 1988, p. 586). It
normal spatial model with mean vector identifies the information about contained
given by 1 and n by n variancecovariance in an observation. Now equation (2.9) is not
matrix  = 2V given, say, by one of the the maximum likelihood estimator for 2 .
models described above. (In the case of This is given by:
the CAR model (2.6), V = (I W)1 .) The
log likelihood for the data is:
2 = n1 (x 1)T V1 (x 1). (2.16)

 
(n/2) ln 2 2 (1/2) ln |V| 1/2 2
A further refinement is to replace equa-
(x 1)T V1 (x 1) (2.13) tion (2.9) with equation (2.16) substituting
the sample mean for in equation (2.16)
where V1 plays a role equivalent to the
where 1 is a column vector of 1s and |V|
second term in the right-hand side of
denotes the determinant of V. For simplicity
equation (2.11).
we assume V is known. The maximum
The general results given by equations
likelihood estimator of is:
(2.11) and (2.12) are why adjustments
to conventional methods are needed. The
 1   evidence suggests that it is the effect of
= 1T V1 1
1T V1 x . (2.14)
the second term on the right-hand side of
THE SPECIAL NATURE OF SPATIAL DATA 19

equation (2.11) that is the more serious, spatial dependency and intra-area hetero-
at least in the usual situation of positive geneity when modelling a discrete valued
spatial dependence, and that one way to deal response variable such as the count of the
with this is to adjust n in equation (2.8) number of cases of a disease across a region
thereby increasing the sampling variance of using the Poisson model. Spatial dependency
the sample mean. The size of the adjustment and heterogeneity are important causes of
to n will be sensitive to the estimates of the overdispersion. For example consider a local
spatial autocorrelation in the data or, if a diffusion process in which individuals are
spatial model is fitted to the data, the choice more likely to be infected if they are close
of model. The problem is further complicated to someone already infected. The result is
if, as is usually the case, V is not known and that counts of the number of cases will
so must be estimated from the data. reveal Poisson overdispersion because there
Before leaving the normal model it is will be areas with large counts (due to the
important to note that aggregated spatial local infection process) and areas with zero
data may violate another of the statistical counts where the process has not yet started.
assumptions of least squares regression. It These considerations require the analyst both
was remarked in section 2.1 how rates and to carry out tests for overdispersion and
ratios based on areas with very different where necessary take appropriate action.
population counts will have different stan- The effects of overdispersion in generalized
dard errors. It follows that the assumption of linear modelling are rather similar to those
homoscedasticity (or constant error variance) described for the normal model when spatial
is likely to be violated when developing autocorrelation is detected. If overdispersion
models to explain how rates or ratios is present, ignoring it tends to have little
vary over a region. Data transformations or impact on point estimates of the regression
weighted least squares estimators are used parameters (the maximum likelihood estima-
to address these problems (Haining, 1990, tor is consistent, although some small sample
pp. 4950) but such adjustments may need bias might be present). However, standard
to be implemented whilst also addressing error estimates for regression parameters are
the problems created by residual spatial underestimated. Type I errors associated with
autocorrelation (Haining, 1991). In addition the model are underestimated which is par-
to the problems created by failure to satisfy ticularly problematic in relation to predictors
statistical assumptions, spatial data often that are close to the significance threshold.
create data-related problems in regression If the objective is to build a parsimonious
modelling (Haining, 1990, pp. 332333). For model, the presence of overdispersion may
example, the fit of a trend surface model result in an analyst constructing a model
can be influenced by the configuration of more complicated than necessary, and that
the sample data points on the surface where, overestimates the variance explained.
as a result of the particular distribution, Ways of tackling this problem may depend
certain values have high leverage (Unwin and on the reasons for the overdispersion.
Wrigley, 1987); the particular shape of the A conventional approach is through the
study region may also influence the trend use of a variance inflation factor (Dobson,
surface model fit (Haining, 1990, p. 372). 1999). Where the cause is inter-area spa-
These and other issues are reviewed in tial autocorrelation then a discrete valued
Haining (1990, pp. 4050). auto-model may be used which is analogous
We conclude this section by remarking on to equation (2.6) (see Besag, 1974). More
the implications of intra-area and inter-area recently attention has focused on the use
20 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

of spatial random effects models using we have access to only one realization of the
CAR models fitted using WinBUGS (Law process and in order to give our inferences
et al., 2006). These models allow for some broader validity other assumptions need
overdispersion through the random effects to be invoked such as that this realization
term. This is an area of current research is representative of the underlying process.
in spatial modelling since the development There may be no way to test such an
of good modelling tools for discrete val- assumption.
ued response variables has rather taken a The modifiable areal units problem
back seat whilst attention for many years (MAUP) reminds us that results obtained
has focused perhaps disproportionately from analyzing aggregate data are dependent
on the normal model (Law and Haining, on the particular scale of the partition, and,
2004). at the given scale, the particular boundaries
used. In general, statistical relationships
between attributes are stronger the larger
the spatial aggregates because variances
2.3. DRAWING INFERENCES are reduced. Boundary shifts can influence
whether or not disease clusters or crime hot
One of the main purposes of undertaking spa- spots are detected at any scale because if
tial statistical analysis is to make population boundaries happen to cut through the middle
inferences on the basis of the data collected. of a cluster this may dilute the effect over
In concluding this chapter we consider some two or more areas.
of the inference pitfalls associated with the The analysis of aggregated data is par-
analysis of spatial data. ticularly problematic and not just because
What is the population about which of the MAUP. It is important to remember
inferences are made in an observational that conclusions drawn from aggregate data
science? If data are point samples from a can only be transferred to the individual
continuous surface then the population might level under certain conditions. The ecological
be the surface itself. Of course the realized fallacy is the uncritical transfer of findings
surface may be thought of as only one at the group level to the individual level. As
of many possible realizations (the rest not the famous example cites, the suicide rate
having been observed). However, with or in Germany in the 17th century may have
without the concept of a superpopulation been larger in areas with higher percentages
of surfaces, making inferences from point of Catholics but that does not mean Catholics
samples to the (realized) surface population were more prone to commit suicide than
does represent a legitimate target. This Protestants. Quite the reverse as individual
argument is less convincing when the data level data revealed. Aggregation bias raises
represent a complete census for example the serious problems for epidemiological studies
data refer to areas and a complete (or nearly based on aggregate data and is one reason
complete) enumeration has been carried why it is considered the weakest of the
out. What is the population about which different methodologies for assessing dose
inferences are being made now? A frequent response relationships even though this
answer to this is that the underlying process may be the only realistic way of obtaining
is stochastic (chance is an inherent part of the reasonably sound measures of exposure to an
process) so that inferences are directed at the environmental risk factor. The problem is that
process (its parameters and covariates) rather it is not difficult to construct examples where
than the map. The problem with this is that there are complete sign reversals when going
THE SPECIAL NATURE OF SPATIAL DATA 21

Figure 2.2 Spatial data properties and how they impact at different stages of analysis.

from the ecological to the individual level finite digital database and the way spatial
study (Richardson, 1992). data are collected and attributes measured.
The converse of the ecological fallacy Many of these properties were recognized
is the atomistic (or individualistic) fallacy early in geographys quantitative revolution
which assumes relationships identified at the most notably the lack of independence
individual level apply at the group level. in data values collected close together in
There may be group level or contextual space. Geographers then and since have
effects that need to be taken into account made important contributions to the devel-
as for example in the study of youth opment of relevant statistical theory and
offending, where the risk of becoming an practice.
offender may not depend only on personal Geographers continue to develop new
and household level risk factors but also methods for describing spatial variation and
neighbourhood and peer group effects. This new methods for modelling processes that
then raises the problem of defining what the operate across geographical space. At present
neighbourhood is. there are two strong traditions which provide
Figure 2.2 provides a summary of the focuses for research. On the one hand there
points raised in sections 2.2 and 2.3. are methodologies based on whole map or
global statistics that seek to capture data
properties through models that are fitted to
all the data. On the other hand there are
2.4. CONCLUSIONS methodologies based on local statistics that
process geographically defined subsets of the
Spatial data possess a number of dis- data and do not seek to impose a single
tinctive properties that derive from the statistic or model on the whole data set
fundamental nature of geographic space and (Anselin, 1995, 1996; Getis and Ord, 1996;
the way processes unfold in geographic Fotheringham and Brunsdon, 2000). They
space, the way that spatial variation is represent different ways of responding to the
represented for the purpose of storage in a need to develop methodologies to meet the
22 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

analytical challenges posed by the special Marechal, A., (eds), Geostatistics for Natural
nature of spatial data. Resources Characterization, pp. 2144. Dordrecht:
Reidel.
Cressie, N. (1991). Statistics for Spatial Data. New York:
Wiley.
REFERENCES
Dobson, A.J. (1999). An Introduction to Generalized
Andrienko, G.L. and Andrienko, N.V. (1999). Interactive Linear Models. Boca Raton: Chapman & Hall.
maps for visual data exploration. International Dorling, D. (1992). Stretching space and splicing
Journal of Geographical Information Science, 13: time: from cartographic animation to interactive
355374. visualization. Cartography and Geographic Informa-
Anselin, L. (1988). Spatial Econometrics: Methods and tion Systems, 19: 215227.
Models. Dordrecht: Kluwer Academic. Dorling, D. (1994). Cartograms for visualizing human
Anselin, L. (1995). Local indicators of spatial geography. Hearnshaw, H.M. and Unwin, D.J., (eds),
association LISA. Geographical Analysis, 27: Visualization in Geographic Information Systems,
93115. pp. 85102. New York: J. Wiley & Sons.

Anselin, L. (1996). The Moran scatterplot as an ESDA Fisher, R. (1935). The Design of Experiments.
tool to assess local instability in spatial association. Edinburgh: Oliver & Boyd.
In: Fischer, M., Scholten, H.J. and Unwin, D., (eds), Forster, B.C. (1980). Urban residential ground cover
Spatial Analytical Perspectives on GIS, pp. 111125. using LANDSAT digital data. Photogrammetric
London: Taylor & Francis. Engineering and Remote Sensing, 46: 547558.
Besag, J.E. (1974). Spatial interaction and the statistical Fotheringham, A.S., Brunsdon, C. and Charlton, M.
analysis of lattice systems. Journal of the Royal (2000). Quantitative Geography: Perspectives on
Statistical Society, B, 36: 192225. Spatial Data Analysis. London: SAGE.
Besag, J.E. (1975). Statistical analysis of non-lattice Fotheringham. A.S. and Charlton, M. (1994). GIS and
data. The Statistician, 24: 179195. exploratory spatial data analysis: an overview of
Besag, J.E. (1978). Some methods of statistical some research issues. Geographical Systems, 1:
analysis for spatial data. Bulletin of the International 315327.
Statistical Institute, 47: 7792. Gelman, A. and Price, P.N. (1999). All maps of
Brindley, P., Wise, S.M., Maheswaran, R. and Haining, parameter estimates are misleading. Statistics in
R.P. (2005) The effect of alternative representations Medicine, 18: 32213234.
of population location on the areal interpolation of Getis, A. and Ord, J.K. (1996). Local spatial statistics:
air pollution exposure. Computers, Environment and an overview. In: Longley, P. and Batty, M., (eds),
Urban Systems, 29: 455469. Spatial Analysis: Modelling in a GIS environment, pp.
Cerioli, A. (1997). Modied tests of independence in 261277. Cambridge: Geoinformation International.
2 2 tables with spatial data. Biometrics, 53: Goodchild, M.F. (1989). Modelling error in objects
619628. and elds. In: Goodchild, M. and Gopal, S.,
Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. (eds), Accuracy of Spatial Databases, pp. 107113.
London: Pion. London: Taylor & Francis.

Clifford, P. and Richardson, S. (1985). Testing the Guptill, S.C. and Morrison, J.L. (1995). Elements of
association between two spatial processes. Statistics Spatial Data Quality. Oxford: Elsevier Science.
and Decisions, Suppl. No. 2: 155160.
Haining, R.P. (1978). The moving average model for
Cockings, S., Fisher, P.F. and Langford, M. (1997). spatial interaction. Transactions of the Institute for
Parametrization and visualization of the errors British Geographers, NS3: 202225.
in areal interpolation. Geographical Analysis, 29:
Haining, R.P. (1988). Estimating spatial means with
314328.
an application to remotely sensed data. Commu-
Cressie, N. (1984). Towards resistant geostatistics. nications in Statistics, Theory and Methods, 17:
In: Verly, G., David, M., Journel, A.G. and 573597.
THE SPECIAL NATURE OF SPATIAL DATA 23

Haining, R.P. (1990). Spatial Data Analysis in the Social Longley, P.A., Goodchild, M.F., Maguire, D.J. and
and Environmental Sciences. Cambridge: Cambridge Rhind, D.W. (2001). Geographical Information
University Press. Systems and Science. Chichester: Wiley.
Haining, R.P. (1991). Estimation with heteroscedastic Martin, D.J. (1998) Optimizing Census Geography: the
and correlated errors: a spatial analysis of separation of collection and output geographies.
intra-urban mortality data. Papers in Regional International Journal of Geographical Information
Science, 70: 223241. Science, 12: 673685.
Haining, R.P. (2003) Spatial Data Analysis: Theory and Martin, D.J. (1999). Spatial representation: the
Practice. Cambridge: Cambridge University Press. social scientists perspective. In: Longley, P.A.,
Haining, R.P. and Arbia, G. (1993). Error propaga- Goodchild, M.F., Maguire, D.J. and Rhind, D.W.,
tion through map operations. Technometrics, 35: (eds), Geographical Information Systems: Volume 1.
293305. Principles and Technical Issues, 2nd edition.
pp. 7189. New York: Wiley.
Haining, R.P., Wise, S.M. and Ma, J. (1998). Exploratory
Spatial Data Analysis in a geographic information Mollie, A. (1996). Bayesian mapping of disease. Markov
system environment. The Statistician, 47: 457469. Chain Monte Carlo in Practice: Interdisciplinary
Statistics, pp. 359379. London: Chapman & Hall.
Isaaks, E.H. and Srivastava, R.M. (1989). An Intro-
duction to Applied Geostatistics. Oxford: Oxford Monmonier, M.S. (1989). Geographic brushing:
University Press. exhancing exploratory analysis of the scatterplot
matrix. Geographical Analysis, 21: 8184.
King, G. (1997). A Solution to the Ecological Inference
Problem. Princeton, New Jersey: Princeton University Richardson, S. (1992). Statistical methods for
Press. geographical correlation studies. In: Elliot, P.,
Cuzich, J., English, D. and Stern, R., (eds),
Kulldorff, M. (1998) Statistical methods for spatial Geographical and Environmental Epidemiology:
epidemiology: tests for randomness. In: Gatrell, A. Methods for Small Area Studies, pp. 181204.
and Lytnen, M., (eds) GIS and Health, pp. 4962. Oxford: Oxford University Press.
London: Taylor & Francis.
Ripley, B.D. (1981). Spatial Statistics. New York: Wiley.
Law, J. and Haining, R.P. (2004) A Bayesian approach
to modelling binary data: the case of high intensity Unwin, D.J. and Wrigley, N. (1987). Towards a general
crime areas. Geographical Analysis, 36: 197216. theory of control point distribution effects in trend
surface models. Computers and Geosciences, 13:
Law, J., Haining R., Maheswaran, R. and Pearson, T. 351355.
(2006) Analysing the relationship between smoking
and coronary heart disease at the small area level. Whittle, P. (1954) On stationary processes in the plane.
Geographical Analysis, 38: 140159. Biometrika, 41: 434449.
3
The Role of GIS
David Martin

3.1. INTRODUCTION although progress has been made towards


some level of integration between spatial
The role of geographical information systems analytical tools and GIS, few analytical
(GIS) in spatial analysis has for the most functions are actually available as commands
part been indirect, and less obvious than from within GIS. Goodchild (2000) fears
might at first be expected. It is probably that despite the many interconnections, the
true to say that throughout the history of gap between GIS and spatial analysis may
GIS, researchers concerned with specific sub- actually be widening. It is suggested that
fields of spatial analysis have bemoaned in the early years of development, GIS
the fact that proprietary GIS software has practitioners were more likely to possess
been an inadequate tool for their work. some measure of technical expertise and
Certainly Goodchild et al. (1992) were be interested in spatial analytical methods,
able to identify an extensive research and although the available tools were limited.
development agenda for the incorporation of The spatial analytical functionality of GIS
spatial analytical tools within GIS, yet more has increased over time, but this has been
recent reviews such as those by Longley overshadowed by the massive increase in the
and Batty (1996a, 2003a) have been equally number and range of GIS implementations,
able to identify the discrepancies between such that the average GIS user is now
the requirements of the spatial analyst and in command of a more powerful analytical
the functionality provided by mainstream toolkit, but has little increased ability to
GIS software. This is not to say that make use of it. In other words, the most
there have not been many steps taken in typical GIS use has moved from a more
the development of spatial analysis tools. analytical role to a more operational one,
Ungerer and Goodchild (2002) note that alongside a huge growth in the number of
26 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

software systems and users which comprise 1998; Heywood et al., 2006; DeMers, 2002a;
the GIS community. Nevertheless, GIS has Longley et al., 2001) and it is not the purpose
contributed to the development of spatial of this chapter to cover again the basic
analytical methods more indirectly through a principles of GIS. It is, however, necessary
huge growth in the data resources, structures to offer working definitions of GIS and
and basic tools available. It is worth noting spatial analysis so that their relationship can
here that sometimes in the relevant literature be effectively reviewed. What has probably
it is not entirely clear whether authors are become the classic GIS definition is restated
referring to GIS in the narrower sense of geo- by Goodchild (2000), for example, as a
graphical information systems or the broader system for creating, storing, manipulating,
field of geographical information science visualizing and analyzing geographical infor-
(Goodchild, 1992). GIScience incorporates mation. Although slightly different terms
both GISystems and spatial analysis, and are used, the concept of GIS as a toolbox
the discussion in this chapter focuses on the containing these core functions has become
relationship between these two components. nearly universal. Whereas specialist database
The remainder of this chapter seeks to or visualization software may exist in isola-
explore the complex and much-contested tion, the combination of these elements in an
relationship between GIS and spatial anal- integrated software environment is generally
ysis. Section 3.2 considers the definitions considered necessary in order to justify the
of each and reviews the extent to which claim that a software tool is actually a GIS.
they have become integrated. We then turn, Fotheringham and Rogerson (1993) spec-
in sections 3.3 and 3.4, to examine some ify that spatial analysis is not just aspatial
different models whereby spatial analysis analysis applied to spatial data: it is inherent
and GIS software tools have been connected in the analytical procedures with which
and consider a selection of more detailed we are concerned here that they aim to
examples. The principal barriers and opportu- reveal and characterize explicitly spatial
nities for closer integration between GIS and patterns and processes. More subtly, there is
spatial analysis are presented in section 3.5 something of a distinction between spatial
and the chapter concludes by attempting to data manipulation and analysis, although
assess the likely convergence or divergence the exact dividing line is dependent on the
between these families of spatial processing commentators view of spatial analysis itself.
techniques in future. By its nature, this Techniques for spatial data manipulation,
chapter inevitably touches on many areas that perhaps most extensively developed in the
are discussed in more detail elsewhere in this language of cartographic modelling (Berry,
volume, but the focus here is to explore the 1987; DeMers, 2002b), offer an extensive
interaction between GIS and spatial analysis, suite of functions for reclassification, overlay,
and more specifically the contributory role mathematical, distance and neighbourhood
of GIS. operations on map layers which can be
assembled into sophisticated scenarios, of
which perhaps the most frequently cited
example is site suitability analysis. Although
3.2. GIS AND SPATIAL ANALYSIS: it is possible for the spatial data manipulation
MADE FOR EACH OTHER? tools within a GIS to be assembled in such a
way as to carry out spatial analysis tasks, they
There are very many GIS textbooks available are generally not considered to constitute
(for example Burrough and McDonnell, spatial analysis per se. There is thus a sense
THE ROLE OF GIS 27

in which spatial analysis requires spatial data indicating that geocomputational approaches
manipulation, but manipulation is not in itself may serve to fill gaps in the spatial analysis
analysis. toolkit rather than represent an entirely new
A distinction can be found between those development. In the following discussion, we
who adopt a relatively narrow definition of adopt a broad definition, which encompasses
spatial analysis as the extension of statistical a wide range of specialist GIS functions
analysis into the spatial domain, such as whose purposes are primarily analytical
Bailey (1988) and those who would offer a rather than operational. This approach is
much broader view, including visualization, helpful in understanding the extent to
cartographic modelling and computationally which GIS has contributed to the contextual
intensive geographical analyses. Bailey and environment of a wide variety of spatial
Gatrell (1995) choose to distinguish between thinking and analysis tasks, but has had rather
spatial analysis and spatial data analysis, less obvious impact on the generation of
the latter describing the situation in which tools for narrowly defined statistical spatial
methods are applied to the description and analysis.
explanation of processes operating in space The early development of GIS and spatial
through the use of observational data within analysis techniques were rather separate,
a conventional statistical framework. This with GIS growing from extensive inventory
narrower definition has strong roots in applications such as the Canada GIS (CGIS)
quantitative geography (see Fotheringham (Tomlinson et al., 1976) concerned with the
et al., 2000), but tends to marginalize practical management of natural resources,
specialized analytical operations within GIS while most spatial analytical techniques can
such as hydrological modelling using grid- trace their roots to the quantitative revolution
based functions or network-based modelling in academic geography of the 1960s and
for route-finding applications. These tools do 1970s. Typically, spatial analytical methods
not contribute to the more narrowly defined were developed in the context of limited
statistical spatial analysis but nevertheless spatial data and software tools, frequently
make an important contribution to analytical being programmed in isolation by the
GIS use. researcher to take advantage of the research
A further area of development is that potential of specific datasets. Widespread
which has been termed geocomputation adoption of such methods was impossible
(Longley et al., 1998), in which highly due to the absence of suitably structured
computationally-intensive techniques have data and widely available software tools.
been applied to categories of spatial analyt- The need for a large body of transferable
ical problems which simply could not have and well-structured spatial data, for example
been tackled by conventional means. The incorporating the topological relationships
critical reader may find few fundamentals to required for many types of analysis involving
distinguish geocomputation from a broadly- adjacency or contiguity, was a precondition
defined spatial analysis, except for the use for any broad adoption of spatial analysis
of new data types and computing environ- methods and it is in this development of
ments. This work is also characterized by spatial data infrastructure that GIS can be
a concentration on some of the areas in seen to have played a critical role. GIS pro-
which traditional analytical methods have vides the essential tools for manipulation and
been weak: particularly the application of pre-processing of spatial data that are likely
high-powered computing to the study of to be required by the spatial analyst. There
spatio-temporal dynamics, perhaps again is thus great attraction to the prospect of
28 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

GIS

Software tools, Data production,


algorithm exchange formats,
development tool availability

Spatial analysis

Figure 3.1 Development inuences between GIS and spatial analysis.

somehow embedding a wide variety of spatial majority of these contributions do not


analysis tools in an existing GIS environment actually use standard GIS software in order
and thereby creating a rich integrated spatial to undertake their core spatial analysis
analysis environment, although the achieve- functions, while many employ general
ment of such integration has been elusive. purpose statistical packages or even develop
Figure 3.1 illustrates the principal cycle of separate spatial analysis software with
interaction, whereby GIS use has promoted various levels and types of connection to
data production, standardization of formats GIS. Miller and Wentz (2003) argue that
and the availability of general purpose tools, in fact the use of GIS may actually be
which have in turn fostered the development limiting the types of spatial analysis which
of spatial analysis techniques. Relatively few is undertaken by many users due to the
of these analytical techniques have gone on restrictions that the GIS model places on
to influence the functions and algorithms thinking about spatial relationships and
available in mainstream GIS. A relatively interactions. Their contention is that GIS
weak effect is observable in the opposite offers a much richer universe of spatial
direction, whereby GIS development per se data representation strategies than are
has led to new forms of spatial analysis. commonly adopted. Certainly, representation
The direct impact of spatial analysis on and analysis are closely linked. Martin
the broader spatial data infrastructure has (1999a) shows how different representations
been very small and is not shown in the of a disease phenomenon can lead to quite
figure. different ways of thinking and analyzing
Over time, several books have addressed its spatial form according to whether the
the theme of spatial analysis in GIS, for disease process is seen as a point pattern,
example, Fotheringham and Rogerson line vector, areal prevalence or continuous
(1993), Longley and Batty (1996b), density surface. Miller and Wentzs (2003)
Fotheringham and Wegener (2000) and particular concern is that the assumptions and
Longley and Batty (2003b). Each of these is alternatives of the conventional Euclidean
characterized by a series of detailed chapters conception of space go unquestioned by
addressing aspects of spatial analysis that most GIS users. Marble (2000) also identifies
have been implemented at the edge of overly simplistic representational models
existing GIS technology. Interestingly, the as one of the obstacles to demonstrating
THE ROLE OF GIS 29

the real applicability of spatial analysis, citing the prevalence of simple distance and the absence of direction from most spatial analytical work, despite its relevance to practical decision-making. This view that the contribution of GIS to spatial analysis is strongly tied to its provision of the underlying representational models is consistent with Goodchild's (1987) suggestion that an ideal GIS would be one which incorporated a data model finely tuned to the needs of spatial analysis. At that relatively early stage in GIS take-up, he was able to observe that no contemporary commercial product met the ideal and that there would be little economic incentive for the development of a GIS incorporating such a spatial analytical model, while applications rather than abstract concepts are the drivers of proprietary software development.

Very many GIS users are not actually concerned with statistical spatial analysis, but have entirely valid requirements involving the management, query and reporting of spatially referenced data. For example, the UK census agency, the Office for National Statistics (ONS), implemented a GIS for the design of the 2001 census of population, starting with a prototype system in the mid-1990s (Martin, 1999b). The initial objective was the simple replacement of the existing labour-intensive process of creating maps for the guidance of census enumerators. A significant multi-user GIS involving sophisticated data management of multiple data sources, including a national address-level database, was established with no spatial analytical ambitions, the primary objective being to deliver printed maps and address listings for 175,000 census enumeration districts. Although aspects of this system could clearly have been developed with spatial analytical purposes in mind, it shared its principal objectives with perhaps the majority of commercial GIS implementations, whose objectives are facility management and inventory applications.

The ONS example is a useful one to illustrate the GIS–spatial analysis relationship because it subsequently evolved to become the basis of a spatial referencing system for census outputs that provide a rich source of socio-economic data for spatial analysis. Importantly, the contribution of the GIS application was not in the provision of analysis tools per se but as the means of contributing to the wider spatial data infrastructure, including user awareness and debate. In many ways this is a microcosm of the historical role of GIS in spatial analysis.

Couclelis (1998) makes some observations about the contrast between GIS and geocomputation which are also illustrative of the GIS–spatial analysis relationship. GIS has been characterized by large-scale, high-visibility practical applications, resulting in great commercial and organizational interest, combined with the intuitive and visual appeal of map-based manipulation by computer. Geocomputation, and spatial analysis more generally, does not enjoy these advantages: the more sophisticated analytical methods are often lacking in immediate or obvious commercial application, are often hard to visualize and are far from intuitive to novice users. We can conclude that, although related, GIS software is not the principal driver of spatial analytical tool development. Almost always, advanced spatial analysis methods are developed separately from GIS, but in an environment in which data availability, especially in standard formats, is due to the wider adoption of GIS. Widespread use of GIS has brought about spatial data infrastructures and exchange mechanisms that make possible the practical implementation of spatial analyses that would otherwise have been quite intractable. GIS have thus come to provide the environment rather than the tools for innovative spatial
analysis, with explicit software connections between the two coming much later, if at all.

3.3. CLOSE COUPLED, LOOSE COUPLED, UNCOUPLED?

Ungerer and Goodchild (2002) provide a tabular representation of strategies for coupling GIS and spatial analysis, which is itself based on a classification by Goodchild et al. (1992). The coupling strategies are further illustrated in Figure 3.2 and range from isolated, through loose and close coupled, to integrated: only in the case of full integration are spatial analysis functions actually performed within the GIS software itself. The extent of integration possible will to some extent be determined by the software architecture of the specific GIS employed. While it is clearly possible to write stand-alone software to perform spatial analysis tasks, it is hard to identify strong advantages to this approach. Indeed, the isolation from the data layers available in GIS and the obvious risk of reinventing the wheel in the authoring of such tools serve to make such a strategy unattractive. At the opposite extreme, where spatial analysis functions are fully integrated within GIS software, there is a risk of promoting naïve or inappropriate use of complex techniques due to a lack of specialist insight in spatial analysis. Openshaw (1996) identifies one element of this in what he terms the 'user modifiable areal unit problem', in which the well-recognized modifiable areal unit problem (Openshaw, 1984) is compounded by the availability of software that allows users extensive opportunities for creating their own spatial aggregation schemes without any necessary understanding of the impacts on spatial analysis of the resulting data. Of the intermediate positions, loose
coupling generally involves file import and export at each analysis stage but little new programming, whereas close coupling seeks to overcome this necessity by investing in programming that more smoothly moves data between the two software applications, for example by developing software routines that directly access the GIS database, as shown in Figure 3.2.

Figure 3.2 Models of relationship between spatial analysis and GIS software (after Ungerer and Goodchild, 2002 and Goodchild et al., 1992).

3.4. CASE STUDIES

In this section we briefly review a range of case studies in which spatial analysis software is more or less closely coupled to GIS. Examples are provided of each of the situations illustrated in Figure 3.2. Some of these examples will be encountered elsewhere in this book, but the objective in considering them here is not to provide an overview of the analysis methods, but to review the role of GIS in the implementation of these spatial analysis tools.

3.4.1. Isolated

Some isolated spatial analysis tools have very specific and limited applications while others are well-developed spatial analysis toolkits. These programs rarely justify the term GIS in their own right, as one or more of the basic GIS operations (often in the data creation, storage and manipulation domains) are entirely missing or very elementary.

GWR, the software produced by Fotheringham et al. (2002) for geographically weighted regression (http://ncg.nuim.ie/GWR), serves as an example of an explicitly spatial analysis method which has been implemented entirely separately from GIS software. In this case, although the input data are conventional spatial coordinates with associated attributes, generic input and output file formats are used and the software operates independently of any GIS. An editing tool has been developed in Microsoft Visual Basic (VB) which provides a user interface to the developer's Fortran regression program and produces outputs which are intended for further analysis in other software, including GIS. It is in the very nature of geographically weighted regression that the results are themselves spatial data, comprising parameter estimates and other statistics relating either to every sample location or to every point on a regular spatial grid. Interpretation of these results requires cartographic visualization, but it is expected that the user will undertake this using other software, for which purpose two GIS output file formats are offered. Code is also available for running GWR within the statistical package R, although this provides no direct data management or mapping functions.
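The file-based, GIS-independent pattern can be sketched in a few lines. The example below is an illustration only, assuming the Python geopandas and mgwr libraries rather than the Fortran/VB software described above: a shapefile exported from a GIS is read, a geographically weighted regression is calibrated, and the local estimates are written back out for cartographic visualization in whatever GIS the user prefers. File and field names are hypothetical.

```python
# Illustrative sketch: a file-based GWR workflow outside any GIS,
# using geopandas + mgwr. File and column names are hypothetical.
import numpy as np
import geopandas as gpd
from mgwr.sel_bw import Sel_BW
from mgwr.gwr import GWR

# Import: a generic GIS exchange format (shapefile) exported from a GIS.
gdf = gpd.read_file("house_prices.shp")
coords = np.column_stack((gdf.geometry.x, gdf.geometry.y))
y = gdf[["price"]].values                    # dependent variable, n x 1
X = gdf[["rooms", "dist_cbd"]].values        # explanatory variables, n x k

# Calibrate: select a bandwidth, then fit the local regressions.
bw = Sel_BW(coords, y, X).search()
results = GWR(coords, y, X, bw).fit()

# The results are themselves spatial data: one parameter surface per variable.
gdf["b_rooms"] = results.params[:, 1]
gdf["b_dist"] = results.params[:, 2]
gdf["local_r2"] = results.localR2.flatten()

# Export: hand the local estimates back to a GIS for mapping.
gdf.to_file("gwr_results.shp")
```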
A second example is GeoDa (http://www.csiss.org/clearinghouse/GeoDa/), which incorporates limited data manipulation, but has a range of spatial analysis functions and visualization tools and works with the less sophisticated GIS data structures such as Shapefiles. Anselin (1999) explains how this type of exploratory spatial data analysis can bridge the gap between cartographic visualization and statistical analysis. GeoDa is a tool for exploratory spatial data analysis (ESDA), and allows the user to work with linked plots and interactive visualizations, a distinctive characteristic of ESDA tools (Brunsdon and Charlton, 1996). The spatial analysis methods present in GeoDa focus on measures of spatial association, particularly the calculation of local indicators and weights. Spatial data manipulation functions are limited, but do allow for point and polygon data through tools for the creation of centroids and Thiessen polygons. The software can thus be used to provide additional spatial analysis functions to the GIS user through file export, or to provide stand-alone analysis of suitably structured point or polygon data (Anselin, 2005).
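The flavour of analysis GeoDa supports – contiguity-based spatial weights and local indicators of spatial association – can be sketched as follows using the PySAL libraries (libpysal and esda) rather than GeoDa itself; the dataset and field names are hypothetical.

```python
# Illustrative sketch: spatial weights plus local Moran statistics, the kind
# of local-indicator analysis described above, written back out for mapping.
import geopandas as gpd
from libpysal.weights import Queen
from esda.moran import Moran, Moran_Local

tracts = gpd.read_file("tracts.shp")
y = tracts["unemployment"].values

# Queen contiguity weights built from the polygon topology.
w = Queen.from_dataframe(tracts)
w.transform = "r"                      # row-standardize

# Global and local Moran statistics with permutation-based pseudo p-values.
global_moran = Moran(y, w)
lisa = Moran_Local(y, w, permutations=999)

tracts["local_I"] = lisa.Is
tracts["p_sim"] = lisa.p_sim
tracts["quadrant"] = lisa.q            # 1 high-high, 2 low-high, 3 low-low, 4 high-low

# Written back out, the local statistics can be mapped in any GIS.
tracts.to_file("tracts_lisa.shp")
print(f"Global Moran's I = {global_moran.I:.3f}")
```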
Accession (http://www.accessiongis.com/) provides another interesting example, whereby a software tool has been produced specifically for the calculation of geographical accessibility. Higgs (2005) provides an extensive review of health accessibility modelling in GIS, but notes that attempts to incorporate public transport accessibility are underdeveloped. This tool has been designed to undertake precisely that task, and thereby illustrates an approach to the concerns of Miller and Wentz (2003) by combining conventional spatial network analysis with the very unconventional spaces of public transport timetables. The software offers a wider range of conventional GIS functions than GWR or GeoDa but is still not a fully developed general-purpose GIS, its unique functionality being the spatial analysis of accessibility using a combination of timetable and network data.

3.4.2. Loose coupled

AZM (http://www2.geog.soton.ac.uk/users/martindj/davehome/software.htm) is a loose-coupled tool because it does not undertake any data management or display itself, but requires data import and export from a GIS. In this case, the software is intended for automated zone design and best-matching of incompatible zonal systems (Martin, 2003a) and is dependent on an external GIS to provide the topological data structure which is a central requirement of zone design. More recently, the software has been re-engineered, again to take direct advantage of widely-used Shapefiles, with the additional topological structuring being undertaken within the software. This is an interesting example because its purpose is not to be used as a stand-alone tool but to supply a spatial analysis function to the GIS user that is not otherwise available within the GIS software environment. In this sense it provides additional external functionality to the GIS user, who must manually export and transfer the necessary data.

The history of AZM demonstrates something of the separate origins of GIS and spatial analysis tools noted above. Openshaw (1977) describes an automated zoning procedure (AZP) initially developed to run on an exemplar dataset comprising a limited set of regular cells, which could be aggregated into clusters according to a variety of objective functions. Although the method was of demonstrable practical utility, the absence of widely available topologically structured census or administrative area boundaries and the small problem size that could be handled by available computing power meant that the method was hardly applied until Openshaw and Rao (1995) returned to the problem, using 1991 census data and mid-1990s computing to demonstrate its practical large-scale application. Effectively, the practical application of the method had to wait until GIS development had fostered the general availability of the necessary data in a suitable topological structure. AZM is based around Openshaw's AZP and is closely related to the system used to create output areas for the 2001 census of population in England and Wales, itself a loose-coupled configuration with zone design software processing topologically structured data exported from an ArcInfo GIS application.
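The logic of an AZP-style zoning loop can be sketched very simply: start from a random aggregation of building blocks into zones and repeatedly move a unit into a neighbouring zone whenever this improves an objective function. The sketch below is a deliberate simplification – it assumes an adjacency list and attribute values already exported from a GIS, and it omits the contiguity checking and more sophisticated search strategies of real implementations.

```python
# Simplified AZP-style zoning loop. Inputs: adjacency = {unit: set(neighbours)}
# and value = {unit: attribute}, e.g. exported from a GIS. Contiguity repair
# and smarter (tabu / annealing) search are deliberately omitted.
import random
from collections import defaultdict

def objective(zones, value):
    """Example objective: minimize within-zone variance of the attribute."""
    total = 0.0
    for members in zones.values():
        vals = [value[u] for u in members]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total

def azp(adjacency, value, k, iterations=1000, seed=0):
    rng = random.Random(seed)
    units = list(adjacency)
    # Initial random aggregation of units into k zones.
    assignment = {u: rng.randrange(k) for u in units}
    zones = defaultdict(set)
    for u, z in assignment.items():
        zones[z].add(u)
    best = objective(zones, value)

    for _ in range(iterations):
        u = rng.choice(units)
        current = assignment[u]
        if len(zones[current]) <= 1:
            continue                          # never empty a zone
        # Candidate zones are those of neighbouring units (keeps moves local).
        candidates = {assignment[n] for n in adjacency[u]} - {current}
        for target in candidates:
            zones[current].remove(u)
            zones[target].add(u)
            assignment[u] = target
            score = objective(zones, value)
            if score < best:                  # keep improving moves
                best = score
                break
            zones[target].remove(u)           # otherwise undo the move
            zones[current].add(u)
            assignment[u] = current
    return assignment, best
```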
3.4.3. Close coupled

SAGE (Spatial Analysis in a GIS Environment) is another example of a system developed as a spatial analysis toolkit (Haining et al., 2001), but this time calling software routines within the ArcInfo GIS. Although SAGE consisted of external custom-written code, data were held within the GIS, whose functionality was also called for specific data manipulation functions and cartographic visualization. The software was developed specifically to overcome perceived analytical shortcomings in the GIS, yet with a desire not to reinvent those important functions which were already well provided for. Specifically, SAGE attempted to enhance the GIS functionality in the areas of visualization and statistical techniques. Although cartographic visualization is one of the central functional elements of GIS, scientific visualization, particularly that involving real-time interaction with datasets, is generally absent from GIS software. SAGE incorporated exploratory analyses through the use of linked windows common to many ESDA applications. The specific motivation for the creation of SAGE was the analysis of health events. Haining et al. (2001) explain the rationale for creating a spatial analysis software suite integrated with a proprietary GIS, citing the inconvenience of having to transfer data between two software tools, but also the unnecessary duplication of effort when external tools need to provide their own basic mapping and spatial manipulation functions which are already well provided for by GIS. At the core of the spatial analysis tool were two separate programs, one providing the spatial analysis and the other a linkage tool, both running in client/server mode with the GIS. SAGE provided a range of classification and regionalization functions in addition to spatial statistical analyses.

The fate of systems such as SAGE is typical of many such attempts in that, although a great deal was achieved, the lack of true integration between the two software systems and the academically driven motivation for the analysis program resulted in divergence. Subsequent releases of the ArcInfo and ArcGIS software have moved to different operating systems and hardware architectures, and eventually the adoption of different scripting languages, making SAGE unusable with more modern versions. External, non-commercial tools such as SAGE cannot realistically hope to track the relatively rapid software redesign cycle of leading GIS software. The analytical functions embedded in SAGE were not absorbed into the GIS software, so there has actually been a decrease in the range of tools available to the spatial analyst. Isolated and loose-coupled tools, relying only on generic spatial data transfer formats, will probably survive several GIS software versions without the need for significant reprogramming. Similarly, fully integrated tools have the potential to evolve with the GIS itself if they are actually adopted as part of the core product. Close coupling, however, is perhaps the most problematic software architecture, carrying a high risk of being left behind by developments in the GIS and the greatest maintenance burden for the spatial analysis programmer if they are to ensure the continued utility of their tool.

Ungerer and Goodchild (2002) describe a close-coupled component object model (COM) approach to linking GIS and spatial analysis software. Their tool is an extension written for ArcInfo which undertakes spatial interpolation by creating an instance of a statistics package, using it to run an analysis on the GIS data and then placing the results within the GIS. This is just one step short of writing spatial analysis functions that are fully integrated with the host GIS. Their implementation uses Microsoft Visual Basic for Applications (VBA), which has become common as a macro language across multiple software packages, overcoming some of the restrictions of the software-specific macro programming languages found, for example, in earlier GIS versions. Clearly a programming language of this type could be used to develop entirely integrated spatial analysis tools, but this example demonstrates its power as a means for finding a common language for close-coupling GIS with external statistical software.
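The same close-coupled pattern can be sketched in a contemporary scripting setting: a script running inside the GIS reads features directly from the GIS database, passes them to an external numerical library for interpolation, and places the resulting surface back in the GIS. The sketch assumes the ArcGIS Python site-package (arcpy), with SciPy standing in for the COM-automated statistics package of the original; layer and field names are hypothetical.

```python
# Sketch of the close-coupled pattern: read features directly from the GIS,
# hand them to an external numerical library, and place the result back in
# the GIS. Assumes arcpy is available; names are hypothetical.
import numpy as np
import arcpy
from scipy.interpolate import griddata

points, values = [], []
with arcpy.da.SearchCursor("sample_points", ["SHAPE@XY", "elevation"]) as cur:
    for (x, y), z in cur:
        points.append((x, y))
        values.append(z)
points = np.array(points)
values = np.array(values)

# Build a regular grid over the extent of the samples and interpolate.
cell = 50.0
xmin, ymin = points.min(axis=0)
xmax, ymax = points.max(axis=0)
xs = np.arange(xmin, xmax, cell)
ys = np.arange(ymin, ymax, cell)
gx, gy = np.meshgrid(xs, ys)
surface = griddata(points, values, (gx, gy), method="cubic")

# Place the result back in the GIS as a raster (rows must run north to south).
raster = arcpy.NumPyArrayToRaster(np.flipud(surface),
                                  arcpy.Point(xmin, ymin), cell, cell)
raster.save("elev_surface.tif")
```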
3.4.4. Integrated

In addition to those analytical functions which are actually included as part of the core software, examples of customized spatial analysis operations fully integrated within GIS may be found at all periods in GIS development. These are generally the result of spatial analysts being able to directly access macro programming functions. Early instances involved languages such as ArcInfo's Arc Macro Language (AML), while more recent examples are likely to use Microsoft VBA, perhaps interfacing directly with components of the GIS software such as ArcObjects.

Ding and Fotheringham (1991) describe an application called STACAS (SpaTial AutoCorrelation and ASsociation analysis) that was completely embedded within the GIS software, being assembled from ArcInfo functions and custom-written programs. As with GeoDa described above, analysis of spatial association requires knowledge of the spatial relationships between GIS objects, for example the adjacencies between polygons and the distances between points or polygon centroids. It is also necessary to link attribute values with these locations, and of value to display the resulting measures of association in cartographic form. For all of these reasons there is a considerable attraction to embedding the analytical functions within the GIS environment, where the spatial relationships and support functions are already available. Ding and Fotheringham's solution was to construct their analysis routines using ArcInfo's own macro programming language, AML. Calculations that could not be readily assembled using AML were programmed in C and called from within the AML routines, so that the resulting analysis functions were presented to the user as additional commands within the GIS. Embedding of this type is generally robust against incremental updating of GIS software but becomes obsolete when major changes to software architecture take place, affecting the spatial database and macro programming language on which it is based.

Evans and Steadman (2003) describe a more modern application, interfacing a land use transport model known as TRANUS with a desktop GIS. The objectives are to quickly visualize the results of the transport model and to provide a means of exporting data for further analysis in additional software environments. The TRANUS GIS module has been built using ArcObjects technology from ESRI's ArcGIS, which effectively allows Microsoft VB to be used to customize interfaces and develop further software. Automated procedures handle the transfer of results between the transport modelling and GIS tools. In this instance visualization in the GIS is not the final objective, with model outputs being passed on from the GIS to other external analysis tools. Effectively the GIS provides the visualization and post-processing of specialized model results. The GIS environment is additionally relevant as the context for the creation and manipulation of many of the data layers that contribute to the original transport modelling. Interestingly, the authors note that a question mark hangs over the demand for such integrated or closely coupled solutions.

3.5. BARRIERS AND OPPORTUNITIES

Brown (2000) argues strongly that after so many years of discussion, not enough
progress has been made towards the genuine integration of spatial analysis and GIS, especially when considered from the perspective of the substantive researcher who has practical analysis requirements but is not able to engage in the development of software tools. He notes that the growth of GIS has been propelled by the spread of less sophisticated GIS (such as ArcView) that are less readily turned to spatial analytical applications. The result is that while there is widespread use of GIS, this is often naïve or at least goes little further than cartographic visualization. It follows from this reasoning that it is the spatial analytical tools embedded within the simplest GIS software, not the most sophisticated, that will actually determine the future uptake and development of spatial analysis methods. Given the enormous contextual influence of GIS on the practical use of spatial analysis, prevalent standards of GIS training can be seen to have a significant impact on the overall level of spatial analytical methods demanded and employed.

Public awareness of spatial data continues to increase massively through the popularity of web-based mapping tools, of which Multimap (http://www.multimap.com/), the Neighbourhood Statistics Service (http://www.neighbourhood.statistics.gov.uk/), Windows Live Local (http://local.live.com/) and Google Earth (http://earth.google.com/) provide just a few examples. These developments bring spatial data and concepts onto the desktops of millions who will remain unaware that there has even been a debate about the role of GIS in spatial analysis. Such tools embody various simple GIS analysis functions such as route-finding (Multimap, Windows Live Local), tagging and grouping of spatial objects (Google Earth) and interactive choropleth mapping (Neighbourhood Statistics). While it seems unlikely that these populist tools will develop a need for much more sophisticated spatial statistical functions, there is every possibility that they find increasing use in the presentation of results and visualizations from complex analyses run externally, for example of climate change, environmental sensitivity or neighbourhood property prices. The increasing pool of low-level users remains at the same time one of the greatest opportunities for spatial analytical development, yet a barrier to the emergence of a well-skilled user base.

Goodchild (2000) sees four tensions in the popularization of spatial analysis through incorporation of tools within GIS software: (a) populism and elitism, (b) visual and numeric, (c) open and closed, and (d) local and global. The first of these, populism and elitism, is very much concerned with the difficulty noted above: although GIS use is becoming massively more widespread, this does not directly increase the ability of users to appropriately engage with sophisticated spatial analysis methods. There is in reality no organization with the authority to either restrict or educate GIS users in this respect, so the spatial analysis community must address itself to the challenge of awareness-raising among an ever-multiplying community of low-level GIS users. The incorporation of visualization functions in spatial analysis tools, for example in GeoDa described above, goes some way towards the enhanced communication of spatial analysis concepts to more advanced GIS users who might otherwise be unlikely to engage with purely statistical aspects. An increasing tendency towards open-source software development may eventually assist in exposing underlying algorithms, but it is inevitably the case that only a small proportion of users will concern themselves with such a level of technical detail. The fourth tension, between local and global analysis, represents a continuum, with a need for analytical techniques appropriate for each scale of analysis.
Two of the outstanding technical barriers facing spatial analysis and GIS are the (related) development of methods and techniques that begin to seriously tackle both space and time, a dimension whose importance is considered critical to the future integration of GIS and spatial analysis by Marble (2000), and the availability of large computational models to the ordinary user. Batty (2003) suggests that the inability to adequately handle temporal dynamics has long been 'the Achilles heel of geography and GIS' (p. 83) and, when considering many of the current grand challenges of spatial analysis, this certainly remains the case. Real-world problems frequently demand answers with temporal dimensions, for example 'how might this neighbourhood change over time?', 'what will happen here in an extreme flood event?' or 'how will the best route change as congestion increases?' The GIS industry has never developed a consensus model for temporal representation (Langran, 1992; Peuquet, 2002) and many spatio-temporal analyses based on GIS technology have for the most part continued to use inadequate data models. While this may have been sufficient when data originated from pre-digital sources such as land surveys and population censuses, it is inadequate for high-frequency satellite imagery or real-time monitoring of traffic flows or mobile telephony operations. These data volumes not only offer the potential for genuinely temporal analysis, requiring new data architectures and analytical techniques, but also demand massively greater computational power. Spatio-temporal dynamics was an important element of the geocomputational techniques noted above (Longley et al., 1998), and contemporary developments in pervasive and grid-enabled computing (Martin, 2003b) seem set to offer the data access and computational power required for a new generation of spatio-temporal analysis.

3.6. CONCLUSION: CONVERGENCE OR DIVERGENCE?

Marble (2000) sees it as essential that developments in both GIS and spatial analysis achieve closer integration. In this context, he specifically cites the role of both spatial and temporal aspects. His argument is that researchers in both domains must more seriously get to grips with modern computational approaches. An obstacle to this is seen as the conservative (actually 'myopic': Marble, 2000, p. 32) definition of spatial analysis as only that which is strictly statistical spatial analysis (a distinction certainly made by Bailey (1998) and endorsed by Ungerer and Goodchild (2002)), resulting in the exclusion of some of the more modelling-oriented approaches described above. The lack of integration is primarily due to the characterization of user demand in determining what functionality is incorporated into commercial GIS software. Key to further integration is therefore the unambiguous demonstration of the utility of spatial analytical approaches which have the capability of stirring up user demand to see different types of tools within their GIS environments. Operationally, Marble sees the adoption of object-oriented data models as one of the keys to advancing integration. Moves in this direction in the data architectures of major software, such as the most recent versions of ArcGIS, certainly provide far richer environments for the customization of the GIS and the writing of new spatial analysis tools using languages such as VBA, as used by Ungerer and Goodchild (2002). Longley and Batty (2003a) trace the historical development of GIS and its extension to incorporate contemporary spatial analysis, specifically drawing out the three themes of temporal and spatial representation, agent and institutional communications and geographical networks. These are major areas in which both GIS and
stand-alone spatial analysis software have a long way to go. Again, it is the need for much more sophisticated handling of space and time and the incorporation of different types of spatial computation that are the underlying themes.

In the preceding sections we have reviewed various examples of the relationships between GIS and spatial analysis which, despite differences of detail, display remarkably little change over the last two decades. It seems improbable that GIS software intended for an increasingly wide user base will ever incorporate a high level of spatial analytical functionality, as the use of complex and advanced methods will never be a concern of the ordinary GIS user. Although the absolute level of spatial analytical functionality in GIS continues to increase, the gap between populist software and research-oriented analytical tools cannot be closed in relative terms. More realistically, a groundswell of open software standards and, potentially, grid-based computing applications may make practical communication between GIS software and the more sophisticated analysis tools much easier. There is thus no greater prospect of true convergence between GIS and spatial analysis than at any previous time, yet the two fields will continue to grow and feed off one another. What we still need are more realistic expectations of what drives the design of commercial software and a concerted effort on more sustainable ways of embedding spatial analytical tools within the broader GIS landscape.

REFERENCES

Anselin, L. (1999). Interactive techniques and exploratory spatial data analysis. In: Longley, P., Goodchild, M., Maguire, D. and Rhind, D. (eds), Geographical Information Systems: Principles, Techniques, Applications and Management, Second Edition, pp. 253–266. Chichester: Wiley.
Anselin, L. (2005). Exploring Spatial Data with GeoDa: A Workbook. Urbana-Champaign: University of Illinois.
Bailey, T.C. (1998). Review of statistical spatial analysis in GIS. In: Fotheringham, A.S. and Rogerson, P. (eds), Spatial Analysis and GIS, pp. 13–45. Philadelphia: Taylor and Francis.
Bailey, T.C. and Gatrell, A.C. (1995). Interactive Spatial Data Analysis. Harlow: Longman.
Batty, J.M. (2003). Agent-based pedestrian modelling. In: Longley, P.A. and Batty, J.M. (eds), Advanced Spatial Analysis: The CASA Book of GIS, pp. 81–106. Redlands, CA: ESRI Press.
Berry, J.K. (1987). Fundamental operations in computer-assisted map analysis. International Journal of Geographical Information Systems, 1(2): 119–136.
Brown, L.A. (2000). The GIS/SA interface for substantive research(ers): a critical need. Geographical Systems, 2: 43–47.
Brunsdon, C. and Charlton, M. (1996). Developing an exploratory spatial analysis system in XLisp-Stat. In: Parker, D. (ed.), Innovations in GIS 3, pp. 135–146. London: Taylor and Francis.
Burrough, P.A. and McDonnell, R.A. (1998). Principles of Geographical Information Systems. Oxford: Oxford University Press.
Couclelis, H. (1998). Geocomputation in context. In: Longley, P.A., Brooks, S.M., McDonnell, R.A. and Macmillan, B. (eds), Geocomputation: A Primer, pp. 17–30. Chichester: Wiley.
DeMers, M.N. (2002a). Fundamentals of Geographic Information Systems, Second Edition Update. New York: Wiley.
DeMers, M.N. (2002b). GIS Modelling in Raster. New York: Wiley.
Ding, Y. and Fotheringham, S. (1991). The Integration of Spatial Analysis and GIS: The Development of the STACAS Module for ArcInfo. Technical Paper 91-5. Buffalo, NY: National Center for Geographic Information and Analysis.
Evans, S. and Steadman, P.J. (2003). Interfacing land-use transport models with GIS: the Inverness model. In: Longley, P.A. and Batty, J.M. (eds), Advanced Spatial Analysis: The CASA Book of GIS, pp. 289–308. Redlands, CA: ESRI Press.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. London: Sage.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2002). Geographically Weighted Regression. Chichester: Wiley.
Fotheringham, A.S. and Rogerson, P.A. (1993). GIS and spatial analytical problems. International Journal of Geographical Information Systems, 7(1): 3–19.
Fotheringham, A.S. and Wegener, M. (2000). Spatial Models and GIS: New Potential and New Models. London: Taylor and Francis.
Goodchild, M.F. (1987). A spatial analytical perspective on geographical information systems. International Journal of Geographical Information Systems, 1(4): 327–34.
Goodchild, M.F. (1992). Geographical information science. International Journal of Geographical Information Systems, 6(1): 31–45.
Goodchild, M.F. (2000). The current status of GIS and spatial analysis. Geographical Systems, 2: 5–10.
Goodchild, M.F., Haining, R., Wise, S. and 12 others (1992). Integrating GIS and spatial data analysis: problems and possibilities. International Journal of Geographical Information Systems, 6(5): 407–23.
Haining, R., Wise, S. and Ma, J. (2001). Providing spatial statistical data analysis functionality for the GIS user: the SAGE project. International Journal of Geographical Information Science, 15(3): 239–254.
Heywood, I., Cornelius, S. and Carver, S. (2006). An Introduction to Geographical Information Systems, Third Edition. London: Pearson.
Higgs, G. (2005). A literature review of the use of GIS-based measures of access to health care services. Health Services and Outcomes Research Methodology, 5(2): 119–39.
Langran, G. (1992). Time in Geographic Information Systems. London: Taylor and Francis.
Longley, P.A. and Batty, J.M. (1996a). Analysis, modelling, forecasting, and GIS technology. In: Longley, P.A. and Batty, J.M. (eds), Spatial Analysis: Modelling in a GIS Environment, pp. 1–16. Cambridge: GeoInformation International.
Longley, P.A. and Batty, J.M. (eds) (1996b). Spatial Analysis: Modelling in a GIS Environment. Cambridge: GeoInformation International.
Longley, P.A. and Batty, J.M. (2003a). Advanced spatial analysis: extending GIS. In: Longley, P.A. and Batty, J.M. (eds), Advanced Spatial Analysis: The CASA Book of GIS, pp. 1–18. Redlands, CA: ESRI Press.
Longley, P.A. and Batty, J.M. (eds) (2003b). Advanced Spatial Analysis: The CASA Book of GIS. Redlands, CA: ESRI Press.
Longley, P.A., Brooks, S.M., McDonnell, R.A. and Macmillan, B. (eds) (1998). Geocomputation: A Primer. Chichester: Wiley.
Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (2001). Geographic Information Systems and Science. Chichester: Wiley.
Marble, D. (2000). Some thoughts on the integration of spatial analysis and Geographic Information Systems. Geographical Systems, 2: 31–35.
Martin, D. (1999a). Spatial representation: the social scientist's perspective. In: Longley, P., Goodchild, M., Maguire, D. and Rhind, D. (eds), Geographical Information Systems: Principles, Techniques, Applications and Management, Second Edition, pp. 71–80. Chichester: Wiley.
Martin, D. (1999b). The use of GIS in census planning. In: Stillwell, J., Geertman, S. and Openshaw, S. (eds), Geographical Information and Planning, pp. 283–298. Berlin: Springer.
Martin, D. (2003a). Extending the automated zoning procedure to reconcile incompatible zoning systems. International Journal of Geographical Information Science, 17(2): 181–196.
Martin, D. (2003b). Reconstructing social GIS. Transactions in GIS, 7(3): 305–307.
Miller, H.J. and Wentz, E.A. (2003). Representation and spatial analysis in geographic information systems. Annals of the Association of American Geographers, 93(3): 574–594.
Openshaw, S. (1977). A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling. Transactions of the Institute of British Geographers, NS 2(4): 459–472.
Openshaw, S. (1984). The Modifiable Areal Unit Problem. Concepts and Techniques in Modern Geography 38. Norwich: Geo Books.
Openshaw, S. (1996). Developing GIS-relevant zone-based spatial analysis methods. In: Longley, P.A. and Batty, J.M. (eds), Spatial Analysis: Modelling in a GIS Environment, pp. 55–73. Cambridge: GeoInformation International.
Openshaw, S. and Rao, L. (1995). Algorithms for reengineering 1991 Census geography. Environment and Planning A, 27(3): 425–446.
Peuquet, D. (2002). Representations of Space and Time. New York: Guilford.
Tomlinson, R.F., Calkins, H.W. and Marble, D.F. (1976). Computer Handling of Geographical Data. Paris: UNESCO Press.
Ungerer, M.J. and Goodchild, M.F. (2002). Integrating spatial data analysis and GIS: a new implementation using the component object model (COM). International Journal of Geographical Information Science, 16(1): 41–53.

4

Geovisualization and Geovisual Analytics

Urška Demšar
Geographic information science has encountered a new challenge in the recent explosion of availability of spatial data sets. Current spatial data sets tend to be very large – examples are the terabytes of data generated by Earth Observation Satellites, census databases and large databases of climate and environmental data. The data are recorded via sensors and monitoring systems that capture many parameters, which makes the data highly multidimensional. Another trend, which follows current developments in spatial data interoperability and management, is that data from different and until recently incompatible sources are nowadays commonly integrated into larger and even more multidimensional collections (Miller and Han, 2001).

These data are regarded as a source of potentially valuable knowledge, which exists in the form of patterns and relationships in the data. Uncovering such knowledge is sometimes a difficult task, which current computational methods are not always able to perform, especially in large and highly multidimensional data sets. This is where visual exploratory data analysis becomes useful, since multivariate visualization is one way in which humans can deal with complex data. Its purpose is to reveal knowledge in the data which is not detectable by current computational methods, but which can easily be identified by the human visual system. The value of visualization lies in the fact that it can force us to notice something in the data that we never expected to see. As Plaisant (2004) puts it, 'Information visualization is sometimes described as a way to answer questions you didn't know you had.'

This chapter discusses visualization as a means for exploring spatial data with the aim to create new knowledge and provide
new scientific insight. Visual data exploration implies the generation of new ideas through the creation, inspection and interpretation of visual representations and can be considered a part of Exploratory Data Analysis (EDA) (Tukey, 1977). When looking at spatial data, we are talking about Exploratory Spatial Data Analysis (ESDA) (Unwin and Unwin, 1998). Visual exploration is essential as the first step of data analysis and serves to uncover any indications of what there actually is in the data, to prompt ideas and generate hypotheses. It is usually followed by confirmatory data analysis and, as the last step, by visual communication, where results are presented and disseminated in visual form (DiBiase, 1990). This last step is the focus of traditional cartography, which is beyond the scope of this chapter.

The rest of this chapter is structured as follows: the following section introduces the general visualization terminology, describes what role visualization plays in data exploration, presents one of the many possible classifications of visualization methods and lists some examples of general (not necessarily spatial) visualization methods. The rest of the chapter focuses on geospatial data, presents the state-of-the-art in geovisualization research, lists a brief selection of geovisualization software and shows several examples. Finally, the new emerging discipline of Geovisual Analytics is introduced together with some of the future challenges in geovisualization research.

4.1. INFORMATION VISUALIZATION AND VISUAL DATA EXPLORATION

Visualization is the graphical (as opposed to textual or verbal) presentation of data. It translates complex data into visual displays where a human can look for structure, patterns, trends and relationships that make it easier to quickly perceive the significant aspects and characteristics of the data. The main purpose of visualization is to provide insight into data, which is usually done by displaying them with reduced complexity, while at the same time preserving the interesting structure characteristics and minimizing the loss of information. Scientific visualization was first defined 20 years ago (McCormick et al., 1987) as the use of computing technology to create visual displays with the goal to facilitate thinking and problem solving. The term data visualization sometimes stands as a synonym for scientific visualization and is usually defined as visualization of data that have a natural geometric structure. A more general term, information visualization, refers to graphical representations of any type of data, including abstract structures, such as trees, networks or graphs. Even though the borders between these different terms are sometimes blurred, in all cases the emphasis is on supporting knowledge construction from visual displays of data (Card et al., 1999; Fayyad et al., 2002).

Knowledge construction from data is the process of actively manipulating data in order to discover patterns, relationships or other abstract knowledge representations that facilitate the understanding of the phenomenon under investigation. All knowledge construction is therefore a form of pattern recognition. The most formidable pattern recognition apparatus known to the human race is the human brain, which can analyze complex events in a short time interval, recognize important patterns and make decisions much more effectively than any computer can do. The question is how to enable this formidable apparatus to work in the knowledge construction process. Given that vision is the predominant sense and that computers have been created to communicate
with humans visually, computerized data visualization provides an efficient connection between data and mind to support the data exploration process (Keim and Ward, 2003).

The main goal of visual data exploration is to get an idea of what the data contain, or what the data look like. This process does not provide a complete understanding of the phenomenon behind the data – that is not the point. Visual data exploration is intended to provide ideas about the general characteristics of the data which are to serve as a basis for new hypotheses. These can then be further tested using confirmatory data analysis methods (for example, statistics or other mathematical methods). The observations can also be used to choose an appropriate method for further scientific in-depth analysis (Keim and Ward, 2003).

Visual data exploration is usually performed in three steps according to the Visual Information Seeking Mantra (Shneiderman, 1996): overview first, zoom and filter, then details-on-demand. One of the fundamental concepts in this process is interaction. The user can typically interact with the visualization in a number of different ways, such as browsing, selecting, querying and manipulating the graphical parameters or displaying other available information about the data – all with the goal to discover interesting patterns which are valid, novel, useful and comprehensible. A valid pattern is general enough to apply to new data. Novel means that the pattern is non-trivial and unexpected. Usefulness refers to the property that the pattern can be used for either decision-making or further scientific investigation. Comprehensibility means that the pattern is simple enough to be interpretable by humans, which is important because the analyst's trust in the exploration result depends on how comprehensible it is to him/her (Miller and Han, 2001).

There are many ways to represent data graphically. There are also many ways of grouping these displays in some orderly fashion, such as, for example, whether their focus is geometric or symbolic, whether the display is static or dynamic, or according to the amount of structure the visualization method requires. One of the more comprehensive classifications is presented by Keim and Ward (2003), who construct a three-dimensional space of visualizations by classifying the methods according to three orthogonal criteria: the data type, the type of the visualization method and the interaction method (Figure 4.1). Table 4.1 names some examples of each type of visualization method according to Keim and Ward's classification, to give the reader some idea of what kind of methods we are talking about. A more comprehensive coverage of information visualization techniques can be found, for example, in Card et al. (1999), Ware (2000) or other recent books on information visualization.

Table 4.1 Examples of visualization methods, classified according to Keim and Ward (2003)

Standard 2D/3D displays: line graphs and surfaces (Figure 4.3); a histogram; a kernel plot; a box-and-whiskers plot; a scatterplot; a contour plot; a pie chart.
Geometrically transformed visualizations: scatterplot matrix; multiform bivariate matrix (Figure 4.4) (MacEachren et al., 2003); parallel coordinates plot (Figure 4.4) (Inselberg, 2002).
Icon-based display methods: star icons (Fayyad et al., 2002); Chernoff faces (Chernoff, 1973).
Dense pixel visualizations: recursive pattern visualization (Keim, 2002); circle segment view (Keim, 2002); spacefills (Figure 4.4) (Gahegan et al., 2002).
Hierarchical displays: a dendrogram as a top-down rooted tree (Müller-Hannemann, 2001), combined with a scatterplot (Seo and Shneiderman, 2002) or mapped on a sphere – the Magic Eye View (Kreuseler and Schumann, 2002); a treemap (Bederson et al., 2002); a sunburst (Stasko and Zhang, 2000).
Figure 4.1 The three-dimensional space of visualizations (redrawn after Keim and Ward (2003)). The three orthogonal axes are the data type (one-dimensional, two-dimensional, multidimensional, text/hypertext, hierarchies and graphs, algorithms and software), the visualization method (standard 2D/3D displays, geometrically transformed displays, icon-based displays, pixel-based displays, hierarchical displays) and the interaction method (projection, filtering, zoom, distortion, brushing and linking).
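As an illustration of one of the geometrically transformed displays listed in Table 4.1, the following sketch draws a parallel coordinates plot of a small, hypothetical multivariate table using the pandas and matplotlib libraries; each polyline crosses one axis per attribute and represents one observation.

```python
# Sketch of a parallel coordinates plot for a small multivariate table.
# Column names and values are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

df = pd.DataFrame({
    "region":       ["A", "A", "B", "B", "C", "C"],
    "income":       [21.0, 24.5, 30.2, 28.7, 18.9, 20.3],
    "unemployment": [7.1, 6.4, 3.9, 4.2, 9.8, 8.7],
    "density":      [120, 150, 410, 380, 60, 75],
})

# One polyline per observation, coloured by the class column.
parallel_coordinates(df, class_column="region", colormap="viridis")
plt.title("Parallel coordinates plot")
plt.show()
```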

4.2. GEOVISUALIZATION AND SPATIAL DATA EXPLORATION

Geovisualization or visualization of geospatial data (any data with a given geographic location) is defined as the use of visual representations in order to employ the vision to solve spatial problems (MacEachren et al., 1999). It can be considered as a perceptual-cognitive process of interpreting and understanding georeferenced visual displays and provides theory, methods and tools for visual exploration, analysis, synthesis and presentation of geospatial data (MacEachren and Kraak, 2001). While its roots lie in cartography and geographic techniques for representing spatial data, geovisualization integrates these traditions with scientific and information visualization principles and developments in exploratory data analysis.

Research in this discipline dates several decades back, starting with Bertin's work on cartographic design (Bertin, 1967/1983) and followed by the establishment of the Commission on GeoVisualization of the International Cartographic Association (ICA) in 1995 (ICA, 2008), which has ever since played a major role in the development of the discipline. For those interested, a more detailed description of the history of geovisualization can be found in MacEachren et al. (2004). Let it suffice to say that the area has advanced from the first attempts at analyzing how maps facilitate scientific thinking into a broad multidisciplinary research discipline that converges theory, methods and ideas from information visualization, cartography, graphic design, image analysis, perception and cognition, computer science, knowledge discovery in databases (KDD) and geographic information science.

In addition to the challenges of ordinary information visualization, namely the volume and multidimensionality of data, geovisualization faces the task of preserving the richness and particular characteristics of geospatial data (such as, for example, spatial dependency and spatial heterogeneity). With the display possibilities restricted to the usual two or three dimensions plus perhaps the additional dimension given by time and animation, geovisualization provides a clear linkage to the geographic space, so that the user can relate the observed patterns to a particular geographic location (Fotheringham et al., 2000; Miller and Han, 2001). Most current geovisualization systems attempt to solve this problem by displaying data in a number of linked displays – sometimes called multiple linked views. These displays typically include geographic visualizations,
such as maps or cartograms, as well as other multivariate visualizations, for example any of the visualization methods described in the previous section or even constructs consisting of several of those visualizations, such as bivariate matrices or similar multi-displays (see, for example, systems in Gahegan et al. (2002), Takatsuka and Gahegan (2002), G. Andrienko et al. (2003a), Dykes and Mountain (2003), etc.). Roberts (2005) has a more comprehensive list of examples.

All these displays are usually interactively connected by the concept of brushing and linking, which means that data elements which are in some way interactively selected in one display (usually either through a mouse-over operation, direct selection or some other interaction method) are simultaneously highlighted or selected everywhere. This provides a better visual impression and facilitates pattern recognition across multiple displays. The key word here is interaction: high levels of interaction are necessary for any kind of data exploration task (Dykes, 2005).
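A minimal sketch of the shared-selection model that underlies brushing and linking is given below: each view registers with a common selection object, and a brush performed in any one view is immediately propagated to all of the others. The class and view names are purely illustrative.

```python
# Minimal sketch of the shared-selection model behind brushing and linking.
class LinkedSelection:
    def __init__(self):
        self._selected = set()      # ids of the currently brushed data elements
        self._views = []

    def register(self, view):
        self._views.append(view)

    def brush(self, ids, source=None):
        self._selected = set(ids)
        for view in self._views:
            if view is not source:  # avoid echoing back to the brushing view
                view.highlight(self._selected)

class View:
    """Stand-in for a map, scatterplot or parallel coordinates display."""
    def __init__(self, name, selection):
        self.name = name
        selection.register(self)

    def highlight(self, ids):
        print(f"{self.name}: highlighting {sorted(ids)}")

selection = LinkedSelection()
map_view = View("choropleth map", selection)
scatter = View("scatterplot", selection)
pcp = View("parallel coordinates", selection)

# A rectangle selection of elements 3, 7 and 12 in the scatterplot is
# propagated to the map and the parallel coordinates plot.
selection.brush({3, 7, 12}, source=scatter)
```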
Sometimes the sheer volume and complexity of the geospatial data makes it impossible to rely solely on human vision for knowledge discovery. Successful knowledge construction is therefore more likely if the advantages of visual exploration are combined with computational exploration methods. The goal then becomes to construct visually enabled knowledge discovery systems that can facilitate the automatic process of pattern and relationship recognition in complex data and the subsequent interpretation of the discovered patterns and relationships. The data could, for example, first be visually explored with direct manipulation of the visual displays and then, when something interesting appears, computational tools could be applied. Alternatively, computational data mining can be used as a first pass and the results can then be examined visually, only to reiterate the process with another pass of computational mining and/or visual exploration if required. By merging automatic and visual exploration, the flexibility, creativity and knowledge of a person are combined with the storage capacity and computational power of the computer, which results in a faster and more effective knowledge discovery. In practice, however, how to enable such synergy is not yet fully understood and the problem of integrating combined and visual exploration tools in the best manner is not trivial to solve (MacEachren et al., 1999; Shneiderman, 2001; MacEachren and Kraak, 2001).
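One possible form of such a combined pass is sketched below: a computational first pass (k-means clustering of standardized attributes) followed by a visual pass in which the cluster labels are mapped for inspection. The libraries (geopandas, scikit-learn, matplotlib) and the dataset and field names are illustrative assumptions, not a prescription.

```python
# Sketch of a combined computational/visual pass: cluster the multivariate
# attributes first, then map the cluster labels so their geography can be
# inspected. Dataset and field names are hypothetical.
import geopandas as gpd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

gdf = gpd.read_file("census_areas.shp")
attributes = ["income", "unemployment", "density", "age_median"]

X = StandardScaler().fit_transform(gdf[attributes])
gdf["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# The visual pass: a map of the computationally derived clusters, ready for
# inspection alongside other linked displays.
gdf.plot(column="cluster", categorical=True, legend=True, cmap="Set2")
plt.title("k-means clusters of census areas")
plt.show()
```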
Visual data exploration of spatial data has several advantages: it is intuitive and does not require understanding of complex mathematical and computational methodology. It is also effective when little is known about the data, when the exploration goals are vague or when the data are noisy and/or heterogeneous (Keim, 2002). On the other hand, during visual exploration the analyst typically looks at data from various perspectives, at various scales, and combines the use of multiple techniques and approaches. No single visualization is capable of providing all the required views of the data, from the general overview to indicating various anomalies and patterns. It is therefore often necessary for the analyst to simultaneously use several techniques for various purposes. Different exploration tasks might also require different visualizations. The fundamental questions to address prior to any exploration are what is the current task, what way of thinking does it require and which tools best support the task and way of thinking at hand (Gahegan, 2005). Additionally, it is of course also important to find out which visualization methods are available and what type of data and phenomena they are suitable for. This is not the only complexity issue: during the actual exploration, the analyst is required to decompose the exploration problem into smaller subproblems in a proper and efficient manner, which might be different for each exploration task. In the last exploration step, the fragmentary knowledge resulting from each of the subproblem explorations needs to be merged into a consistent interpretation for the entire data set in order to obtain a proper understanding of the underlying phenomenon and form appropriate hypotheses. Visual exploration is therefore a complex process which requires training and expertise to be performed properly (G. Andrienko et al., 2006).

An important issue to consider when developing new geovisualization tools is therefore how users use these tools and how the tools support particular exploration tasks. These questions can be answered by investigating the usability properties of the tools. Usability is defined as the extent to which a computer system supports users to achieve specified goals and does so effectively, efficiently and in a satisfactory way (Nielsen, 1993). The idea behind usability is that information systems designed with their users' psychology and physiology in mind are easier to learn and more efficient and satisfying to use. The principle of usability originates from user-centred design in Human–Computer Interaction (HCI), which is a discipline that explores the quality of interaction between users and information systems. One of the basic requirements for developing a usable and useful information system is knowledge about users and how they use the system. This is the basic principle of user-centred design, which is a philosophy where the needs, wants and limitations of the users of an information system are given attention at every stage of the design process (Preece et al., 2002).

Design of exploratory geovisualization tools has been technology driven for many years. Tools and systems were developed from a purely technical point of view, where knowledge about users did not play a major role. In recent years, however, the approach has shifted towards user-centred design with the aim of providing useful and usable geovisualization tools which support analytical reasoning (Fuhrmann et al., 2005). While the importance of geovisualization
tools for exploration of spatial data has been generally recognized, the issues of usability testing for geovisualization are not exactly the same as those in human–computer interaction, and how the visual tools support human analytical reasoning is still not fully explained. Traditional usability methods borrowed from human–computer interaction therefore need to be adapted accordingly. The key issue in visual data exploration is the intuitive search process in a visualized environment. It is therefore necessary to incorporate physiological and psychological findings about the process of human vision, as well as knowledge of the relation between geospatial objects and their representation, in the process of system engineering (Fuhrmann et al., 2005; N. Andrienko and G. Andrienko, 2006a). The potentials and limitations of information visualization tools have been explored in numerous recent experiments focusing on some aspect of the usability of geovisualization tools (for example, N. Andrienko et al., 2002; Suchan, 2002; Tobón, 2002; Edsall, 2003; Haklay and Tobón, 2003; Slocum et al., 2003; Griffin, 2004; van Elzakker, 2004; Ahonen-Rainio, 2005; Koua, 2005; Robinson et al., 2005; Tobón, 2005; G. Andrienko et al., 2006; Demšar, 2006, 2007a), but much still remains to be investigated.

4.3. GEOGRAPHIC INFORMATION SYSTEMS AND GEOVISUALIZATION SOFTWARE

Today's geovisualization is much more than just map design, even though it is firmly rooted in cartographic traditions of map design and display. Most of the contemporary commercial Geographic Information Systems (GIS) provide a set of mapping tools, with appropriate symbology, graphical representation, classification and so on; nevertheless, they differ from information visualization systems in several ways. For example, data representation in GIS packages is limited to predefined object- (point, line, area) or field-based representations, while information visualization software does not usually have this assumption and treats all data types as equal, regardless of whether this makes sense geographically or not. This can be beneficial to reveal patterns that would otherwise remain obscured in traditional geographic representations. Most GIS also offer only limited support for dynamics, animation, interactivity between a number of different visualizations and any integrated computational methods (although there are some attempts to implement data mining methods in the context of GIS, see, for example, Lacayo and Skupin, 2007).

On the other side of the story, there exist numerous information visualization environments that support the development of visual exploration systems for multivariate data. Examples of well-known information visualization environments are XGobi, R and SPSS, but for this chapter, those that focus on spatial data are more relevant. Three that deserve a description here are GeoVISTA Studio, CommonGIS and GeoDa, but this selection is far from exhaustive and new tools and environments are developed continuously.

GeoVISTA Studio is a Java-based collection of various geographic and other visualizations and computational data mining methods (MacEachren et al., 1999; Gahegan et al., 2000; Takatsuka, 2001; Dai and Hardisty, 2002; Gahegan et al., 2002; Gahegan and Brodaric, 2002; Takatsuka and Gahegan, 2002; Guo, 2003; MacEachren et al., 2003; Edsall, 2003; Guo et al., 2004; Guo et al., 2005; Robinson et al., 2005). Its components are implemented as Java Beans, which are self-contained software components that can be easily connected into a customized data exploration system
by visual programming. Furthermore, using Java Beans technology makes it possible to integrate external methods and bespoke components in the system. Visualizations include a parallel coordinates plot, various bivariate visualizations (scatterplots, spacefills, bivariate maps, etc.) that can be either independent or elements in different types of multiform matrices, as well as time series visualizations and visual classifiers. Computational methods include a statistics package and several types of classification methods (k-means, ISODATA, maximum likelihood and a Self-Organising Map) with respective visualizations. Studio is free and can be downloaded from its website http://www.geovistastudio.psu.edu. Two of the figures in this chapter were produced using GeoVISTA-based exploration systems (Figures 4.4 and 4.5). A selection of GeoVISTA components has recently been assembled in an application version called the GeoVizToolkit, which is freely available at http://www.geovista.psu.edu/geoviztoolkit/. This application has a user-friendly interface and represents a good starting point for learning how to explore multidimensional spatial data. It is a lot easier to learn and use than the original GeoVISTA Studio, but it does not provide the full functionality and all computational capabilities of the Studio.

CommonGIS consists of various methods for cartographic visualization, non-spatial graphs, tools for querying, search and classification and computation-enhanced visual techniques for exploration of spatio-temporal data. Main features are interactive thematic mapping techniques, statistical computations and displays, animated maps, dynamic queries, table lenses, parallel coordinate plots and time-aware geovisualization techniques. All the tools have a high level of interaction and are dynamically linked via highlighting, selection and brushing. The system has been gradually developed over a number of years and was used by the authors for exploration problems in such various disciplines as social geography, forestry, meteorology, seismology, crime and environment (G. Andrienko et al., 2003a, 2003b; N. Andrienko et al., 2003; N. Andrienko and G. Andrienko, 2006b). More information is available from the authors' homepage, http://www.ais.fraunhofer.de/and.

GeoDa or Geodata analysis software (Anselin et al., 2004) is an interactive environment that provides a user-friendly and graphical introduction to spatial analysis for non-experts, from simple mapping to more advanced exploratory data analysis. The functionality ranges from spatial data manipulation, data transformation, mapping including choropleth maps, cartograms and map animation, to statistical graphic tools for exploratory data analysis and the visualization of various spatial statistical characteristics, such as spatial autocorrelation and spatial regression. GeoDa is also free and is downloadable from http://www.geoda.uiuc.edu.

4.4. SOME GEOVISUALIZATION EXAMPLES

This section attempts to introduce the reader to several examples of geovisualization. Due to the constraints of this publishing medium (a printed book) we are unfortunately limited to presenting examples of these visualizations as black and white images. Colour, animation and interactivity, which are all integral parts of geovisualization, are not supported. The reader is therefore encouraged to follow up the references in this section for a more realistic illustration of what geovisualization can do.

An alternative visualization to traditional maps are cartograms, which distort the display space according to a specific attribute (Tobler, 2004). The objective of the distortion


Figure 4.2 An example of (a) a choropleth map of the proportion of residents in Social
Class 1 in the Electoral Divisions (EDs) of the Republic of Ireland and (b) an area cartogram
of the same phenomenon where the areas of EDs are scaled according to the population
size. Dark colour indicates a high proportion and light colour a low proportion of residents
of Social Class 1 (i.e., rich residents) in a particular ED. On the cartogram in (b), the pattern
in Dublin can be clearly seen: the South side has the largest proportion of rich people, and
there are three areas in the north-east, north-west and south-west of the city where the
proportion of the rich is the lowest. This pattern can be barely recognized in the choropleth
map in (a), but the cartogram distortion makes it very eye-catching.

is to reveal patterns that are not apparent in the conventional map. Typical examples are linear cartograms, where the space (usually represented as a spatial network) is distorted according to some distance other than the geometric one, for example travel time. Such cartograms are commonly used to represent public transit systems in larger cities: any subway map or a map of commuter rail services is typically a linear cartogram. Another principle is to stretch the space continuously according to the distribution of values of some attribute, but to preserve the general shape and adjacency of polygons to produce an area cartogram (Tobler, 2004). Figure 4.2 shows an example of a choropleth map (Figure 4.2a) versus the area cartogram of the same phenomenon (Figure 4.2b). The figure shows two displays of the spatial variation in the proportion of residents in Social Class 1 in the Electoral Divisions (EDs) of the Republic of Ireland in 2002. Residents in Social Class 1 are the most affluent. The map on the left (Figure 4.2a) is drawn using the Irish National Grid projection in which the polygons are scaled in proportion to their land area. It is difficult to see what spatial variations there are in the main urban centres, and the boundaries are visually intrusive. The areas in the cartogram on the right (Figure 4.2b) have been redrawn so that their areas are in proportion to their population: this is an area cartogram or a density-equalized projection.
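To make the distinction concrete, a display of the kind shown in Figure 4.2(a) can be sketched in a few lines of Python with the third-party geopandas library; the shapefile name and the attribute column soc_class1_pct below are hypothetical placeholders, and the density-equalized cartogram of Figure 4.2(b) would additionally require a cartogram algorithm such as that of Gastner and Newman (2004).

```python
# A minimal choropleth sketch (not the software used for Figure 4.2);
# the file name and column name are hypothetical.
import geopandas as gpd
import matplotlib.pyplot as plt

eds = gpd.read_file("ireland_eds.shp")              # ED polygons with attributes
ax = eds.plot(column="soc_class1_pct",              # proportion in Social Class 1
              cmap="Greys", edgecolor="grey", linewidth=0.1, legend=True)
ax.set_axis_off()
plt.show()
```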

The urban centres (starting from Dublin as the largest distorted area located on the east coast, followed by Waterford, Cork, Limerick, Galway and Sligo in clockwise order along the coastline) become dominant in the display, and we can easily see the spatial variation in the proportions of affluent residents across the country, a spatial pattern which was not obvious in the traditional choropleth map (Figure 4.2a). The cartogram in this figure was produced using the algorithm and software by Gastner and Newman (2004).

Another example of a fairly common geovisualization are 3D displays. These project the three locational dimensions onto a 2D display using a set of perceptual depth cues to reinforce this projection, such as perspective, occlusion and parallax motion (Ware and Plumlee, 2005). Here we present some examples of 3D geovisualizations, but only in the context of visual knowledge discovery from spatial data. The reader can explore other issues, such as the use of 3D georepresentations in Virtual Reality and Virtual Environments, elsewhere (two starting points for that would be Fisher and Unwin (2002) and Bodum (2005)).
One of the most common methods of representing multivariate geospatial data in three dimensions for knowledge discovery are surfaces, which are sometimes also referred to as 2.5D representations when displayed on the screen, as they are not literally three dimensional. A general approach to produce a surface is to map the two basic geographic dimensions, longitude and latitude, to the x and y-axis respectively and show the variable of interest on the z-axis. Over this surface some other type of geographic information can be draped to provide texture: a thematic map or a satellite image. Traditionally the attribute mapped to the z-axis represents the third dimension in the real world, such as the elevation above the sea level or the depth of the sea bottom (Kreuseler, 2000). In some cases, the attribute mapped to the z-axis represents time and, instead of the surfaces, trajectories of movements of objects are projected through the display space. This type of geovisualization is very common in time-geography (Kraak and Koussoulakou, 2004) and in transportation studies (Kwan, 2000, 2004). In the third type of the surfaces the z-axis attribute represents neither a real geographic dimension nor time, but some other variable of interest, such as the population density, the temperature, the density of human activity or travel (Kwan, 2000), or in geosciences the magnetic variation or the kriging variance (Carr, 2002). Figure 4.3 shows a surface where the z-axis represents the concentration of radon in the groundwater. The surface is covered with two maps of the area, one showing the bedrock and another one showing locations of fractures (Demšar and Skeppström, 2005). Visual exploration of this representation clearly indicates that high values of radon in this area (the highest peaks of the surface) occur only on a particular type of bedrock, which is shown with medium grey shade.
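As a minimal sketch of this z-axis mapping, the following Python/matplotlib fragment simply raises a gridded attribute above the x-y plane; the radon_grid array is a random stand-in for measured values such as those in Figure 4.3.

```python
# A minimal 2.5D surface sketch; x and y carry the geographic coordinates,
# z carries the attribute of interest (random stand-in values here).
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # ensures the 3D projection is registered

x = np.linspace(0, 160, 161)                         # e.g., eastings
y = np.linspace(0, 80, 81)                           # e.g., northings
X, Y = np.meshgrid(x, y)
radon_grid = np.random.default_rng(0).random(X.shape)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X, Y, radon_grid, cmap="Greys")      # height = attribute value
ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("attribute")
plt.show()
```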

Figure 4.3 A bedrock-fractures-radon visualization as a 2.5D surface. The height of the


surface represents the concentration of radon in the groundwater. Most of the peaks which
indicate high radon values are located on a certain type of bedrock, shown in medium grey
shade on the geological map, which is draped over the radon surface.

Figure 4.4 A GeoVISTA-based system displaying a synthetic spatial dataset (Demšar, 2006)
based on the famous iris data (Fisher, 1936).

Figure 4.4 shows a screenshot of a visual exploratory system built using GeoVISTA Studio. The system consists of a multiform bivariate matrix, a geoMap and a parallel coordinates plot (PCP), which all share the same colour scheme (except the spaceFills in the matrix). This principle of colouring the graphical entities belonging to the same data element with the same colour in all visualizations is called visual brushing. All visualizations are also connected by interactive selection and brushing through mouse-over interaction, which unfortunately cannot be adequately presented through a simple screenshot image, but is essential for successful data exploration.

The parallel coordinates plot (PCP) maps the n-dimensional space onto the two display dimensions by using n equidistant parallel vertical axes (Inselberg, 2002). The axes correspond to the dimensions and are linearly scaled from the minimum to the maximum value of the corresponding dimension. Each data item is then drawn as a polygonal line intersecting each of the axes at the point which corresponds to the data value. Figure 4.4 shows an example.
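A basic, non-interactive version of such a plot can be produced with pandas; the small iris-like DataFrame below is invented purely for illustration and stands in for the synthetic dataset of Figure 4.4.

```python
# A minimal parallel coordinates sketch; the data values are made up.
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 4.9],
    "sepal_width":  [3.5, 3.2, 3.3, 3.0],
    "petal_length": [1.4, 4.7, 6.0, 1.4],
    "petal_width":  [0.2, 1.4, 2.5, 0.2],
    "cluster":      ["a", "b", "c", "a"],
})

# Rescale each numeric column to [0, 1] to mimic the min-max scaling of the
# axes described above; each row then becomes one polyline across the axes.
cols = df.columns[:-1]
df[cols] = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())
parallel_coordinates(df, "cluster")
plt.show()
```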
A multiform bivariate matrix in Figure 4.4 is a generalization of a scatterplot matrix and consists of univariate visualizations (histograms) on the diagonal and bivariate visualizations at other positions in the matrix (MacEachren et al., 2003). In the matrix in Figure 4.4, scatterplots of each corresponding pair of variables are located above the diagonal and spacefills below the diagonal. Spacefills are dense pixel bivariate visualizations, where each data element is represented by a grid square. The first of the two variables defines the colour of each square, ranging from light for low values to dark for high values of the variable. The second variable defines the order of the squares inside the rectangular display: the cell with the lowest value of this variable is situated in the bottom-left corner, from where the cells then proceed along a scan line towards the cell with the highest value in the top-right corner. If the attribute defining the colour of the cells is correlated with the attribute defining the order of the cells, there is a relatively smooth transition from the lightest to the darkest colour from bottom to top (or from top to bottom). If the correlation is weaker, the pattern appears more scattered and if there's no correlation, all that can be seen is a random distribution of the cells in the display (Gahegan et al., 2002).
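The mechanics of a spacefill can be imitated directly; the following is only a schematic numpy/matplotlib sketch (not the GeoVISTA implementation): sort the colour attribute by the ordering attribute and lay the values out along a scan line from the bottom-left to the top-right cell.

```python
# A rough spacefill sketch with two synthetic, partially correlated variables.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
order_var = rng.random(100)                             # defines cell order
colour_var = 0.8 * order_var + 0.2 * rng.random(100)    # defines grey shade

# Sort the colour values by the ordering variable and fill a 10 x 10 grid
# scanned from the bottom-left towards the top-right corner.
grid = colour_var[np.argsort(order_var)].reshape(10, 10)
plt.imshow(grid, cmap="Greys", origin="lower")
plt.title("Smooth light-to-dark transition indicates correlation")
plt.show()
```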
The geographic visualization in Figure 4.4 is GeoVISTA's geoMap, which is a bivariate choropleth map, whose colour scheme is defined by a cross-tabulation of the two display attributes (Gahegan et al., 2002) or can alternatively be inherited from other visualizations through visual brushing as mentioned above.
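A cross-tabulated colour scheme of this kind can be sketched as follows; the two attribute columns are hypothetical, and each is simply split into three quantile classes whose combination indexes a 3 x 3 bivariate palette.

```python
# A minimal sketch of a bivariate (cross-tabulated) class assignment;
# the attribute values are made up.
import pandas as pd

df = pd.DataFrame({"attr_x": [0.1, 0.4, 0.9, 0.3, 0.7, 0.6],
                   "attr_y": [10, 35, 80, 55, 20, 90]})

df["cx"] = pd.qcut(df["attr_x"], 3, labels=False)    # class 0, 1 or 2
df["cy"] = pd.qcut(df["attr_y"], 3, labels=False)
df["bivariate_class"] = df["cx"] * 3 + df["cy"]      # 0..8 indexes a 3 x 3 palette

print(pd.crosstab(df["cx"], df["cy"]))               # the cross-tabulation itself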
Notice the linear separability of the three clusters in several scatterplots in the matrix: the dark grey cluster can be linearly separated from the light grey and the black one, which are mixed in several displays. The same clusters are clearly separated in the petal length and petal width variables in the PCP, but not in other dimensions. There is also a distinct spatial pattern of the three clusters in the map, where the black cluster is separated from the other two. The spacefills in the matrix indicate correlations between several pairs of variables; the strongest one seems to be between petal length and petal width, where the pattern in the relevant spacefill proceeds from white to black in a relatively smooth manner. Other spacefills display a completely random distribution of cells, such as for example the one in the row belonging to the leaf length variable and the column belonging to the sepal width variable; this indicates that there is probably no correlation between the corresponding two variables (which can be confirmed by looking at the appropriate scatterplot).

The visualization in Figure 4.5 shows another type of GeoVISTA matrix, a so-called fixed-row matrix (MacEachren et al., 2003), where a selected row variable (in this case sepal length) is mapped against each of the column variables using a different bivariate visualization for each row. In this particular example the first row contains bivariate choropleth maps of sepal length vs. all other five variables, while the second row shows scatterplots of the same pairs of variables. Such matrices can either be used separately or form a part of a larger exploratory system as one of the multiple linked views.

Figure 4.5 A fixed row matrix of bivariate visualizations, again a component from GeoVISTA Studio and displaying the same data as in the previous figure.

A popular recent approach to combine visual and computational data exploration is to use a Self-Organizing Map (SOM) as the computational method together with other spatial or non-spatial visualizations. The SOM is an unsupervised neural network which projects multidimensional data onto a two-dimensional lattice of cells while preserving the topology and the probability density of the input data space. This means that similar data vectors are mapped to the same neuron cell or to the neighbour cells in the two-dimensional output map, which makes it useful as a knowledge discovery tool (Kohonen, 1997; Silipo, 2003). The SOM has been recently used for knowledge discovery in a number of spatial and spatio-temporal applications (Takatsuka, 2001; Gahegan et al., 2002; Jiang and Harrie, 2004; Koua and Kraak, 2004; Guo et al., 2005; Skupin and Hagelman, 2005; Demšar, 2007b; Lacayo and Skupin, 2007; Špatenková et al., 2007).
One reason for its popularity for integration into a visual system is that SOM produces a very visualizable result due to its two-dimensionality. Vesanto (1999) lists a number of possible visualizations. An example that we present here are the component planes (Figure 4.6), where each plane is a SOM lattice. In the D-matrix the grey shade represents how similar each cell is to its neighbours. Dark areas in the D-matrix consist of very similar cells and therefore represent clusters. Light areas in contrast indicate borders between clusters. In each other plane (except in the D-matrix) the grey shade of each cell indicates the average value of a particular attribute calculated from values of all data elements assigned to that cell. Relationships between attributes are discovered by comparing the values in the same area of the SOM lattice in different planes.
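As an indication of how the arrays behind such displays are obtained, the following sketch uses the third-party MiniSom package (not the SOM Toolbox for Matlab that produced Figure 4.6); the input array of attribute vectors is randomly generated here and is assumed to be scaled to the unit interval.

```python
# A minimal SOM training sketch with MiniSom; the data are random stand-ins.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
data = rng.random((500, 6))                      # 500 records, 6 attributes

som = MiniSom(10, 10, data.shape[1], sigma=1.0, learning_rate=0.5)
som.train_random(data, 5000)                     # fit the 10 x 10 lattice

u_matrix = som.distance_map()                    # analogue of the D-matrix
component_planes = som.get_weights()             # array of shape (10, 10, 6)
plane_for_attr0 = component_planes[:, :, 0]      # grey shades of one component plane
cell_of_first_record = som.winner(data[0])       # lattice cell of one data vector
```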
While geographic location is the core concept of geographic information science, the visual models and methods that geographers and cartographers have been using for a long time can also be applied to the representation of objects, phenomena or processes with spatial characteristics and behaviour in abstract spaces. The use of geographic and cartographic concepts to represent data which are not inherently spatial is called spatialization (Skupin and Fabrikant, 2003). The aim is to systematically transform highly multidimensional data into spatial representations in lower-dimensional

[Figure 4.6: SOM component planes labelled D-matrix, DAY_365, WEEKDAY, HOUR, TYPE, NORTH, EAST, NEAR_BUILD_DIST, NEAR_BUILD_TYPE, NEAR_TYPE1, NEAR_TYPE2, NEAR_TYPE3, NEAR_TYPE4, NEAR_TYPE5, AGE_DEN and POP_DEN]

Figure 4.6 Visually discovering relationships between the spatio-temporal attributes from
the SOM component planes visualization. The image was produced using a spatio-temporal
data set of emergency response data (Špatenková et al., 2007) and the SOM toolbox for
Matlab.

abstract spaces with the goal to facilitate data exploration and knowledge construction (Fabrikant et al., 2002). Examples of spatializations are spatial representations of scientific co-authorship networks (Newman, 2004), protein-receptor interaction networks in medicine, genealogies and citation networks (Batagelj and Mrvar, 2003).

Figure 4.7 shows an example of a spatialization: two different visualizations of a citation network of the GeoVISTA Studio related papers from the reference list of this chapter. In Figure 4.7(a), arrows indicate citations, i.e., the paper that the arrow points from cites the paper which the arrow points to. The size of the vertices in Figure 4.7(a) shows the in-degree of each publication, i.e., how many other publications cite it. The direction of the arrows is ignored in Figure 4.7(b) and the size of vertices in this picture represents the betweenness centrality (from social network analysis (Freeman, 1979)), which measures the importance of each vertex. Note that the vertex representing the paper by MacEachren et al. (1999) has a high in-degree as well as high betweenness because many other papers cite it, while the relatively large betweenness of Guo et al. (2005) is a result of the fact that this paper cites many other papers (even though it is never cited itself and has a low in-degree; compare with Figure 4.7(a)).
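The quantities behind Figure 4.7 (in-degree, betweenness centrality on the undirected graph, and a force-directed placement) can be reproduced in outline with the third-party networkx package; only a few of the citation edges are listed here for illustration, and the published figure itself was produced with Pajek.

```python
# A rough sketch of the centrality measures used in Figure 4.7.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from([
    ("Guo et al. 2005", "MacEachren et al. 1999"),
    ("Guo et al. 2005", "Gahegan et al. 2002"),
    ("MacEachren et al. 2003", "MacEachren et al. 1999"),
    ("Edsall 2003", "MacEachren et al. 1999"),
])

in_degree = dict(G.in_degree())                              # vertex size in (a)
betweenness = nx.betweenness_centrality(G.to_undirected())   # vertex size in (b)
positions = nx.spring_layout(G, seed=0)                      # force-directed layout

nx.draw(G, positions, with_labels=True,
        node_size=[300 + 600 * in_degree[n] for n in G])
plt.show()
```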

[Figure 4.7: node-link diagrams of the citation network drawn as (a) a directed and (b) an undirected graph; the vertices are the papers MacEachren et al. 1999, MacEachren et al. 2003, Gahegan 2000, Gahegan et al. 2000, Gahegan et al. 2002, Gahegan and Brodaric 2002, Takatsuka 2001, Takatsuka and Gahegan 2002, Dai and Hardisty 2002, Edsall 2003, Guo 2003, Guo et al. 2004 and Guo et al. 2005]

Figure 4.7 A spatialization of a non-spatial phenomenon: the citation network of GeoVISTA


publications from the reference list of this chapter as (a) a directed and (b) an undirected
graph. The size of the vertices in (a) indicates the in-degree of each vertex and in (b) the
relative importance of the vertex in the network, measured by the betweenness centrality
(Freeman 1979). Spatial positions of the vertices were calculated according to their
in-degree in (a) and betweenness in (b). First the vertex with the highest in-degree/
betweenness was placed in a central position. Then the vertices with ever lower
in-degree/betweenness value were iteratively placed in the nearest proximity while at the
same time minimizing the energy of the edges according to an energy-preserving
graph-drawing algorithm. The spatialization was produced using the Pajek software for
analysis and visualization of large networks (Batagelj and Mrvar, 2007).

4.5. THE FUTURE: FROM GEOVISUALIZATION TO GEOVISUAL ANALYTICS

A recent new research area that has emerged in information visualization is Visual Analytics, which is defined as the science of analytical reasoning supported by highly interactive visual interfaces (NVAC, 2005). Visual Analytics tools are used to synthesize information into knowledge, derive insight from massive, dynamic and conflicting data, discover the unexpected, and provide and communicate timely and understandable assessments. The recent research agenda (NVAC, 2005) for Visual Analytics identifies four major research areas:

• the science of analytical reasoning, which provides the reasoning framework to serve as the basis for visual technologies for data analysis;
• visual representations and interaction techniques, which provide the mechanism to see and understand large volumes of data;
• data representations and transformations appropriate to the analytical task that correctly convey the important content of large, complex and dynamic data sets; and
• production, presentation and dissemination of results, where the goal is to reduce the time to present the results to the audience in a more effective communication manner.

While the primary goal of developing the Visual Analytics research agenda was for US security purposes, the challenges listed above will have impact on any field of scientific research where understanding complex and dynamic data is important. Geovisualization is no exception. In this context, a subfield of Visual Analytics relevant for geovisualization is Geovisual Analytics, which integrates perspectives from Visual Analytics and geographic information science for analysis of spatio-temporal data. Geovisual Analytics is defined as the science of analytical reasoning and decision-making with geospatial information, facilitated by interactive visual interfaces, computational methods, and knowledge construction, representation, and management strategies (G. Andrienko et al., 2007). The four research areas defined in the research agenda of Visual Analytics are also applicable for Geovisual Analytics and the tools developed in the Geovisual Analytics context are starting to be of increasing importance for various application fields, such as crisis management (Tomaszewski et al., 2007) and spatial decision support (G. Andrienko et al., 2007).

Apart from the above-mentioned research areas which are common to Geovisual and Visual Analytics, there are many questions in geovisualization which are related to the particularities of geospatial data and phenomena and are thereby inherently characteristic of our discipline. Here we present a short selection of topics, although the list is far from exhaustive; the reader is encouraged to turn to the following sources for a more comprehensive review and a research agenda (MacEachren and Kraak, 2001; Dykes et al., 2005; Andrienko et al., 2007; ICA, 2008).

Support for collaborative group work and distributed geovisualization: with the advent of ubiquitous computing, many potential application areas for Geovisual Analytics will require actions distributed over geographic space and time. Such tasks will include exploration of various spatio-temporal distributions of complex data and events, and will be physically performed in more than one location. Geovisual Analytics tools of the future should be able to support such exploration. This is particularly important in situations where the users do not have time to consider all possible solutions to their problems or cannot afford to search for an
optimal one, such as in, for example, crisis management. In order to deal efficiently with time pressure and stress in such situations, Geovisual Analytics tools need to provide support for shared collaborative work during a process where key parameters change quickly, such as, for example, for spatial decision support in emergencies. Open issues here range from developing distributed system architectures to intelligent solutions that support fast knowledge capture, rational reasoning and time-critical spatial decision making.

A related topic is mobile geovisualization and location-based visual exploration. Present technological advances in mobile communications and the ubiquity of various mobile devices (mobile phones, PDAs, BlackBerries, etc.) are likely to change the way people use information systems and this includes tools for geovisualization and Geovisual Analytics. The emerging location-based personalization raises not only technical questions, such as how to perform on-the-fly location-based computation or how to display as much information as possible without losing the clarity on a small display of most of today's mobile devices, but also conceptual issues, for example the use of individually personalized dynamic egocentric maps (Meng, 2004; Meng, 2005) instead of the traditional geocentric visualizations that remain static for a longer period and aim to communicate geographic information to a variety of users.

Finally, one of the recurrent topics in visualization research are cognitive and perceptual questions and the evaluation of the tools. Not only are visualization tools difficult to evaluate objectively, the results of such evaluations might not be replicable nor generalizable and are in general difficult to interpret (Plaisant, 2004). Additionally, there exists some evidence that there may be fundamental differences between information visualization and geovisualization (Tobón, 2005), which implies that the cognitive processes that must be supported in geovisualization are different and possibly more complex than when non-spatial data are investigated. Experiments have also shown that there exist significant interpersonal differences in the way people visually explore spatial data, how they interpret what they see and what exploration strategies they form (G. Andrienko et al., 2006; Demšar, 2006, 2007a). All this suggests that visual data exploration is inherently complex. What can be done to alleviate the complexity? How is the ability to use the tools related to users' background and experience? These are just some questions to be considered. In order to resolve them, work on technological advances should be combined with work on human spatial cognition to fully reveal the potential of visual representations to support spatial analytical reasoning, spatial problem solving and spatial decision making.

ACKNOWLEDGEMENTS

The author would like to thank Mark Gahegan from The University of Auckland for kindly consenting to read the first draft of this chapter and providing helpful comments and suggestions. Thanks go also to Olga Špatenková from Helsinki University of Technology and Martin Charlton from the National Centre for Geocomputation, National University of Ireland, Maynooth, who prepared the illustrations showing the SOM component planes and the cartograms respectively. Finally, research presented in this paper was supported by a grant to the National Centre for Geocomputation by Science Foundation Ireland (03/RP1/1382) and by a Strategic Research Cluster grant (07/SRC1/1168) from Science Foundation Ireland under the National Development Plan. The author gratefully acknowledges this support.

REFERENCES Batagelj, V. and Mrvar, A. (2007). Pajek Program


for Analysis and Visualization of Large Networks,
Ahonen-Rainio, P. (2005). Visualization of geospatial version 1.20., 25 June 2007, available for
metadata for selecting geographic data sets. PhD download at: http://vlado.fmf.uni-lj.si/pub/networks/
thesis. Helsinki University of Technology, Espoo, pajek/ (last accessed 10 July 2007).
Finland.
Bederson, B.B., Shneiderman, B. and Wattenberg, M.
Andrienko, N., Andrienko, G., Voss, H., Bernardo, F., (2002). Ordered and quantum treemaps: making
Hipolito, J. and Kretchmer, U. (2002). Testing effective use of 2D space to display hierarchies. ACM
the usability of interactive maps in CommonGIS. Transactions on Graphics, 21(4): 833854.
Cartography and Geographic Information Science,
Bertin, J. (1983). The Semiology of Graphics,
29(4): 325342.
University of Wisconsin Press, Madison, Winsconsin.
Andrienko, G., Andrienko, N. and Voss, H. (2003a). GIS Translation of: Bertin, J. (1967). Semiologie
for everyone: the CommonGIS project and beyond. Graphique, Paris: Mouton.
In: Peterson, M. (ed.), Maps and the Internet,
Bodum, L. (2005). Modelling virtual environments
pp. 131146. Elsevier Science.
for geovisualization: a focus on representation.
Andrienko, G., Andrienko, N. and Gitis, V. (2003b). In: Dykes, J., MacEachren, A.M. and Kraak, M.-J.
Interactive maps for visual exploration of grid and (eds), Exploring Geovisualization, pp. 389402.
vector geodata. ISPRS Journal of Photogrammetry Amsterdam: Elsevier.
and Remote Sensing, 57: 380389.
Card, S.K., Mackinlay, J. and Shneiderman, B. (eds)
Andrienko, G., Andrienko, N., Fischer, R., Mues, V. and (1999). Readings in Information Visualization: Using
Schuck, A. (2006). Reactions to geovisualization: an Vision to Think. San Francisco: Morgan Kaufmann
experience from a European project. International Publishers.
Journal of Geographic Information Science, 20(10):
Carr, J.R. (2002). Data Visualization in the Geological
11491171.
Sciences. Upper Saddle River: Prentice Hall.
Andrienko, G., Andrienko, N., Jankowski, P., Keim, D.,
Chernoff, H. (1973). The use of faces to represent points
Kraak, M.-J., MacEachren, A.M. and Wrobel, S.
in k -dimensional space graphically. Journal of the
(2007). Geovisual analytics for spatial decision
American Statistical Association, 68(342): 361368.
support: setting the research agenda. International
Journal of Geographic Information Science, 21(8): Dai, X. and Hardisty, F. (2002). Conditioned and
839857. manipulable matrix for visual exploration. In:
Proceedings of the National Conference for Digital
Andrienko, N., Andrienko, G. and Gatalsky, P.
Government Research 2002.
(2003). Exploratory spatio-temporal visualization: an
analytical review. Journal of Visual Languages and Demar, U. and Skeppstrm, K. (2005). Use of
Computing, 14: 503541. GIS and 3D visualisation to investigate radon
problem in groundwater. In: Proceedings of the 10th
Andrienko, N. and Andrienko, G. (2006a). The
Scandinavian Research Conference on Geographic
complexity challenge to creating useful and usable
Information Science, ScanGIS2005, Stockholm,
geovisualization tools. In: Proceedings of GIScience
Sweden.
2006, Mnster, Germany.
Demar, U. (2006). Investigating visual exploration of
Andrienko, N. and Andrienko, G. (2006b). Exploratory
geospatial data: an exploratory usability experiment
Analysis of Spatial and Temporal Data. Berlin
for visual data mining. Computers, Environment
Heidelberg: Springer Verlag.
and Urban Systems (accepted for a special issue).
Anselin, L., Syabri, I. and Youngihn, K. (2004). Short version presented at the 1st ICA Workshop
GeoDa: An introduction to spatial data analysis. on Geospatial Analysis and Modelling, Vienna,
Geographical Analysis, 38: 522. July, 2006.
Batagelj, V. and Mrvar, A. (2003). Pajek Anal- Demar, U. (2007a). Combining formal and exploratory
ysis and visualization of large networks. In: methods for evaluation of an exploratory geovisual-
Jnger, M. and Mutzel, P. (eds), Graph Drawing ization application in a low-cost usability experiment.
Software, pp. 77103. BerlinHeidelberg: Springer Cartography and Geographic Information Science,
Verlag. 34(1): 2945.

Demar, U. (2007b). Knowledge discovery in environ- Freeman, L.C. (1979). Centrality in social networks:
mental sciences: visual and automatic data mining Conceptual clarication. Social Networks, 1:
for radon problems in groundwater. Transactions in 215239.
GIS, 11(2): 255281.
Fuhrmann, S., Ahonen-Rainio, P., Edsall, R.M.,
DiBiase, D. (1990). Visualization in the Earth Sciences. Fabrikant, S.I., Koua, E.L., Tobn, C., Ware, C.
Earth and Mineral Sciences, 59(2): 1318. and Wilson, S. (2005). Making useful and useable
geovisualization: design and evaluation issues. In:
Dykes, J.A. and Mountain, D.M. (2003). Seeking Dykes, J., MacEachren, A.M. and Kraak, M.-J.
structure in records of spatio-temporal behaviour: (eds). Exploring Geovisualization, pp. 553566.
visualization issues, efforts and applications. Amsterdam: Elsevier.
Computational Statistics and Data Analysis,
43(4): 581603. Gahegan, M. (2000). On the application of inductive
machine learning tools to geographical analysis.
Dykes, J.A. (2005). Facilitating interaction for geo- Geographical Analysis, 32(2): 113139.
visualization. In: Dykes, J., MacEachren, A.M.
Gahegan, M., Takatsuka, M., Wheeler, M.
and Kraak, M.-J. (eds), Exploring Geovisualization,
and Hardisty, F. (2000). GeoVISTA Studio: a
pp. 265292. Amsterdam: Elsevier.
geocomputational workbench. In: Proceedings of
Dykes, J.A., MacEachren, A.M. and Kraak, M.-J. Geocomputation 2000. Univeristy of Greenwich, UK.
(2005). Advancing geovisualization. In: Dykes, J.,
Gahegan, M. and Brodaric, B. (2002). Computational
MacEachren, A.M. and Kraak, M.-J. (eds), Exploring
and visual support for geographical knowledge
Geovisualization, pp. 693704. Amsterdam: Elsevier.
construction: lling in the gaps between exploration
Edsall, R.M. (2003). The parallel coordinate plot in and explanation. In: Proceedings of the Spatial Data
action: design and use for geographic visualization. Handling 2002. Ottawa, Canada.
Computational Statistics and Data Analysis, 43: Gahegan, M., Takatsuka, M., Wheeler, M. and
605619. Hardisty, F. (2002). Introducing Geo-VISTA Studio:
van Elzakker, C.P.J.M. (2004). The use of maps in the an integrated suite of visualization and compu-
exploration of geographic data. PhD thesis. Utrecht tational methods for exploration and knowledge
University, Utrecht, The Netherlands. construction in geography. Computers, Environment
and Urban Systems, 26: 267292.
Fabrikant, S.A., Skupin, A. and Couclelis, H. (2002).
Gahegan, M. (2005). Beyond tools: visual support
Spatialization: Spatial Metahphors and Methods for
for the entire process of GIScience. In: Dykes, J.,
Handling Non-Spatial Data. Web document (last
MacEachren, A.M. and Kraak, M.-J. (eds),
accessed 12 July 2007), http://www.geog.ucsb.edu/
Exploring Geovisualization, pp. 8399. Amsterdam:
sara/html/research/ucgis/spatialization_ucsb.pdf
Elsevier.
Fayyad, U., Grinstein, G.G. and Wierse, A. (eds) Gastner, M.T. and Newman, M.E.J. (2004). Diffusion-
(2002). Information Visualization in Data Mining based method for producing density-equalizing
and Knowledge Discovery. San Francisco: Morgan maps. In: Proceedings of the National Academy of
Kaufmann Publishers. Sciences, 101(20): 74997504.
Fisher, R.A. (1936). The use of multiple mea- Grifn, A. (2004). Understanding how scientists use
surements in taxonomic problems. Annals of datadisplay devices for interactive visual comput-
Eugenics, 7(2): 179188. In: Fisher, R.A. (1950). ing with geographical models. PhD thesis. The
Contributions to Mathematical Statistics. New York: Pennsylvania State University, Pennsylvania, USA.
John Wiley & Sons.
Guo, D. (2003). Coordinating computational and visual
Fisher, P.F. and Unwin, D.J. (eds) (2002). Virtual Reality approaches for interactive feature selection and
in Geography. London: Taylor and Francis. multivariate clustering. Information Visualization,
2003(2): 232246.
Fotheringham, A.S., Brunsdon, C. and Charlton, M.
(2000). Exploring Spatial Data Visually, Chapter Guo, D., Gahegan, M. and MacEachren, A.M. (2004).
4 in Quantitative Geography Perspectives on An Integrated Environment for High-dimensional
Spatial Data Analysis, 6592. Sage Publications. Geographic Data Mining. In: Proceedings of
London, UK. GIScience 2004. University of Maryland, USA.

Guo, D., Gahegan, M., MacEachren, A.M. and Zhou, B. Kwan, M.P. (2000). Interactive geovisualization of
(2005). Multivariate analysis and geovisualization activity-travel patterns using three dimensional
with an integrated geographic knowledge discovery geographical information systems: a methodological
approach. Cartography and Geographic Information exploration with a large data set. Transportation
Science, 32(2): 113132. Research Part C, 8: 185203.
Haklay, M. and Tobn, C. (2003). Usability evaluation Kwan, M.P. (2004). GIS methods in time-geographic
and PPGIS: Towards a user-centred design approach. research: geocomputation and geovisualization of
International Journal of Geographical Information human activity patterns. Geograska Annaler B,
Science, 17(6): 577592. 86: 267280.
International Cartographic Association (ICA) (2008). Lacayo, M. and Skupin, A. (2007). A GIS-based module
ICA Commission on GeoVisualization, website for training and visualization of self-organizing
of the commission, http://geoanalytics.net/ica (last maps. Working paper, accepted to the Workshop
accessed: 11 September 2008). of the ICA Commission on Visualization and Virtual
Environments, From Geovisualization to Geovisual
Inselberg, A. (2002). Visualization and data mining of
Analytics, Helsinki, August 2007.
high-dimensional data. Chemometrics and Intelligent
Laboratory Systems, 60: 147159. MacEachren, A.M., Wachowitz, M., Edsall, R.,
Haug, D. and Masters, R. (1999). Constructing
Jiang, B. and Harrie, L. (2004). Selection of streets from
knowledge from multivariate spatio-temporal data:
a network using self-organizing maps. Transactions
integrating geographical visualization with know-
in GIS, 8: 335350.
ledge discovery in database methods. International
Keim, D.A. (2002). Information visualization and visual Journal of Geographic Information Science, 13(4):
data mining. IEEE Transactions on Visualization and 311334.
Computer Graphics, 7(1): 100107. MacEachren, A.M. and Kraak, M.-J. (2001). Research
Keim, D.A. and Ward, M. (2003). Visualization. In: challenges in geovisualization. Cartography and
Berthold, M. and Hand, D.J. (eds), Intelligent Data Geographic Information Science, 28(1): 312.
Analysis, 2nd edn, pp. 403428. BerlinHeidelberg: MacEachren, A., Dai, X., Hardisty, F., Guo, D. and
Springer Verlag. Lengerich, G. (2003). Exploring High-D Spaces with
Kohonen, T. (1997). Self-Organizing Maps, 2nd edn. Multiform Matrices and Small Multiples. In: Proceed-
BerlinHeidelberg: Springer Verlag. ings of the International Symposium on Information
Visualization 2003. Seattle, Washington, USA.
Koua, E.L. and Kraak, M.-J. (2004). Alternative
visualization of large geospatial datasets. The MacEachren, A.M., Gahegan, M., Pike, W., Brewer, I.,
Cartographic Journal, 41: 217228. Cai, G. and Lengerich, E. (2004). Geovisualization for
knowledge construction and decision support. IEEE
Koua, E.L. (2005). Computational and visual support Computer Graphics and Applications, 24(1): 1317.
for exploratory geovisualization and knowledge
construction. PhD thesis. Utrecht University, Utrecht, McCormick, B.H., DeFanti, T.A. and Brown, M.D.
The Netherlands. (1987). Visualization in Scientic Computing A
Synopsis. IEEE Computer Graphics and Applications,
Kraak, M.-J. and Koussoulakou, A. (2004). A visual- 7(7): 6170.
ization environment for the spacetime cube. In:
Fisher, P.F. (ed.) Developments in Spatial Data Meng, L. (2004). About egocentric geovisualization. In:
Handling, 11th International Symposium on Spatial Proceedings of the 12th International Conference on
Data Handling, pp. 189200. BerlinHeidelberg: Geoinformatics, Gvle, Sweden, June 2004.
Springer Verlag. Meng, L. (2005). Egocentric design of map-based
Kreuseler, M. (2000). Visualization of geographically mobile services. The Cartographic Journal, 42(1):
related multidimensional data in virtual 3D scenes. 513.
Computers & Geosciences, 26: 101108. Miller, H.J. and Han, J. (eds) (2001). Geographic
Data Mining and Knowledge Discovery. London and
Kreuseler, M. and Schumann, H. (2002). A exible
New York: Taylor & Francis.
approach for visual data mining. IEEE Transactions
on Visualization and Computer Graphics, 8(1): Mller-Hannemann, M. (2001). Drawing trees, series
3951. parallel digraphs and lattices. In: Kaufmann, M. and

Wagner, D. (eds), Drawing Graphs Methods and for non-geographic information visualization.
Models. Lecture Notes in Computer Science, 2025: Cartography and Geographic Information Science,
4670. BerlinHeidelberg: Springer Verlag. 30(2): 99119.
National Visualization and Analytics Center (NVAC) Skupin, A. and Hagelman, R. (2005). Visualizing
(2005). Illuminating the Path: Creating the R&D demographic trajectories with self-organizing maps.
Agenda for Visual Analytics. Available at: http://nvac. Geoinformatica, 9(2): 159179.
pnl.gov/agenda.stm (last accessed 17 July 2007).
Slocum, T.A., Cliburn, D.C., Feddema, J.J. and
Newman, M.E.J. (2004). Who is the best connected Miller, J.R. (2003). Evaluating the usability of a
scientist? A study of scientic coauthorship networks. tool for visualizing the uncertainty of the future
In: Ben-Naim, E., Frauenfelder, H. and Toroczkai, Z. global water balance. Cartography and Geographic
(eds), Complex Networks, pp. 337370. Berlin Information Science, 30(4): 299317.
Heidelberg: Springer Verlag.
patenkov, O., Demar, U. and Krisp, J.M. (2007).
Nielsen, J. (1993). Usability Engineering. San Francisco: Self-organising maps for exploration of spatio-
Morgan Kaufmann Publishers. temporal emergency response data. In: Proceedings
of Geocomputation 2007. Maynooth, Ireland.
Plaisant, C. (2004). The challenge of information
visualization evaluation. In: Proceedings of the IEEE Stasko, J. and Zhang, E. (2000). Focus+context
Conference on Advanced Visual Interfaces AVI04, display and navigation techniques for enhancing
Gallipoli, Italy. radial, space-lling hierarchy visualizations. In:
Proceedings of InfoVis2000, IEEE Symposium on
Preece, J. Rogers, Y. and Sharp, H. (2002). Interaction
Information Visualization, pp. 5768. Salt Lake City,
Design: Beyond HumanComputer Interaction.
Utah, USA.
New York: John Wiley and Sons.
Suchan, T.A. (2002). Usability studies of geovisual-
Roberts, J.C. (2005). Exploratory visualization with mul-
ization software in the workplace. In: Proceedings
tiple linked views. In: Dykes, J., MacEachren, A.M.
of the National Conference for Digital Government
and Kraak, M.-J. (eds), Exploring Geovisualization,
Research, Los Angeles, USA.
pp. 159180. Amsterdam: Elsevier.
Takatsuka, M. (2001). An application of the self-
Robinson, A.C., Chen, J., Lengerich, E.J., Meyer, H.G.
organizing map and interactive 3-D visualization
and MacEachren, A.M. (2005). Combining usability
to geospatial data. In: Proceedings of the
techniques to design geovisualization tools for
Sixth International Conference on Geocomputation,
epidemiology. In: Proceedings of Auto-Carto 2005,
Brisbane, Australia.
Las Vegas, USA.
Takatsuka, M. and Gahegan, M. (2002). GeoVISTA
Seo, J. and Shneiderman, B. (2002). Interactively
Studio: a codeless visual programming environment
Exploring Hierarchical Clustering Results. IEEE
for geoscientic data analysis and visualization.
Computer, 35(7): 8086.
Computers & Geosciences, 28: 11311144.
Shneiderman, B. (1996). The eyes have it: a task by
Tobler, W. (2004). Thirty Five Years of Computer
data type taxonomy for information visualization.
Cartograms. Annals of the Association of American
IEEE Proceedings of Visual Languages. Boulder,
Geographers, 94(1): 5873.
Colorado, USA.
Tobn, C. (2002). Usability Testing for Improving Inter-
Shneiderman, B. (2001). Inventing discovery tools:
active Geovisualization Techniques. Working paper,
combining information visualization with data
Centre for Advanced Spatial Analysis, University
mining. In: Proceedings of the 12th International
College London, available at: http://www.casa.ucl.
Conference on Algorithmic Learning Theory. Lecture
ac.uk/working_papers/Paper45.pdf (last accessed
Notes in Computer Science, 2226: 1728. Berlin
13 July 2007).
Heidelberg: Springer Verlag.
Tobn, C. (2005). Evaluating geographic
Silipo, R. (2003). Neural networks. In: Berthold, M. and
visualization tools and methods: an approach
Hand, D.J. (eds), Intelligent Data Analysis, 2nd edn,
and experiment based upon user tasks. In:
pp. 26932. BerlinHeidelberg: Springer Verlag.
Dykes, J. MacEachren, A.M. and Kraak, M.-J.
Skupin, A. and Fabrikant, S.A. (2003). Spatialization (eds), Exploring Geovisualization, pp.645666.
methods: a cartographic research agenda Amsterdam: Elsevier.

Tomaszewski, B., Robinson, A.C., Weaver, C., Vesanto, J. (1999). SOM-based data visualization
Stryker, M. and MacEachren, A.M. (2007). Geovisual methods. Intelligent Data Analysis, 3: 111126.
analytics and crisis management. In: Proceedings
Ware, C. (2000). Information Visualization: Perception
of the 4th International ISCRAM Conference, Delft,
for Design. San Francisco, USA: Morgan Kaufmann
The Netherlands.
Publishers.
Tukey, J.W. (1977). Exploratory Data Analysis. Reading,
Ware, C. and Plumlee, M. (2005). 3D geovisu-
Massachusetts: Addison-Wesley.
alization and the structure of visual space. In:
Unwin, A. and Unwin, D. (1998). Exploratory spatial Dykes, J. MacEachren, A.M. and Kraak, M.-J.
data analysis with local statistics. The Statistician, (eds), Exploring Geovisualization, pp. 567576.
47(3): 415421. Amsterdam: Elsevier.
5
Availability of Spatial Data
Mining Techniques
Shashi Shekhar, Vijay Gandhi, Pusheng Zhang
and Ranga Raju Vatsavai

5.1. INTRODUCTION the National Cancer Institute, and the


United States Department of Transportation.
The explosive growth of spatial data and These organizations are spread across many
widespread use of spatial databases empha- application domains including ecology and
size the need for the automated discov- environmental management, public safety,
ery of spatial knowledge. Spatial data transportation, Earth science, epidemiology,
mining (Roddick and Spiliopoulou, 1999; and climatology.
Shekhar and Chawla, 2003) is the process Extracting interesting and useful pat-
of discovering interesting and previously terns from spatial datasets is more diffi-
unknown, but potentially useful patterns cult than extracting corresponding patterns
from spatial databases. The complexity of from traditional numeric and categorical
spatial data and intrinsic spatial relationships data. Specific features of spatial data that
limit the usefulness of conventional data preclude the use of general purpose data
mining techniques for extracting spatial mining algorithms are: (a) rich data types
patterns. Efficient tools for extracting infor- (e.g., extended spatial objects) (b) implicit
mation from geo-spatial data are crucial to spatial relationships among the variables,
organizations which make decisions based (c) observations that are not independent,
on large spatial datasets, including NASA, and (d) spatial autocorrelation among the
the National Imagery and Mapping Agency, features.

This chapter is organized as follows. In among spatial objects, such as overlap, inter-
section 5.2, we provide an overview of spatial sect, and behind are often implicit. Table 5.1
data. Section 5.3 presents important statis- lists non-spatial relationships and their cor-
tical concepts used in spatial data mining. responding spatial relationship. One possible
Spatial Data Mining techniques, the main way to deal with implicit spatial relationships
focus of this chapter, are explained in is to materialize the relationships into tra-
section 5.4. Specifically, we present major ditional data input columns and then apply
accomplishments in mining output patterns classical data mining techniques (Agrawal
known as predictive models, semi-supervised and Srikant, 1994; Jain and Dubes, 1988;
approaches, outliers, co-location rules, and Quinlan, 1993). However, the materialization
clustering. In section 5.5, we briefly review can result in loss of information. Another way
the computational processes for spatial data to capture implicit spatial relationships is to
mining techniques. Finally, in section 5.6, we develop models or techniques to incorporate
identify areas of spatial data mining where spatial information into the spatial data
further research is needed. This chapter does mining process. We discuss a few case studies
not discuss spatial statistics or algorithm- of such techniques in section 5.4.
level computational processes in depth as The representation of spatial data and use
these topics are beyond the scope of this of spatial operators has been standardized
chapter. by the Open GIS (OGIS) consortium for
interoperability of spatial applications, such
as Geographic Information Systems. OGIS
defines standard spatial data types which can
5.2. DATA INPUT be used in combination to represent a spatial
object. Some examples of OGIS data types
The data inputs of spatial data mining are include Point, Curve, Surface, and Geometry
more complex than the inputs of classical Collection. In addition to specifying data
data mining because they include extended types, the OGIS standard also includes three
objects such as points, lines, and polygons. categories of spatial operations: (a) basic
The data inputs of spatial data mining spatial operations which can to applied to all
have two distinct types of attributes: non-
spatial attribute and spatial attribute. Non-
spatial attributes are used to characterize
non-spatial features of objects, such as name, Table 5.1 Relationships among non-spatial
population, and unemployment rate for a city. data and spatial data
They are the same as the attributes used Non-spatial relationship Spatial relationship
in the data inputs of classical data mining. (explicit) (often implicit)
Spatial attributes are used to define the Arithmetic Set-oriented: union, intersection,
spatial location and extent of spatial objects membership,
(Bolstad, 2002). The spatial attributes of a Ordering Topological: meet, within,
spatial object most often include information overlap,
Is instance of Directional: North, NE, left,
related to spatial locations, e.g., longitude,
above, behind,
latitude and elevation, as well as shape. Subclass of Metric: e.g., distance, area,
Relationships among non-spatial objects perimeter,
are explicit in data inputs, e.g., arithmetic Part of Dynamic: update, create,
relation, ordering, is instance of, subclass of, destroy,
Membership of Shape-based and visibility
and membership of. In contrast, relationships
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 65

geometry datatypes, e.g., to find the boundary by a Poisson process) using the average
of a spatial object, (b) operations to test distance between a point and its nearest
for topological relationship between objects, neighbor. For a random pattern, this average

e.g., to find if two spatial objects overlap, distance is expected to be 1/(2 density),
and (c) operations to perform spatial analysis, where density is the average number of
e.g., to calculate the shortest distance path points per unit area. If for a real process,
between two spatial objects. the computed distance falls within a certain
A recent topic of research is the representa- limit, then we conclude that the pattern is
tion of spatial data which have an associated generated by a random process; otherwise it
temporal aspect. A location based service is is a non-random process.
an example in which a service is offered
based on the location and time of an entity.
Current OGIS standards do not yet support Lattice
such systems. A lattice is a model for a gridded space
in a spatial framework. Here the lattice
refers to a countable collection of regular or
irregular spatial sites related to each other
5.3. STATISTICAL FOUNDATION via a neighborhood relationship. Several
spatial statistical analyses, e.g., the spatial
Readers of this handbook will be exposed to
autoregressive model and Markov random
more statistical foundations in later chapters.
fields can be applied on lattice data.
Here we address only the basic concepts
needed to follow the rest of this chapter.
Statistical models (Cressie, 1993) are often
used to represent observations in terms of
Geostatistics
Geostatistics deals with the analysis of spatial
random variables. These models can then
continuity and weak stationarity (Cressie,
be used for estimation, description, and pre-
1993), which is an inherent characteristic of
diction based on probability theory. Spatial
spatial datasets. Geostatistics provides a set
data can be thought of as resulting from
of statistics tools, such as kriging (Cressie,
observations on the stochastic process Z(s) :
1993), to the interpolation of attributes at
s D, where s is a spatial location and D is
unsampled locations.
possibly a random set in a spatial framework.
One of the fundamental assumptions of
Here we present three spatial statistical
statistical analysis is that the data samples
problems one might encounter: point process,
are independently generated: like successive
lattice, and geostatistics.
tosses of coin, or the rolling of a die.
However, in the analysis of spatial data,
Point process the assumption about the independence of
A point process is a model for the spatial samples is generally false. In fact, spatial
distribution of the points in a point pattern. data tends to be highly self correlated. For
Several natural processes can be modeled example, people with similar characteristics,
as spatial point patterns, e.g., positions of occupation and background tend to cluster
trees in a forest and locations of bird habitats together in the same neighborhoods. The
in a wetland. Spatial point patterns can be economies of a region tend to be similar.
broadly grouped into random or non-random Changes in natural resources, wildlife, and
processes. Real point patterns are often temperature vary gradually over space. The
compared with a random pattern (generated property of like things to cluster in space is so

White noiseno spatial autocorrelation Water depth variation across Marshland


0 0
10 10
20 20
30 30
40 40
50 50
60 60
70 70
80 80

0 20 40 60 80 100 120 140 160 0 20 40 60 80 100 120 140 160


nz-5372 nz-5372

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 10 20 30 40 50 60 70 80 90

(a) Attribute with an independent (b) Attribute with spatial autocorrelation


identical distribution

Figure 5.1 Attribute values in space with independent identical distribution and spatial
autocorrelation.

fundamental that geographers have elevated objects under study (e.g., urban, forest,
it to the status of the first law of geography: water) are often much larger than 30 m. As
Everything is related to everything else but a result, per-pixel-based classifiers, which
nearby things are more related than distant do not take spatial context into account,
things (Tobler, 1979). In spatial statistics, often produce classified images with salt and
an area within statistics devoted to the pepper noise. These classifiers also suffer in
analysis of spatial data, this property is terms of classification accuracy.
called spatial autocorrelation (Shekhar and The spatial relationship among locations in
Chawla, 2003). For example, Figure 5.1 a spatial framework is often modeled via a
shows the value distributions of an attribute contiguity matrix. A simple contiguity matrix
in a spatial framework for an independent may represent a neighborhood relationship
identical distribution (Figure 5.1(a)) and defined using adjacency, Euclidean distance,
a distribution with spatial autocorrelation etc. Example definitions of neighborhood
(Figure 5.1(b)). using adjacency include a four-neighborhood
Knowledge discovery techniques which and an eight-neighborhood. Given a uni-
ignore spatial autocorrelation typically form gridded spatial framework, a four-
perform poorly in the presence of spatial neighborhood assumes that a pair of locations
data. Often the spatial dependencies arise influence each other if they share an edge.
due to the inherent characteristics of the An eight-neighborhood assumes that a pair
phenomena under study, but in particular of locations influence each other if they share
they arise due to the fact that the spatial either an edge or a vertex.
resolution of imaging sensors are finer than Figure 5.2(a) shows a gridded spatial
the size of the object being observed. For framework with four locations, A, B,
example, remote sensing satellites have C, and D. A binary matrix representa-
resolutions ranging from 30 m (e.g., the tion of a four-neighborhood relationship is
Enhanced Thematic Mapper of the Landsat 7 shown in Figure 5.2(b). The row-normalized
satellite of NASA) to 1 m (e.g., the IKONOS representation of this matrix is called a
satellite from SpaceImaging), while the contiguity matrix, as shown in Figure 5.2(c).

A B C D A B C D
A 0 1 1 0 A 0 0.5 0.5 0
A B
B 1 0 0 1 B 0.5 0 0 0.5

C 1 0 0 1 C 0.5 0 0 0.5
C D
D 0 1 1 0 D 0 0.5 0.5 0

(a) Spatial framework (b) Neighbor relationship (c) Contiguity matrix

Figure 5.2 A spatial framework and its four-neighborhood contiguity matrix.

Other contiguity matrices can be designed 5.4.1. Predictive models


to model neighborhood relationships based
The prediction of events occurring at partic-
on distance. The essential idea is to specify
ular geographic locations is very important
the pairs of locations that influence each
in several application domains. Examples of
other along with the relative intensity of
problems which require location prediction
interaction. More general models of spatial
include crime analysis, cellular networking,
relationships using cliques and hypergraphs
and natural disasters such as fires, floods,
are available in the literature (Warrender and
droughts, vegetation diseases, and earth-
Augusteijn, 1999). In spatial statistics, spatial
quakes. Here we describe one such problem
autocorrelation is quantified using measures
domain and provide two spatial data mining
such as Ripleys K-function and Morans I
techniques for predicting locations, namely
(Cressie, 1993).
the Spatial Autoregression Model (SAR) and
A topic of recent interest in the field of
Markov Random Fields (MRF).
spatial statistics is multiscale modeling. Since
most physical and human processes vary
with spatial scale, multi-scale representation
An application domain
is very important. Much of the current work
We begin by introducing an example to
related to multi-scale modeling lacks a formal
illustrate the different concepts related to
statistical framework. However, Kolaczyk
location prediction in spatial data mining. We
et al. (2005) use statistical models called
are given data about two wetlands, named
mixlets which allow representation of spatial
Darr and Stubble, on the shores of Lake Erie
information at multiple scales.
in Ohio, USA in order to predict the spatial
distribution of a marsh-breeding bird, the red-
winged blackbird (Agelaius phoeniceus). The
data was collected from April to June in two
5.4. OUTPUT PATTERNS successive years, 1995 and 1996.
A uniform grid was imposed on the two
In this section, we present spatial data mining wetlands and different types of measurements
techniques for different output patterns: were recorded at each cell or pixel. In total,
predictive models, spatial clustering, semi- the values of seven attributes were recorded
supervised learning, spatial outliers, and at each cell. Domain knowledge is crucial
spatial co-location rules. in deciding which attributes are important
68 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Nest sites for 1995 Darr location Vegetation distribution across the wetland
0
10 10
20 20

30 30

40 40

50 50

60 60

70 70

80 80

20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 160


nz = 5372

(a) Nest locations (b) Vegetation durability

Water depth variation across wetland Distance to open water


0 0
10 10
20 20
30 30
40 40
50 50
60 60
70 70
80 80

0 20 40 60 80 100 120 140 160 0 20 40 60 80 100 120 140 160

nz = 5372 nz = 5372

(c) Water depth (d) Distance to open water

Figure 5.3 (a) Learning dataset: the geometry of the Darr wetland and the locations of the
nests, (b) the spatial distribution of vegetation durability over the marshland, (c) the spatial
distribution of water depth, and (d) the spatial distribution of distance to open water.

and which are not. For example, vegeta- training data, and then tested on the remain-
tion durability was chosen over vegetation der of the data, called the testing data. In this
species because specialized knowledge about study a model was built using the 1995 Darr
the bird-nesting habits of the red-winged wetland data and then tested using the 1995
blackbird suggested that the choice of nest Stubble wetland data. In the learning data,
location is more dependent on plant structure, all the attributes are used to build the model
plant resistance to wind, and wave action than and in the training data, one value is hidden,
on the plant species. in this case the location of the nests. Using
An important goal is to build a model for knowledge gained from the 1995 Darr data
predicting the location of bird nests in the and the value of the independent attributes in
wetlands. Typically, the model is built using the test data, the goal is to predict the location
a portion of the data, called the learning or of the nests in the 1995 Stubble data.
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 69

Modeling spatial dependencies using the We refer to this equation as the Spatial
SAR and MRF models Autoregressive Model (SAR). Notice that
Several previous studies (Jhung and Swain, when = 0, this equation collapses to
1996; Solberg et al., 1996) have shown that the classical regression model. The bene-
the modeling of spatial dependency (often fits of modeling spatial autocorrelation are
called context) during the classification many: the residual error will have much
process improves overall classification lower spatial autocorrelation (i.e., systematic
accuracy. Spatial context can be defined by variation). With the proper choice of W , the
the relationships between spatially adjacent residual error should, at least theoretically,
pixels in a small neighborhood. In this have no systematic variation. If the spatial
section, we present two approaches to autocorrelation coefficient is statistically sig-
modeling spatial dependency: the SAR and nificant, then SAR will quantify the presence
MRF-based Bayesian classifiers. of spatial autocorrelation. It will indicate the
extent to which variations in the dependent
variable ( y) are explained by the average of
Spatial autoregression model neighboring observation values. Finally, the
The spatial autoregressive model decom- model will have a better fit, (i.e., a higher
poses a classifier fC into two parts, R-squared statistic).
namely spatial autoregression and logis-
tic transformation. We first show how
spatial dependencies are modeled using the
Markov random eld-based Bayesian
framework of logistic regression analysis. In
classiers
Markov random field-based Bayesian clas-
the spatial autoregression model, the spatial
sifiers estimate the classification model fC
dependencies of the error term, or, the
using MRF and Bayes rule. A set of random
dependent variable, are directly modeled in
variables whose interdependency relationship
the regression equation (Anselin, 1988). If
is represented by an undirected graph (i.e., a
the dependent values yi are related to each
symmetric neighborhood matrix) is called
other, then the regression equation can be
a Markov random field (Li, 1995). The
modified as
Markov property specifies that a variable
depends only on its neighbors and is
y = Wy + X + (5.1) independent of all other variables. The
location prediction problem can be modeled
in this framework by assuming that the class
Here W is the neighborhood relationship label, li = fC (si ), of different locations,
contiguity matrix and is a parameter si , constitutes an MRF. In other words,
that reflects the strength of the spatial random variable li is independent of lj
dependencies between the elements of the if W (si , sj ) = 0.
dependent variable. After the correction term The Bayesian rule can be used to predict
Wy is introduced, the components of the li from feature value vector X and neighbor-
residual error vector are then assumed to hood class label vector Li as follows:
be generated from independent and identical
standard normal distributions. As in the case      
of classical regression, the SAR equation has    Pr X  li , Li Pr li  Li

Pr li X, Li = .
to be transformed via the logistic function for Pr (X)
binary dependent variables. (5.2)
70 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

The solution procedure can estimate on neighboring classes. Logistic regression


Pr(li | Li ) from the training data, where Li and logistic SAR models belong to a
denotes a set of labels in the neighborhood more general exponential family. The expo-
of si , excluding the label at si , by examining nential family is given by Pr(u | v) =
the ratios of the frequencies of class labels exp (A(v ) + B(u, p) + vT u)u where u, and v
to the total number of locations in the spatial are location and label respectively. This expo-
framework. Pr(li | Li ) can be estimated nential family includes many of the common
using kernel functions from the observed distributions such as Gaussian, Binomial,
values in the training dataset. For reliable Bernoulli, and Poisson as special cases.
estimates, even larger training datasets are Experiments were carried out on the Darr
needed relative to those needed for the and Stubble wetlands to compare classi-
Bayesian classifiers without spatial context, cal regression, SAR, and the MRF-based
since we are estimating a more complex Bayesian classifiers. The results showed that
distribution. An assumption on Pr(li | Li ) the MRF models yield better spatial and
may be useful if the training dataset available classification accuracies over SAR in the
is not large enough. A common assumption prediction of the locations of bird nests.
is the uniformity of influence from all It was also observed that SAR predictions
neighbors of a location. For computational are extremely localized, missing actual nests
efficiency it can be assumed that only local over a large part of the marshlands (Shekhar
explanatory data X(si ) and neighborhood et al., 2002).
label Li are relevant in predicting class label
li = fC (si ). It is common to assume that all
interaction between neighbors is captured
5.4.2. Spatial clustering
via the interaction in the class label variable.
Many domains also use specific parametric Spatial clustering is a process of grouping
probability distribution forms, leading to a set of spatial objects into clusters so
simpler solution procedures. In addition, it that objects within a cluster have high
is frequently easier to work with a Gibbs similarity in comparison to one another, but
distribution specialized by the locally defined are dissimilar to objects in other clusters.
MRF through the HammersleyClifford For example, clustering is used to determine
theorem (Besag, 1974). the hot spots in crime analysis and disease
A more detailed theoretical and experi- tracking. Hot spot analysis is the process
mental comparison of these methods can be of finding unusually dense event clusters
found in Shekhar et al. (2002). Although across time and space. Many criminal justice
MRF and SAR classification have different agencies are exploring the benefits provided
formulations, they share a common goal, by computer technologies to identify crime
estimating the posterior probability distri- hot spots in order to take preventive strategies
bution: p(li | X). However, the posterior for such as deploying saturation patrols in hot
the two models is computed differently spot areas.
with different assumptions. For MRF the Spatial clustering can be applied to group
posterior is computed using Bayes rule. similar spatial objects together; the implicit
On the other hand, in logistic regres- assumption is that patterns in space tend to
sion, the posterior distribution is directly be grouped rather than randomly located.
fit to the data. One important difference However, the statistical significance of spatial
between logistic regression and MRF is that clusters should be measured by testing the
logistic regression assumes no dependence assumption in the data. The test is critical
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 71

before proceeding with any serious clustering of homogeneous Poisson processes: event-
analyses. to-nearest-event distances are proportional to
2 random variables, whose densities have a
substantial amount of probability near zero
(Cressie, 1993). Spatial clustering is more
Complete spatial randomness, cluster, statistically significant when the data exhibit
and decluster a cluster pattern rather than a CSR pattern or
In spatial statistics, the standard against decluster pattern.
which spatial patterns are often compared is Several statistical methods can be applied
a completely spatially random point process, to quantify deviations of patterns from a
and departures indicate that the pattern is complete spatial randomness point pattern
not distributed randomly in space. Complete (Cressie, 1993). One type of descriptive
spatial randomness (CSR) (Cressie, 1993) is statistic is based on quadrats (i.e., well
synonymous with a homogeneous Poisson defined area, often rectangular in shape).
process. The patterns of the process are Usually quadrats of random location and
independently and uniformly distributed over orientations in the quadrats are counted,
space, i.e., the patterns are equally likely to and statistics derived from the counters
occur anywhere and do not interact with each are computed. Another type of statistic is
other. However, patterns generated by a non- based on distances between patterns; one
random process can be either cluster patterns such type is Ripleys K-function (Cressie,
(aggregated patterns) or decluster patterns 1993).
(uniformly spaced patterns). After the verification of the statistical
To illustrate, Figure 5.4 shows realiza- significance of the spatial clustering, classical
tions from a completely spatially random clustering algorithms (Han et al., 2001) can
process, a spatial cluster process, and a be used to discover interesting clusters.
spatial decluster process (each conditioned
to have 80 points) in a square. Notice
in Figure 5.4(a) that the complete spatial
randomness pattern seems to exhibit some Clustering point process
clustering. This is not an unrepresentive real- As discussed in section 5.3, a point process
ization, but illustrates a well-known property is a model for the spatial distribution of

(a) Complete spatial (b) Cluster pattern (c) Decluster pattern


randomness (CSR)
pattern

Figure 5.4 Illustration of CSR, cluster, and decluster patterns.


72 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Figure 5.5 Marked spatial point process. Spatial locations for different female chimpanzees
at the Gombe National Park, Tanzania.

the spatial points in a point pattern. A point Besags L-function (Besag, 1977), which is
process in which each of the spatial locations a modified version of Ripleys K-function
is marked with a unique label is called (Cressie, 1993), is used to quantify the
a marked spatial point process. Clustering second-order interaction between point pro-
of marked spatial point processes is an cesses. This measure provides the correlation
interesting problem in many application between the observed and expected pairs of
domains. For example, in behavioral ecology, points at a certain distance from each other.
ecologists are interested in finding clusters Based on the value of this measure, marked
of individual chimpanzees based on their point processes can be clustered hierarchi-
space usage, which usually consists of several cally, to produce a dendrogram or a block
spatial points for each individual. An example diagonal matrix, which can be analyzed by
of marked spatial point processes is shown in domain experts to find a threshold level to
Figure 5.5. identify proper clusters.
The problem of clustering marked spatial
point processes is a generalization of the
problem of clustering spatial points, where
5.4.3. Semi-supervised learning
instead of a single spatial location for
each category, we have multiple spatial The methods described in the previous
locations for each category. Each category section are examples of supervised learn-
is a spatial point process. Classical cluster- ing algorithms. In supervised methods, the
ing approaches handle homogeneous spatial model is built using a training dataset.
points and hence cannot cluster marked For example, in a remote sensing image,
spatial point processes. A very limited training data will be a collection of
amount of research has been done in the area labeled pixels. Practically, it is very dif-
of clustering marked spatial point processes. ficult to collect labels for all training
(Han et al., 2001). data. Hence an approach which does not
A data mining technique for clustering require many labeled samples is needed.
marked spatial point processes is proposed Such an approach which uses less labeled
by (Shekhar et al., 2006). This algorithm is samples and a large number of unlabeled
based on the intuition that the intra-cluster samples is called semi-supervised learning
similarity must be significantly higher than (Vatsavai and Shekhar, 2005). Based on the
the inter-cluster similarity. During clustering, ExpectationMaximization (EM) algorithm,
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 73

Labeled samples Labeled samples Unlabeled samples

+ +

Bayesian Bayesian Clustering


Classification Classification (ex. K-means, EM)

Use EM
+
+ + +
+ +



+
Classification is
Unlabeled samples are Each cluster is
dependent on labeled
added manually classified
samples
Supervised Semi-supervised Un-supervised

(a) Supervised (b) Semi-supervised (c) Un-supervised

Figure 5.6 Illustration of different approaches in Classication. + and indicates labeled


data, o indicates unlabeled data.

maximum likelihood, and maximum a poste- classification model. Figure 5.7 shows clas-
riori classifiers, the semi-supervised method sification of satellite imagery into different
utilizes a small set of labeled and a large classes. Figure 5.7(a) is obtained by using
number of unlabeled training samples to 100 labeled data points in the training dataset.
build a model. The model obtained using only 20 labeled
Figure 5.6 illustrates the difference data points is shown in Figure 5.7(b). As
between different approaches used in it can be seen the model with a lesser
classification. The supervised approach number of labeled data points is poorer as
shown in Figure 5.6(a) requires many compared to the model with a greater number
labeled data, in this case + and to build of data points. However, the model with a
a model. An unsupervised approach does not lesser number of labeled data points can be
require any training dataset to build a model. improved by including unlabeled data points
The semi-supervised approach shown in and using a semi-supervised technique. The
Figure 5.6(c) uses a small number of labeled resulting model is shown in Figure 5.7(c).
and a large number of unlabeled datasets to
build a model.
A semi-supervised approach is better than
5.4.4. Spatial outliers
using a supervised approach with a smaller
number of labeled samples. Figure 5.7 Outliers have been informally defined as
shows an example which proves that observations in a dataset which appear to be
including an unlabeled dataset and using inconsistent with the remainder of that set
a semi-supervised approach improves the of data (Barnett and Lewis, 1994), or which
74 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

160 160 5 160 5

140 140 140


5
120 120 120
14
100 14 100 3 100
14 26
Band 5

Band 5

Band 5
12 3 12
26 26 13 12 3
13 80 43 13
80 43 80 43
29
29
60 2
60 29 2 60 2

40 40 40

20 20 20
1 1 1

0 0 0
0 20 40 60 80 100 120 140 160 0 20 40 60 80 100 120 140 160 0 20 40 60 80 100 120 140 160
Band 4 Band 4 Band 4

(a) Results with 100 labeled (b) Results with 20 labeled (c) Results with 20 labeled
samples, supervised samples, supervised and 80 unlabeled samples,
semi-supervised

Figure 5.7 Illustration of supervised and semi-supervised approach.

deviate so much from other observations so growing metropolitan area is a spatial outlier
as to arouse suspicions that they were gen- based on the non-spatial attribute house age.
erated by a different mechanism (Hawkins,
1980). The identification of global outliers
can lead to the discovery of unexpected Illustrative examples
knowledge and has a number of practical We use an example to illustrate the dif-
applications in areas such as credit card ferences among global and spatial outlier
fraud, athlete performance analysis, voting detection methods. In Figure 5.8(a), the
irregularity, and severe weather prediction. X-axis is the location of data points in
This section focuses on spatial outliers, i.e., one-dimensional space; the Y -axis is the
observations which appear to be inconsistent attribute value for each data point. Global
with their neighborhoods. Detecting spatial outlier detection methods ignore the spatial
outliers is useful in many applications of location of each data point and fit the
geographic information systems and spatial distribution model to the values of the non-
databases, including transportation, ecology, spatial attribute. The outlier detected using
public safety, public health, climatology, and this approach is the data point G, which
location-based services. has an extremely high attribute value 7.9,
A spatial outlier is a spatially referenced exceeding the threshold of + 2 = 4.49 +
object whose non-spatial attribute values 2 1.61 = 7.71, as shown in Figure 5.8(b).
differ significantly from those of other This test assumes a normal distribution
spatially referenced objects in its spatial for attribute values. On the other hand,
neighborhood. Informally, a spatial outlier is S is a spatial outlier whose observed value
a local instability (in values of non-spatial is significantly different than its neighbors
attributes) or a spatially referenced object P and Q.
whose non-spatial attributes are extreme
relative to its neighbors, even though the
attributes may not be significantly different Tests for detecting spatial outliers
from the entire population. For example, Tests to detect spatial outliers separate
a new house in an old neighborhood of a spatial attributes from non-spatial attributes.
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 75

Original data points Histogram of attribute values


8 9
G Data point
Fitting curve
7 S 8

7
6

Number of occurrence
6
Attribute values

5
P 5 2 +2
4
D 4
3
Q 3
2
2
L
1 1

0 0
0 2 4 6 8 10 12 14 16 18 20 2 0 2 4 6 8 10

Location Attribute values

(a) An example dataset (b) Histogram

Figure 5.8 A dataset for outlier detection.

Spatial attributes are used to characterize both locations may appear to be reasonable
location, neighborhood, and distance. Non- when examining the dataset non-spatially.
spatial attribute dimensions are used to Figure 5.9(a) shows a variogram cloud for
compare a spatially referenced object to its the example dataset shown in Figure 5.8(a).
neighbors. The spatial statistics literature This plot shows that two pairs (P, S) and
provides two kinds of bi-partite multidi- (Q, S) on the left-hand side lie above the main
mensional tests, namely graphical tests and group of pairs, and are possibly related to
quantitative tests. Graphical tests, which are spatial outliers. The point S may be identified
based on the visualization of spatial data, as a spatial outlier since it occurs in both
highlight spatial outliers. Example methods pairs (Q, S) and (P, S). However, graphical
include variogram clouds and Moran scatter- tests of spatial outlier detection are limited
plots. Quantitative methods provide a precise by the lack of precise criteria to distinguish
test to distinguish spatial outliers from the spatial outliers. In addition, a variogram
remainder of data. Scatterplots (Anselin, cloud requires non-trivial post-processing of
1994) are a representative technique from the highlighted pairs to separate spatial outliers
quantitative family. from their neighbors, particularly when
A variogram-cloud (Cressie, 1993) dis- multiple outliers are present, or density varies
plays data points related by neighborhood greatly.
relationships. For each pair of locations, A Moran scatterplot (Anselin, 1995) is
the square-root of the absolute difference a plot of a normalized attribute value
between attribute values at the locations (Z[ f ( i)] = (f (i) f )/f ) against the neigh-
versus the Euclidean distance between the borhood average of normalized attribute val-
locations are plotted. In datasets exhibiting ues (W Z), where W is the row-normalized

strong spatial dependence, the variance in (i.e., j Wi j = 1) neighborhood matrix,
the attribute differences will increase with (i.e., Wi j > 0 iff neighbor (i, j)). The
increasing distance between locations. Loca- upper left and lower right quadrants of
tions that are near to one another, but with Figure 5.9(b) indicate a spatial association of
large attribute differences, might indicate a dissimilar values: low values surrounded by
spatial outlier, even though the values at high value neighbors (e.g., points P and Q),
76 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Variogram cloud Moran scatterplot


2.5 3
Square root of absolute difference of

Weighted neighbor z-score of


2 (Q,S) 2

attribute values
(P,S) 1
attribute values

1.5 Q
0
1
S
1

0.5
2

0 3
0 0.5 1 1.5 2 2.5 3 3.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5
Pairwise distance Z-Score of attribute values

(a) Variogram cloud (b) Moran scatterplot

Figure 5.9 Variogram cloud and Moran scatterplot to detect spatial outliers.

and high values surrounded by low values A location (sensor) is compared to its
(e.g., point S). Thus we can identify points neighborhood using the function S(x) =
(nodes) that are surrounded by unusually high [ f (x) EyN(x) ( f ( y))], where f (x) is the
or low value neighbors. These points can be attribute value for a location x, N(x) is the
treated as spatial outliers. set of neighbors of x, and EyN(x) ( f ( y)) is
A scatterplot (Anselin, 1994) shows the average attribute value for the neighbors
attribute values on the X-axis and the average of x. The statistic function S(x) denotes the
of the attribute values in the neighborhood difference of the attribute value of a sensor
on the Y -axis. A least square regression located at x and the average attribute value
line is used to identify spatial outliers. of xs neighbors.
A scatter sloping upward to the right indicates Spatial statistic S(x) is normally distributed
a positive spatial autocorrelation (adjacent if the attribute value f (x) is normally dis-
values tend to be similar); a scatter sloping tributed. A popular test for detecting spatial
upward to the left indicates a negative spatial outliers for normally distributed f (x) can be
autocorrelation. The residual is defined as the described as follows: spatial statistic Zs(x) =
vertical distance (Y -axis) between a point P |(S(x) s )/s | > . For each location
with location (Xp , Yp ) to the regression line x with an attribute value f (x), the S(x) is
Y = mX + b, that is, residual = Yp the difference between the attribute value at
(mXp + b). Cases with standardized residuals, location x and the average attribute value of
standard = ( )/ , greater than 3.0 or xs neighbors, s is the mean value of S(x),
less than 3.0 are flagged as possible spatial and s is the value of the standard deviation
outliers, where and are the mean and of S(x) over all stations. The choice of
standard deviation of the distribution of the depends on a specified confidence level. For
error term . In Figure 5.10(a), a scatterplot example, a confidence level of 95 percent will
shows the attribute values plotted against the lead to 2.
average of the attribute values in neighboring Figure 5.10(b) shows the visualization of
areas for the dataset in Figure 5.8(a). The the spatial statistic method described above.
point S turns out to be the farthest from the The X-axis is the location of data points
regression line and may be identified as a in one-dimensional space; the Y -axis is
spatial outlier. the value of spatial statistic Zs(x) for each
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 77

Scatterplot Spatial statistic Zc(x) test


Average attribute values over neighborhood 7 4
6.5 S
3
6
P
5.5
2
5
Q

Zs(x)
4.5 1

4
3.5 S 0

3
1
2.5 P
2
Q
2
0 1 2 3 4 5 6 7 0 2 4 6 8 10 12 14 16 18 20

Attribute values Location

(a) Scatterplot (b) Spatial statistic Zs(x)

Figure 5.10 Scatterplot and Spatial Statistic Zs(x ) to Detect Spatial Outliers.

data point. We can easily observe that point In the iterative algorithms (Kou et al., 2003),
S has a Zs(x) value exceeding 3, and will be only one outlier is detected in each iteration,
detected as a spatial outlier. Note that the two and then its attribute value is modified in
neighboring points P and Q of S have Zs(x) subsequent iterations so that it does not have
values close to 2 due to the presence of a negative impact in detecting a new outlier.
spatial outliers in their neighborhoods. The median-based algorithm (Kou et al.,
Designing computationaly efficient tech- 2003) reduces the impact of the presence of
niques to find spatial outliers is important. data points with extreme high or low attribute
One efficient method is to compute the global values.
statistical parameters using a spatial join
(Shekhar et al., 2003). In this method, the
algorithm computes the algebraic aggregate
functions in a single scan of a spatial self-join
5.4.5. Spatial co-location rules
from a spatial dataset using a neighborhood
relationship. The computed values from Boolean spatial features are geographic
the algebraic aggregate functions can be object types which are either present or
used to validate the outlier measure of a absent at different locations in a two-
dataset. dimensional or three-dimensional metric
A drawback in most of the techniques to space, e.g., the surface of the Earth. Examples
detect multiple spatial outliers is that some of Boolean spatial features include plant
of the data points are misclassified, i.e., either species, animal species, road types, cancers,
some of the true spatial outliers are ignored crime, and business types. Co-location pat-
or some points are wrongly identified as terns represent the subsets of the Boolean
spatial outliers. This misclassification occurs spatial features whose instances are often
because most algorithms tend not to take into located in close geographic proximity. Exam-
account the effect of an outlier in the neigh- ples include symbiotic species, e.g., Nile
borhood of another outlier. To overcome this crocodile and Egyptian plover in ecology, and
problem, iterative algorithms and a median- frontage roads and highways in metropolitan
based non-iterative algorithm can be used. road maps.
78 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

90

80
70
60
50
40
30
20
10 River
Roads
0
0 10 20 30 40 50 60 70 80 90

Figure 5.11 (a) Illustration of point spatial co-location patterns. Shapes represent different
spatial feature types. Spatial features in sets {+, ,} and (o, ) tend to be located
together. (b) Co-location between roads and rivers. (Courtesy: Architecture Technology
Corporation).

Co-location rules are models to infer associations with support values larger than a
the presence of Boolean spatial features user given threshold. The purpose of mining
in the neighborhood of instances of other association rules is to identify frequent item
Boolean spatial features. For example, Nile sets for planning store layouts or marketing
Crocodiles Egyptian Plover predicts the campaigns. In the spatial co-location rule
presence of Egyptian Plover birds in areas mining problem, transactions are often not
with Nile Crocodiles. Figure 5.11(a) shows explicit. The transactions in market basket
a dataset consisting of instances of several analysis are independent of each other.
Boolean spatial features, each represented by Transactions are disjoint in the sense of not
a distinct shape. A careful review reveals sharing instances of item types. In contrast,
two co-location patterns, i.e., {+, } the instances of Boolean spatial features
and {, } are embedded in a continuous space and
Co-location rule discovery is a process share a variety of spatial relationships (e.g.,
to identify co-location patterns from large neighbor) with each other.
spatial datasets with a large number of
Boolean features. The spatial co-location
rule discovery problem looks similar to, Co-location rule approaches
but, in fact, is very different from the Approaches to discovering co-location rules
association rule mining problem (Agrawal can be categorized into two classes, namely
and Srikant, 1994) because of the lack spatial statistics, and data mining approaches.
of transactions. In market basket datasets, Spatial statistics-based approaches use mea-
transactions represent sets of item types sures of spatial correlation to characterize
bought together by customers. The support the relationship between different types of
of an association is defined to be the fraction spatial features. Measures of spatial cor-
of transactions containing the association. relation include the cross K-function with
Association rules are derived from all the Monte Carlo simulation (Cressie, 1993),
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 79

A1 C1 C1 A1 C1 C1 C1
A1 A1 A1
B1 B1 B1 B1 B1

A2 A2 A2 A2 A2
B2 B2 B2 B2 B2
C2 C2 C2 C2 C2

Example dataset, neighboring Reference feature = C Support(A,B) = 1 Support(A,B) = 2 Support(A,B) = min(2/2,2/2) = 1


instances of different features Transactions{{B1},{B2}} Support(B,C) = min(2/2,2/2) = 1
are connected support(A,B) = Support for (a,b) is not well defined i.e. order sensitive
(a) (b) (c) (d)

Figure 5.12 Example to illustrate different approaches to discovering co-location


patterns (a) Example dataset. (b) Transaction based approach. Support measure is
ill-dened and order sensitive. (c) A distance-based approach with k -neighbouring
class sets. (d) A distance-based approach with event-centric model.

mean nearest-neighbor distance, and spatial Transactions over space can be defined by
regression models. Computing spatial corre- a reference-feature centric model. Under
lation measures for all possible co-location this model, transactions are created around
patterns can be computationally expensive instances of one user-specified spatial feature.
due to the exponential number of candidate The association rules are derived using the
subsets given a large collection of spatial a priori (Agarwal et al., 1993) algorithm.
Boolean features. The rules formed are related to the reference
Data mining approaches can be further feature. For example, consider the spatial
divided into a clustering-based map over- dataset in Figure 5.12(a) with three feature
lay approach and association rule-based types, A, B and C, each of which has two
approaches. A clustering-based map overlay instances. The neighborhood relationships
approach treats every spatial attribute as between instances are shown as edges.
a map layer. The spatial clusters (regions) Co-locations (A, B) and (B, C) may be
of point-data in each layer are candidates considered to be frequent in this example.
for mining associations. Given X and Y as Figure 5.12(b) shows transactions created
sets of layers, a clustered spatial association by choosing C as the reference feature.
rule is defined as X Y (CS, CC%), Co-location (A, B) will not be found since
for X Y = , where CS is the it does not involve the reference feature.
clustered support, defined as the ratio of Generalizing the paradigm of forming rules
the area of the cluster (region) that satisfies related to a reference feature to the case
both X and Y to the total area of the where no reference feature is specified is non-
study region S. CC% is the clustered trivial. Also, defining transactions around
confidence, which can be interpreted as the locations of instances of all features may
percentage of area of clusters (regions) of yield duplicate counts for many candidate
X that intersect with the area of clusters associations.
(regions) of Y . A distance-based approach was proposed
Association rule-based approaches can be concurrently by Morimoto (2001) and
divided into transaction- and distance-based Shekhar and Huang (2001). Morimoto
approaches. Transaction-based approaches defined distance-based patterns called
focus on defining transactions over space so k-neighboring class sets, in which instances
that an a priori-like algorithm can be used. of objects are grouped together based on
80 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

their Euclidean distance from each other. In of type B in their neighborhoods. The
Morimotos work, the number of instances conditional probability for the co-location
for each pattern is used as the prevalence rule is: spatial feature A at location l
measure, which does not possess an spatial feature type B in neighborhood is
anti-monotone property by nature. Since anti- 100%. This yields a well-defined prevalence
monotonicity is required for such algorithms, measure (i.e., support) without the need for
Morimoto used a non-overlapping constraint transactions. Figure 5.12(d) illustrates that
to get the anti-monotone property for the event-centric model will identify both
this measure. Also, it is possible that the (A, B) and (B, C) as frequent patterns.
instances of a k-neighboring class set are Prevalence measures and conditional prob-
different depending on the order the class ability measures, called interest measures, are
is added into the class set. This in turn defined differently in different models, as
yields different values of support of a summarized in Table 5.2. The transaction-
given colocation. Figure 5.12(c) shows based and distance-based k-neighboring class
two possible partitions for the dataset of sets materialize transactions and thus can
Figure 5.12(a), along with the supports for use traditional support and confidence mea-
co-location (A, B). sures. The event-centric approach defined
The distance-based approach by new transaction free measures, e.g., the
Shekhar and Huang (2001) eliminates participation index (see Shekhar and Huang
the non-overlapping-instance constraint. (2001) for details).
Their event-centric model finds subsets To find co-locations, much of the time is
of spatial features likely to occur in a spent in computing joins to identify instances
neighborhood around instances of given of candidate co-location patterns. To decrease
subsets of event types. For example, let this computation time, a partial-join based
us determine the probability of finding at approach (Yoo, 2004) or a join-less approach
least one instance of feature type B in the (Yoo, 2006) can be used. In the partial-
neighborhood of an instance of feature type join based approach, the number of instance
A in Figure 5.12(a). There are two instances joins for identifying candidate co-locations
of type A and both have some instance(s) are minimized by transactionizing a spatial

Table 5.2 Interest measures for different co-location approaches


Model Items Transactions Interest measures for C 1 C 2 Algorithm
dened by Prevalence Conditional probability
Transaction based Predicates on Instances of Fraction of Pr (C 2 is true for an A priori
reference reference feature instance of instance of reference
and relevant C 1 and C 2 reference feature features given C 1 is
features involved with with C 1 C 2 true for that
instance of reference
feature)
Distance-based Boolean A partitioning of Fraction of Pr (C 2 in a partition Partition-based
k -neighboring feature types spatial dataset partitions with given C 1 in that
class sets C1 C2 partition)
Distance-based Boolean Neighborhoods of Participation index Pr (C 2 in a Join-based and
event-centric feature types instances of of C 1 C 2 neighborhood of C 1) Join-less
feature types
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 81

dataset under a neighbor relationship and sequence of observations at different time


tracing only residual neighborhood instances slots, e.g., a collection of monthly tem-
cut apart via the transactions. The key compo- peratures from 19512000 in Minneapolis.
nent is identifying instances of co-locations Finding locations where climate attributes
split across explicit transactions. are highly correlated is frequently used to
The join-less approach uses an instance- retrieve interesting relationships among spa-
look-up scheme instead of an expensive spa- tial objects of Earth science data. For exam-
tial join for identifying co-location instances. ple, such queries are used to identify the land
Without any loss of co-location instances, this locations whose climate is severely affected
approach is more efficient and scalable for by El Nio. However, such correlation-based
dense data than the join-based method. queries are computationally expensive due to
the large number of spatial points, e.g., more
than 250k spatial cells on the Earth at a 0.5
degree by 0.5 degree resolution, and the high
5.5. COMPUTATIONAL PROCESS dimensionality of sequences, e.g., 600 for the
19512000 monthly temperature data.
Many generic algorithmic strategies have A spatial indexing approach proposed
been generalized to apply to spatial data by Zhang et al. (2003) exploits spatial
mining. For example, as shown in Table 5.3, autocorrelation to facilitate correlation-based
algorithmic strategies, such as divide-and- queries. The approach groups similar time
conquer, filter-and-refine, ordering, hierarchi- series together based on spatial proximity
cal structure, and parameter estimation, have and constructs a search tree. The queries are
been used in spatial data mining. processed using the search tree in a filter-and-
In spatial data mining, spatial autocorre- refine style at the group level instead of at the
lation and low dimensionality in space (e.g., time series level. Algebraic analyses using
23) provide more opportunities to improve cost models and experimental evaluations
computational efficiency than classical data showed that the proposed approach saves a
mining. NASA Earth observation systems large portion of computational cost, ranging
currently generate a large sequence of global from 40% to 98% (see Zhang et al. (2003)
snapshots of the Earth, including various for details).
atmospheric, land, and ocean measurements An important task in most of the spatial
such as sea surface temperature, pressure, data mining techniques is to estimate the
precipitation, and net primary production. values of parameters. Inclusion of an auto-
Each climate attribute in a location has a correlation parameter for spatial data mining
increases the computation cost. For example,
in the SAR model, we have to estimate the
Table 5.3 Algorithmic strategies in spatial value of the spatial autocorrelation parameter,
data mining.
and value of the regression coefficient .
Generic Spatial data mining
Estimation of such parameters is done using
Divide-and-conquer Space partitioning maximum likelihood (ML) and Bayesian
Filter-and-rene Minimum-bounding-rectangle
based techniques. A number of algorithms
(MBR)
Ordering Plane sweeping, Space-lling used for parameter estimation in a SAR
curves model are provided in Table 5.4.
Hierarchical structures Spatial index, tree matching Maximum likelihood based SAR solution
Parameter estimation Parameter estimation with spatial involves computation of two terms; a cheaper
autocorrelation
sum-of-squares error (SSE) term and a
82 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Table 5.4 Classication of algorithms for spatial autoregression model (Celik et al., 2006)
Method used Exact Approximate
Maximum Applying direct sparese matrix
likelihood algorithms, eigenvalue based ML based matrix exponential specication, graph theory approach, Taylor
1-D surface series approximation, Chebyshev polynomial approximation method,
Partitioning semiparametric estimates, characteristic polynomial approach, double
bounded likelihood estimator, upper and lower bounds via divide and
conquer, spatial autoregression local estimation
Bayesian None Bayesian matrix exponential specication, Markov Chain Monte Carlo
(MCMC)

computationally expensive term, called the It is possible to materialize the implicit


likelihood function, which involves a number relationships into traditional data input
of computations of determinant of a large columns and then apply classical data
matrix. ML-based solutions can be divided mining techniques (Quinlan, 1993; Barnett
into exact and approximate solutions based and Lewis, 1994; Agrawal and Srikant,
on how they compute the computationally 1994; Jain and Dubes, 1988). Another
intensive term. Exact solutions suffer from way to deal with implicit relationships
high computational complexities and mem- is to use specialized spatial data mining
ory requirements. Approximate solutions are techniques, e.g., spatial autoregression and
computationally feasible but many of these co-location mining. However, the exist-
formulations still suffer from large memory ing literature does not provide guidance
requirements. One way to reduce the compu- regarding the choice between classical data
tational complexity of exact SAR solutions mining techniques and spatial data mining
is to reduce the number of computation of techniques to mine spatial data. New research
a large martix in its likelihood function. is needed to compare the two sets of
This is done by setting an upper bound on approaches in effectiveness and computa-
the spatial autocorrelation parameter (Celik tional efficiency.
et al., 2006).

5.6.2. Spatial interest measures


5.6. RESEARCH NEEDS The interest measures of patterns in spatial
data mining are different from those in
In this section, we discuss some areas where
classical data mining, especially regarding
further research is needed in spatial data
the four important output patterns shown in
mining.
Table 5.5.
For a two-class problem, the standard way
to measure classification accuracy is to cal-
5.6.1. Comparison of classical data
culate the percentage of correctly classified
mining techniques with
objects. However, this measure may not be
spatial data mining
the most suitable in a spatial context. Spatial
techniques
accuracy how far the predictions are from
As discussed in section 5.2, relationships the actuals is equally important in this
among spatial objects are often implicit. application domain due to the effects of the
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 83

Table 5.5 Interest measures of patterns for classical data mining and spatial data mining
Classical data mining Spatial data mining
Predictive model Classication accuracy Spatial accuracy
Cluster Low coupling and high cohesion in feature space Spatial continuity, unusual density, boundary
Outlier Different from population or neighbors in feature Signicant attribute discontinuity in geographic
space space
Association Subset prevalence, Spatial pattern prevalence
Pr [B T | A T , T : a transaction] Pr [B N (A) | N : neighborhood ] cross
Correlation K -Function

P Legend

A P P A P A = nest location
A = actual nest in pixel
P P
P = predicted nest in pixel
A A A A A A
(a) (b) (c) (d)

Figure 5.13 (a) The actual locations of nests. (b) Pixels with actual nests, (c) Location
predicted by a model. (d) Location predicted by another model. Prediction (d) is spatially
more accurate than (c).

discretizations of a continuous wetland into


5.6.3. Spatio-temporal data mining
discrete pixels, as shown in Figure 5.13.
Figure 5.13(a) shows the actual locations of Spatio-temporal data mining is done to
nests and Figure 5.13(b) shows the pixels extract patterns which have both spatial and
with actual nests. Note the loss of information temporal dimensions. Two examples where
during the discretization of continuous space spatio-temporal data mining could be useful
into pixels. Many nest locations barely fall are in a transportation network, to detect
within the pixels labeled A and are quite patterns of vehicle movement; and in a
close to other blank pixels, which represent location based service, where a service can
no-nest. Now consider the two predictions be offered to a customer by predicting his
shown in Figure 5.13(c) and (d). Domain sci- future location.
entists prefer prediction (d) over (c), since the Consider a location-based service. It relies
predicted nest locations are closer on average on tracking the positions of a mobile object.
to some actual nest locations. However, Since the positions change continuously,
the classification accuracy measure cannot large volumes of updates are required on the
distinguish between Figure 5.13(c) and (d) database side. Mining the frequently used
since spatial accuracy is not incorporated in paths of a mobile object will reduce the
the classification accuracy measure. Hence, number of updates required on the database.
there is a need to investigate proper measures The main challenge here is to reduce the
for location prediction to improve spatial communication between the mobile object
accuracy. and the system (Civilis and Jensen, 2005).
84 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Finding spatio-temporal sequential patterns 5.6.5. Modeling semantically


is another research area in spatio-temporal rich spatial properties, such
data mining. The traditional sequential pat- as topology
tern mining algorithms are not applicable for
The spatial relationship among locations in
spatio-temporal data. A recent algorithm to
a spatial framework is often modeled via
find such patterns is given in Cao and Cheung
a contiguity matrix using a neighborhood
(2005).
relationship defined using adjacency and
Another challenge in spatio-temporal data
distance. However, spatial connectivity
mining is to find co-evolving spatial pat-
and other complex spatial topological
terns. A spatially co-located pattern repre-
relationships in spatial networks are
sents a pattern in which the instances are
difficult to model using the continuity
often located in close geographic proximity.
matrix. Research is needed to evaluate the
Co-evolving spatial patterns are co-located
value of enriching the continuity matrix
spatial patterns whose temporal occurences
beyond the neighborhood relationship.
are correlated with a special time series. For
Another area with research potential is
example, droughts and fires in Australia show
modeling of 3D topographic data (Penninga,
similar variation as El Nio index values over
2005).
the last 50 years (Taylor, 1998). Finding co-
evolving spatial patterns is computationally
expensive. Most of the methods in the
literature do not work well for co-evolving
5.6.6. Statistical interpretation
spatial patterns because they do not consider
models for spatial patterns
the temporal domain of the co-location
pattern. An efficient algorithm which takes Spatial patterns, such as spatial outliers
temporal domain into account is proposed by and co-location rules, are identified in
Rogers et al. (2006). the spatial data mining process using
unsupervised learning methods. There is
a need for an independent measure of
the statistical significance of such spatial
patterns. For example, we may com-
5.6.4. Improving computational
pare the co-location model with dedicated
efciency
spatial statistical measures, such as Ripleys
Mining spatial patterns is often computation- K-function, characterize the distribution of
ally expensive. For example, the estimation the participation index interest measure
of the parameters for the spatial autore- under spatial complete randomness using
gressive model is an order of magnitude Monte Carlo simulation, and develop a
more expensive than that for linear regression statistical interpretation of co-location rules
in classical data mining. Similarly, the co- to compare the rules with other patterns in
location mining algorithm is more expensive unsupervised learning.
than the a priori algorithm for classical asso- Another challenge is the estimation of the
ciation rule mining (Agrawal and Srikant, detailed spatial parameters in a statistical
1994). Research is needed to reduce the model. Research is needed to design effective
computational costs of spatial data mining estimation procedures for the continuity
algorithms by a variety of approaches includ- matrices used in the spatial autoregres-
ing the classical data mining algorithms as sive model and Markov random field-based
potential filters or components. Bayesian classifiers from learning samples.
AVAILABILITY OF SPATIAL DATA MINING TECHNIQUES 85

5.6.7. Effective visualization of 5.7. SUMMARY


spatial relationships
This chapter discussed major research
Visualization in spatial data mining is useful
accomplishments and techniques in spatial
to identify interesting spatial patterns. As
data mining, especially those related to
we discussed in section 5.2, the data inputs
four important output patterns: predictive
of spatial data mining have both spatial
models, spatial outliers, spatial co-location
and non-spatial features. To facilitate the
rules, and spatial clusters. Research needs
visualization of spatial relationships, research
in the area of spatial data mining were also
is needed on ways to represent both spatial
identified.
and non-spatial features.
For example, many visual representations
have been proposed for spatial outliers.
However, we do not yet have a way to ACKNOWLEDGMENTS
highlight spatial outliers within visualiza-
tions of spatial relationships. For instance, This work was supported in part by
in variogram cloud (Figure 5.9(a)) and the Army High Performance Computing
scatterplot (Figure 5.10(b)) visualizations, Research Center under the auspices of the
the spatial relationship between a single Department of the Army, Army Research
spatial outlier and its neighbors is not Laboratory cooperative agreement number
obvious. It is necessary to transfer the DAAD19-01-2-0014, the content of which
information back to the original map in does not necessarily reflect the position or
geographic space to check neighbor rela- the policy of the government, and no official
tionships. Since a single spatial outlier endorsement should be inferred.
tends to flag not only the spatial location We are particularly grateful to our collabo-
of local instability but also its neighboring rators Prof. Vipin Kumar, Prof. Paul Schrater,
locations, it is important to group flagged Dr. Sanjay Chawla, Dr. Chang-Tien Lu,
locations and identify real spatial outliers Dr. Weili Wu, and Prof. Uygar Ozesmi for
from the group in the post-processing their various contributions. We also thank
step. James Kang, Mete Celik, Jin Soung Yoo,
Betsy George and anonymous reviewers for
their valuable feedback on earlier versions of
this chapter. We would also like to express
5.6.8. Preprocessing spatial our thanks to Kim Koffolt for improving the
data readability of this chapter.

Spatial data mining techniques have been


widely applied to the data in many appli-
cation domains. However, research on the REFERENCES
preprocessing of spatial data has lagged
behind. Hence, there is a need for pre- Agarwal, T., Imielinski, R. and Swami, A. (1993).
Mining Association Rules between sets of items
processing techniques for spatial data to in large databases. In: Proc. of the ACM SIGMOD
deal with problems such as treatment of Conference on Management of Data, Washington,
missing location information and impre- D.C., May 1993.
cise location specifications, cleaning of Agrawal, R. and Srikant, R. (1994). Fast algorithms for
spatial data, feature selection, and data Mining Association Rules. In: Proc. of Very Large
transformation. Databases, May 1994.
6

Spatial Autocorrelation

Marie-Josée Fortin and Mark R.T. Dale

6.1. INTRODUCTION

Objects in natural systems (e.g., tree species in a forest) are rarely randomly distributed over space. In fact, they usually have some degree of patchiness (i.e., they are spatially clustered). Spatial aggregation of objects produces a variety of distinct spatial patterns that can be characterized by the size and shape of the aggregations, and can be quantified according to the degree of similarity between the objects in their attributes or quantitative values. These properties of spatial patterns can be indicative of the underlying processes and factors that generate and modify them through time. This is why, in most disciplines (geography, economics, ecology, evolution, epidemiology, environmental science, genetics, etc.), the first step toward the understanding of phenomena is to determine whether the actual locations (coordinates) of observational data matter in explaining the spatial arrangement. So, the primary quest is to investigate and to test whether nearby objects tend to have similar attributes or to be more clustered (Figure 6.1(a)) than expected from randomness alone (Figure 6.1(b)). The presence of spatial structure in quantitative data means that similarity varies with distance between the locations, and how this variation is affected by distance is known as the structure of the variable's spatial autocorrelation. In natural systems, it is the norm to have a mosaic of patches with different spatial autocorrelation structures (Figure 6.1). As spatial structures have their own intensity (magnitude) and size (extent) that make them distinct, they are usually easy to detect. In fact, it is the presence of spatial patterns that creates scale; if there were only spatial randomness around us there would be no need to determine the spatial sampling design (Delmelle, in this volume; Fortin et al., 1989). We therefore need to detect the spatial patterns and determine the scope of their temporal and spatial scales.
Figure 6.1 Spatial patterns. (a) Positive spatial autocorrelation, where cells with similar values (gray tones) are nearby, forming a patch. (b) Spatial randomness. (c) Negative spatial autocorrelation, where nearby cells have dissimilar values, showing spatial repulsion.

Just as there is no smoke without fire, there is no spatial pattern without the underlying processes that create it. Hence, spatial patterns in data can act as indicators of the processes that have occurred over a given region. If it were simple and there were a straightforward match between the spatial pattern and the generating process, we could identify and understand immediately the phenomenon under study. Of course it is not that simple. In fact, it is quite complex because several processes can take place through time, each operating at a given spatial and temporal scale (Fortin and Dale, 2005; Green et al., 2005). Moreover, spatial patterns are shaped by a sequence of processes, all varying in duration and in intensity. Consequently, data are the end-product of an amalgam of interacting processes (Fortin and Dale, 2005; Wagner and Fortin, 2005).

The factors and processes that affect data can be coarsely divided into two kinds: those that induce spatial dependence (most spatially distributed environmental factors) and those that generate spatial autocorrelation (Figure 6.2). In our grouping of environmental factors, we include all physical initial conditions of a region based on geology, geomorphology, topography, and hydrology. Similarly, we refer loosely to processes as any events (such as disturbance, dispersal, species interactions) that change the spatial pattern or the state of the variable under study. As illustrated in Figure 6.2, seed spatial aggregation can be mainly due to the seeds' need for a specific soil type. This will be a case of spatial dependency where the seeds are patchy at the scale of the study area but not necessarily at the scale of the soil patch. Seed abundance at a given location i, j can be modeled by a regression function of the effects of environmental factors where the error terms (ε_{i,j}) are independent:

seeds_{i,j} = f(soil_{i,j}, moisture_{i,j}, topography_{i,j}, etc.) + ε_{i,j}.

Non-uniform seed dispersal will also result in seed patchiness (Figure 6.2). Seed patchiness is due to the process of dispersal and refers to the degree of spatial autocorrelation of the seeds. Here, the distribution of seeds can be modeled, including the
Figure 6.2 Sources of spatial structures. (a) Spatial dependence, where the spatial distribution of environmental factors (here soil types A and B) constrains the seeds' spatial distribution. The gray polygons represent different soil types, where black seeds (circles) can grow only on soil type A (light gray polygon) and white seeds only on soil type B (dark gray polygon). (b) Spatial autocorrelation, on top of the spatial dependence described in (a), where the seeds are dispersed by the trees.

effects of the environmental factors, and adding the effect of spatial dependence (ρ) among seed abundance values as a function of distance (d_{((i,j),(i',j'))}) between a location (i, j) and nearby locations (i', j'), as an autoregressive component (Anselin, in this volume; Lichstein et al., 2002), and having independent errors (ε_{i,j}):

seeds_{i,j} = f(soil_{i,j}, moisture_{i,j}, topography_{i,j}, etc.) + ρ d_{((i,j),(i',j'))}(seeds_{i',j'}) + ε_{i,j}.

As in the case of spatial dependence of the environmental variables, the seed dispersal process created spatial patchiness at the scale of the study area, as well as at the scale of the soil type.

Furthermore, seed abundance as a function of environmental factors and processes and the errors could include some degree of spatial dependency (Haining, 2003; Henebry, 1995):

seeds_{i,j} = f(soil_{i,j}, moisture_{i,j}, topography_{i,j}, etc.) + (ρ d_{((i,j),(i',j'))} ε_{i',j'} + ε_{i,j}).

This effect results in a statistical problem because the error terms are not independent of locations. Spatially dependent errors impair the use of both parametric significance testing and randomization tests (Cliff and Ord, 1981; Fortin and Dale, 2005; Fortin and Jacquez, 2000; Haining, 2003).

Ideally, to understand natural systems we would like to be able to separate the spatial structure due to the environmental factors from that due to spatial autocorrelation generated by the processes themselves. This worthwhile task is complicated because there are feedback effects between existing spatial patterns and processes that act on them
that modify both the spatial patterns and the processes (Wagner and Fortin, 2005). These legacies of the spatial patterns on the processes (Peterson, 2002) can either promote the spread of disturbances and disease or impede animal movement (e.g., fragmentation due to roads). The results of the sequence of processes and feedbacks are included in the observed data (Haining, this volume). The question is then: at which scale should we spatially analyze the data when it is the scale itself that we want to determine?

Spatial statistics will save us, right? No!

Spatial statistics were developed to quantify the degree of spatial aggregation (join count, Ripley's K), spatial autocorrelation (Moran's I, Geary's c) or spatial variance (semi-variance γ; see Atkinson and Lloyd, in this volume) over a study area where the mean and variance of the function describing the process are constant with distance and direction between locations. Thus, spatial statistics can quantify patterns but cannot identify their origin.

So what are spatial statistics good for? To answer this fundamental question, we first summarize how the most commonly used spatial statistics estimate spatial patterns and spatial autocorrelation. We stress how spatial analyses of larger areas, where there is more than one process, impair the direct use of spatial statistics and parametric statistics. We present the statistical issues and the recent developments aiming to address them. Then, we conclude by commenting on some unresolved challenges in the field of spatial statistics.

6.2. SPATIAL STATISTICS IN A NUTSHELL

Arising from time series statistics and the more familiar parametric statistics, spatial statistics quantify the degree of self-similarity of a variable as a function of distance. These spatial statistics assume that, within the study area, the parameters of the function defining the underlying process, such as the mean and the variance, are constant regardless of the distance and direction between the sampling locations. This property of the random function is known as spatial stationarity (Cressie, 1993). The goal of spatial statistics is then to test the null hypothesis of absence of spatial pattern. For each spatial statistic, the spatial pattern is either spatial aggregation or segregation (Ripley's K; join count statistics) or spatial autocorrelation (Moran's I and Geary's c). The null hypothesis implies that nearby locations (or attributes, measures) do not affect one another, such that there is independence and spatial randomness (Figure 6.1(b)). The alternatives are that there is clustering and thus positive spatial autocorrelation (Figure 6.1(a)), or repulsion and negative spatial autocorrelation (Figure 6.1(c)).

The mathematical commonality of the various spatial statistics is that they use the cross-product between a weighted function relating the degree of distance (w_{ij}) among the sampling locations (n) and a function (Y) quantifying the degree of similarity among the values of the variable (x_{ij}) at these sampling locations (Dale et al., 2002; Getis, 1991; Getis and Ord, 1992):

Statistic(d) = [ Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} w_{ij}(d) Y_{ij}(x) ] / C(w, d)

where d is the spatial distance lag (or search window size, or calculation template of radius d) between the sampling locations at which the spatial statistic is computed. The divisor C(w, d) is just a correction for the overall magnitude of the statistic calculated.
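To make the cross-product form concrete, the following short Python sketch (our illustration, not part of the original chapter) computes the global Moran's I for one distance class by applying the formula above with binary weights w_{ij}(d); the function and argument names (morans_i, coords, x, d_min, d_max) are hypothetical.

import numpy as np

def morans_i(coords, x, d_min, d_max):
    """Global Moran's I for pairs of locations separated by a distance in (d_min, d_max]."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    # pairwise Euclidean distances between sampling locations
    dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    w = ((dist > d_min) & (dist <= d_max)).astype(float)   # w_ij(d): 1 within the lag, 0 otherwise
    np.fill_diagonal(w, 0.0)                               # exclude i == j
    cross = (w * np.outer(dev, dev)).sum()                 # sum_i sum_j w_ij (x_i - xbar)(x_j - xbar)
    return (n / w.sum()) * cross / (dev ** 2).sum()        # normalized by the total weight C(w, d)

Under the null hypothesis of no spatial autocorrelation, the expected value of this statistic is -1/(n - 1), matching the entry for Moran's I in Table 6.1.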
Table 6.1 Similarity functions and significance test procedures according to the spatial statistics

Point data
Ripley's K: for each radius t, the statistic sums the indicator function I_t(i, j) that counts, at each point, the number of points within a circle of radius t (Figure 6.3). Significance test: assessed using a confidence envelope based on a randomization procedure (complete spatial randomness).
Join count statistics: the statistics count the number of links of matching (J_rr) and mismatching (J_rs) categories. Significance test: assessed by comparing the observed frequencies of links to those expected under the null hypothesis of randomness.

Polygon data
Join count: the statistics count the number of links of matching (J_rr) and mismatching (J_rs) categories. Significance test: assessed by comparing the observed frequencies of links to those expected under the null hypothesis of randomness.

Quantitative data
Moran's I: the statistic sums the deviation of the values at a given distance lag from the mean of the variable, Y_{ij}(x) = (x_i − x̄)(x_j − x̄). Significance test: assessed using either a randomization procedure or a normal distribution approximation test, where the expected value in the absence of spatial autocorrelation is E_N(I) = E_R(I) = −(n − 1)^(−1).
Geary's c: the statistic sums the squared deviation of the values at a given distance lag, Y_{ij}(x) = (x_i − x_j)^2. Significance test: assessed using either a randomization procedure or a normal distribution approximation test, where the expected value in the absence of spatial autocorrelation is E_N(c) = E_R(c) = 1.
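As a companion to the join count rows of Table 6.1, here is a small Python sketch (our illustration, not the authors' code) that counts matching and mismatching joins between rook neighbours on a binary raster; the black/white coding and the function name are assumptions.

import numpy as np

def join_counts(grid):
    """Join counts for a two-colour raster: J_bb, J_ww (matching) and J_bw (mismatching)."""
    g = np.asarray(grid, dtype=int)          # 1 = black, 0 = white
    horiz = (g[:, :-1], g[:, 1:])            # joins between horizontal rook neighbours
    vert = (g[:-1, :], g[1:, :])             # joins between vertical rook neighbours
    bb = ww = bw = 0
    for a, b in (horiz, vert):
        bb += int(np.sum((a == 1) & (b == 1)))
        ww += int(np.sum((a == 0) & (b == 0)))
        bw += int(np.sum(a != b))
    return {"J_bb": bb, "J_ww": ww, "J_bw": bw}

The observed counts would then be compared with the frequencies expected under the null hypothesis of randomness, as described in the table.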

Each spatial statistic is characterized by a particular way to determine its search window type (links, areas), size and shape (circular, square), as well as the way it calculates the degree of relationship among the values of the variables (Table 6.1; for mathematical details see Cliff and Ord, 1981; Cressie, 1993; Diggle, 1983; Epperson, 2003; Fortin and Dale, 2005; Haining, 2003; Ripley, 1981).

These spatial statistics provide, for each search window size, a single value representing the average degree of spatial autocorrelation over the study area. This is why it is important to assume spatial stationarity. This property of stationarity of the random function attributed to the process is also paramount because it allows significance testing of the null hypothesis of spatial randomness (Cliff and Ord, 1981; Cressie, 1993). For instance, by assuming normality the expected mean and variance of the spatial pattern can be estimated (Table 6.1). To overcome the issue of assumed normality, randomization tests can be used to determine the significance of the spatial pattern (Bjørnstad and Falck, 2001). This is how the significance of Ripley's K is assessed, by using a complete spatial randomness procedure (Poisson process). Several researchers have suggested that complete spatial randomness is not useful as a hypothesis for comparison and have proposed other more realistic point patterns, such as Poisson-Poisson or Cox-Poisson, for comparison (Haining, 2003; Fortin and Dale, 2005). Other spatially restricted randomization procedures can be used for spatial statistics to reflect the spatial dynamics of the process or of the spatial distribution of the environmental factors (Goovaerts and Jacquez, 2004; Wiegand and Moloney, 2004).

To emphasize the essential property of the spatial statistics that are based on the assumption of spatial stationarity, the qualifier global can be added when referring
Figure 6.3 Search window types to determine the distance weight among sampling locations according to the data types and spatial statistics (panels: point data: join count and Ripley's K; categorical data: join count; quantitative data: Moran's I and Geary's c). For point data, the geographical coordinates of objects as well as their attributes (black or gray) need to be surveyed for the entire study area. Join count statistics require first establishing the link network among the sampling locations; here we used a Delaunay tessellation network (Fortin and Dale, 2005; Okabe et al., 2000) to determine the links. Ripley's K uses circles of radius t at each point. For categorical data from polygons, join count statistics can be used where the links are determined using the centroids of the polygons. For quantitative data, spatial autocorrelation coefficients (Moran's I, Geary's c) can be computed using either a link network among the sampling locations, a search window from each sampling location, or each cell (quadrat) of a grid.

to them. Because spatial stationarity is assumed, the shape of the search window is isotropic (Figure 6.3) and the intensity of the spatial pattern is measured as if it were the same whatever the direction. In natural systems, this assumption is often not realistic, as water flow and wind are mostly directional processes. Such directional processes generate anisotropic patterns for which the characteristics depend on direction (Figure 6.2). Isotropic search windows are not able to detect anisotropic patterns and therefore weights are needed to compute spatial autocorrelation according to direction as well as distance (Dubin, in this volume; Fortin and Dale, 2005). While this feature was used early on in geostatistics (Atkinson, in this volume; Journel and Huijbregts, 1978), it took longer to become common practice in other applications of spatial statistics (Oden and Sokal, 1986). The use of these directional weights still assumes that the process can occur over the entire area. It is not always the case, as in studying fish pools for example, where the spatial patterns we need to consider are only those of the aquatic network itself. In addition, proximity in an aquatic network (Figure 6.4) cannot be determined using Euclidean distance as in terrestrial systems (Figure 6.2), but rather
Figure 6.4 Aquatic network path length does not match the Euclidean distance among sampling locations (in-figure labels: "The same Euclidean distance but not the same path length"; "Not spatially connected").

requires a topological basis for proximity (Fortin and Dale, 2005; Okabe et al., 2000). Okabe and Yamada (2001) used such network weights to account for the particular topology of spatial networks when computing Ripley's K.

6.3. EFFECTS OF THE EXTENT ON GLOBAL SPATIAL STATISTICS

As we are looking for spatial patterns in natural systems, several decisions will affect our ability to detect and quantify spatial structures: how the data are gathered (Dungan et al., 2002; Fortin and Dale, 2005; Legendre et al., 2002), the size of the study area (the extent), and the size of the sampling units (the grain). Here we will focus on the change of extent, as it has a direct effect on the spatial stationarity of the area and the validity of the global spatial statistics. The change of grain size is also important and is known as the modifiable areal unit problem (MAUP). A whole chapter is dedicated to the MAUP (Wong, in this volume), and so we refer the reader to that part of this handbook.

The extent of the study area affects our ability to detect spatial patterns: too small, and it could not include enough data to characterize the pattern; too large, and it could cover several patterns from various sources and at different scales, as already mentioned above (Dungan et al., 2002; Fortin and Dale, 2005). Increasing the extent of the study area implies that more processes and environmental factors may alter the variable of interest. Usually, however, it is rare that we know in advance at which extent to study a phenomenon. In the absence of prior knowledge, researchers should perform a pilot study to determine it (Dungan et al., 2002; Legendre et al., 2002). Unfortunately, the wealth of available data captured by remote sensing over large areas is tempting and we often succumb to the temptation. We use all the data available to us. We go fishing for spatial patterns. The problem is that the larger the extent, the more likely it is that several environmental factors and processes operate on the variable under investigation, resulting in spatial nonstationarity with the spatial patterns of several scales intermingled, or that some processes have greater effects in some sub-regions than in others. The consequence for the resulting estimation by global spatial statistics of spatial autocorrelation at various distances is that the average values of spatial autocorrelation may not reflect any spatial pattern, as the
spatial structures may be cancelling out each other's signals.

Even when the extent of the study area is appropriate for the phenomena under study, our ability to determine adequately the spatial pattern can be altered due to sampling issues, statistical issues, or a combination of both. One sampling issue is the mismatch between the location of the extent and the process under study: if the actual location of the study area is a few meters north or south, it can cause the detected spatial pattern to vary (Plante et al., 2004). From a statistical point of view, the number of neighboring points at the edge of the study area is always smaller than at the center (as illustrated in Figure 6.3 by the sampling locations, at the centroid of patches or at the centroid of quadrats marked by squares). This edge effect is known, and several edge correction algorithms have been proposed to adjust either for the edge, the corner or both (Goreaud and Pélissier, 1999; Haase, 1995; Wiegand and Moloney, 2004). Similarly, rectangular study areas will have pairs of locations at the larger distance classes only in one direction (Fortin, 1999). To have a more comparable number of pairs of sampling locations to estimate spatial autocorrelation, it is recommended to use distance classes no larger than half or two thirds of the smallest side of the study area (Fortin and Dale, 2005), or to use equifrequent classes where the number of pairs is kept constant rather than the thresholds of Euclidean distances for succeeding classes (Sokal and Wartenberg, 1983).

6.4. LOCAL SPATIAL STATISTICS: ONE STEP IN A GOOD DIRECTION

The previous section presented some of the most common sampling and statistical issues that affect the reliability of global spatial statistics in estimating spatial autocorrelation, and how to minimize them within the context of global analyses over the entire study area. Another approach to deal with these issues is to measure spatial autocorrelation locally using local spatial statistics (Table 6.2). Local indicators of spatial autocorrelation (or spatial association, called LISA; Anselin, 1995) measure the degree of spatial autocorrelation using, for example, Moran's I algorithm for sampling locations based only on the neighborhood around a given sampling location. The neighborhood search window can be based either on a link network or on distance classes, as in the global Moran's I approach. Several variants of LISA have been developed in the same spirit of measuring local spatial association rather than autocorrelation, such as the local Getis and the local Ord statistics (Boots, 2002; Fotheringham et al., 2000; Getis and Ord, 1996; Ord and Getis, 1995, 2001). One of the advantages of these local spatial statistics is that the values of spatial autocorrelation (or spatial association) can be mapped at each sampling location, allowing the identification of sub-regions within the study area having positive (called hot spots) or negative (called cold spots) autocorrelation values (Wulder and Boots, 1998). This is very useful when large study areas are analyzed to determine how homogeneous (or not) a region is. One drawback, however, is that the significance test for each sampling location is based on the global estimate of spatial autocorrelation for the entire study area, and that assumes spatial stationarity. In the absence of spatial stationarity, the advantage of using local spatial statistics over larger areas is cancelled by the lack of a significance test. This is why researchers have recently been developing new procedures to assess local significance that account for the global estimate of spatial autocorrelation (Ord and Getis, 2001; Kabos and Csillag, 2002). Even with these newer methods to test significance, one cannot apply a Bonferroni correction to adjust for the multiple tests for each coefficient as for the global spatial statistics (Fortin and Dale, 2005), because the tests may be highly correlated, and there are usually too many sampling locations, so often no coefficients would appear significant. However, the mapping of a local spatial coefficient value at each sampling location has been found a very informative tool for exploring the characteristics of spatial data (Fotheringham, 1997; Fotheringham and Brunsdon, 1999; Pearson, 2002; Sokal et al., 1998). In the same spirit of analyzing local spatial pattern and the underlying factor or process responsible for it, geographically weighted regression can be used (see Fotheringham, in this volume).
Table 6.2 Recent developments related to each kind of spatial statistics

Point pattern: Dale and Powell (2001): asymmetric point pattern analysis permits the detection of centers of low and high density regions (univariate) or of segregation and aggregation (bivariate or multivariate). Dixon (2002): multivariate point pattern analysis using counts of nearest neighbors.

Ripley's K: Edge correction algorithms to adjust for points close to the edge and corner of the study area, which are less likely to have points nearby than those at the center (Goreaud and Pélissier, 1999; Haase, 1995). Spatial network weights accounting for particular topology, as in road or aquatic stream networks (Okabe and Yamada, 2001). Restricted randomization accounting for a spatially heterogeneous study area (Wiegand and Moloney, 2004).

Join count: Significance test accounting for the presence of global spatial autocorrelation (Kabos and Csillag, 2002).

Global spatial statistics: Global non-parametric spatial covariance (Bjørnstad and Falck, 2001). Local indicators of spatial autocorrelation or association (Anselin, 1995). Local spatial aggregation statistics (Boots, 2002, 2003; Getis and Ord, 1992; Ord and Getis, 1995; Wulder and Boots, 1998).

Local spatial statistics: Statistics that account for the presence of global spatial autocorrelation: Ord's O (Ord and Getis, 2001). Local indicators of spatial autocorrelation (Anselin, 1995). Local spatial aggregation statistics (Boots, 2002, 2003; Getis and Ord, 1992; Ord and Getis, 1995; Wulder and Boots, 1998).
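The local statistics listed above can be illustrated with a brief Python sketch of one common form of local Moran's I (our reading of the LISA idea in Anselin, 1995, not code from that paper); W is assumed to be a binary neighbour matrix supplied by the analyst.

import numpy as np

def local_morans_i(x, W):
    """Local Moran's I_i from a (row-standardized) neighbour matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    m2 = (z ** 2).mean()                                   # scaling term: average squared deviation
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    Wr = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return (z / m2) * (Wr @ z)                             # one value per sampling location

Mapping the resulting values highlights sub-regions of positive (hot spot) or negative (cold spot) local association; their significance still has to be assessed with the procedures discussed in the text.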

6.5. SPATIAL AUTOCORRELATION IMPLICATIONS FOR PARAMETRIC AND RANDOMIZATION SIGNIFICANCE TESTING

One important feature of spatial dependence in data is that positive spatial autocorrelation makes parametric statistical tests too liberal, in that they produce more apparently significant results than the data actually justify. A simple intuitive explanation is that, because of the lack of independence, at least some of the information of sample i is contained in adjacent samples, and so instead of having the information of n independent samples, we have the information appropriate to fewer samples, n′, called the effective sample size (cf. Cressie, 1993).
Table 6.3 Correction procedures for the presence of spatial autocorrelation for parametric tests

Parametric tests (general concepts): Cressie (1993).

Univariate tests: no general solution; model the data and use Monte Carlo simulation (Mizon, 1995; Dale and Fortin, 2002).

Bivariate tests:
Correlation: modified t-test (Clifford et al., 1989; corrected by Dutilleul, 1993).
Linear regression: Alpargu and Dutilleul (2003).
Partial correlation: Alpargu and Dutilleul (2006).
2 × 2 contingency table: Cerioli (1997).
R × C contingency table: Cerioli (2002).
Cochran-Armitage trend test: Cerioli (2003).

Multivariate: following the Dutilleul-Cerioli approach: speculation!
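The "too liberal" behaviour described above is easy to reproduce by simulation. The following Python sketch (a hypothetical example, not from the chapter) correlates two independently generated but individually autocorrelated series and records how often the ordinary Pearson test rejects the true null hypothesis.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sim, alpha = 100, 1000, 0.05
kernel = np.ones(10) / 10                                   # moving-average smoothing induces autocorrelation
rejections = 0
for _ in range(n_sim):
    x = np.convolve(rng.normal(size=n + 9), kernel, mode="valid")   # autocorrelated series of length n
    y = np.convolve(rng.normal(size=n + 9), kernel, mode="valid")   # generated independently of x
    if stats.pearsonr(x, y)[1] < alpha:
        rejections += 1
print(rejections / n_sim)   # typically several times the nominal 5% rate

The inflated rejection rate reflects the reduced effective sample size n′ rather than any real association, which is what the corrections in Table 6.3 are designed to repair.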

It is tempting to suggest that, based on the work of Cressie and others (see below), we should be able to use the autocorrelation structure of the data in order to calculate the correct effective sample size for testing. For univariate tests, this approach does not seem to work well, and Dale and Fortin (2002) suggest the approach of modeling the data by refining a general ARMA (Auto-Regressive Moving Average) model, followed by the Monte Carlo generation of artificial data sets with similar autocorrelation structure for comparison. For bivariate data, the effective sample size method seems to work well for a broad range of statistics (see Table 6.3), and we speculate that it will work for multivariate data as well.

To avoid having to deal with the estimation of the effective sample size, the use of randomization tests also seems attractive. Randomization tests (also called permutation, resampling or computer-intensive tests) are convenient when the goal of the study is to assess the significance of the sample itself. When the goal is to make inferences about the sampling population, a Monte Carlo procedure should be used instead (Good, 2000). (Permutation re-orders the original data, whereas a Monte Carlo procedure produces new data of similar structure.) In either case, the presence of spatial autocorrelation in the data impairs the fundamental assumption of randomization tests, which is that each labeling (attributes, values) can be exchanged randomly (Figure 6.5(a, b)). Depending on the type of spatial autocorrelation, modified randomization procedures (or simply restricted randomization tests) can be used, where the data are randomized with some specific spatial restriction. For example, in Figure 6.5(a-c), the data show marked regional differences along the south-west to north-east diagonal. With such a spatial structure, a complete randomization test cannot be used, as illustrated in Figure 6.5(a), and a restricted one is more appropriate. One way to account for this type of spatial structure is to have the study area partitioned into two regions (Figure 6.5(d)) and then the randomization is applied in each region separately. When the data show spatial dependence due to underlying environmental factors, restricted randomization procedures that generate a comparable degree of spatial autocorrelation as that observed in the data can be helpful
Figure 6.5 Randomization procedure. (a) Sampling locations with the quantitative values of a given variable. (b) Complete spatial randomness of the values of the variable over the sampling locations. (c) Same study area as in (a), where the dashed line delineates two sub-regions having different mean values. (d) Restricted randomization within each region.

(Fortin et al., 2003). To assess significance with more complex spatial patterns in which there is more than one spatial scale, Goovaerts and Jacquez (2004) proposed a typology of increasing levels of spatial restrictions, that they called neutral models, to simulate more spatially realistic reference distributions.
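A restricted randomization of the kind sketched in Figure 6.5(d) can be written in a few lines; the following Python sketch (our illustration, with hypothetical argument names) permutes values only within user-supplied sub-regions and compares the observed statistic with the resulting reference distribution.

import numpy as np

def restricted_randomization_pvalue(x, region, statistic, n_perm=999, seed=None):
    """Two-sided permutation p-value where labels are shuffled only within each region."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    region = np.asarray(region)
    observed = statistic(x)
    count = 1                                     # the observed value belongs to the reference set
    for _ in range(n_perm):
        perm = x.copy()
        for r in np.unique(region):               # restriction: permute within sub-regions only
            idx = np.where(region == r)[0]
            perm[idx] = rng.permutation(perm[idx])
        if abs(statistic(perm)) >= abs(observed):
            count += 1
    return count / (n_perm + 1)

Passing the global Moran's I function shown earlier (with its coordinates and distance lag fixed) as the statistic turns this into a spatially restricted test of the kind advocated in the text.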
6.6. HOW MANY SPATIAL SCALES?

A good practice when analyzing larger regions involves first assessing whether the spatial patterns of the data involve more than one spatial scale, and then relating each scale to a key factor or process. This was easy to say but not so easy to do until recently. Two new
approaches have been proposed to use spatial scales as spatial predictors in regression or canonical analysis models (Borcard and Legendre, 2002; Keitt and Urban, 2005). Borcard and Legendre (2002) determined spatial predictors using principal coordinates of neighbor matrices (PCNM), which decompose spatial scales into orthogonal spatial predictors based on the eigenvectors associated with the positive eigenvalues of the principal coordinates. The advantage of this method lies in the fact that neighborhoods, i.e., spatial scales, can be determined using the Euclidean distances among irregularly spaced sampling locations. Keitt and Urban (2005) used the wavelet coefficients of the wavelet transform at each decomposition level as spatial predictors in a multiple regression model. Unlike the PCNM approach, the wavelet decomposition requires that the data are surveyed in a contiguous way, as is the case with remotely sensed and GIS raster data. These new approaches have a lot of potential to determine the relative importance of environmental factors and processes in explaining the patterns of data.
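A bare-bones version of the PCNM construction can be sketched as follows (our reading of Borcard and Legendre, 2002, not their code); the truncation threshold and the handling of the eigenvectors are assumptions that a real analysis would set more carefully.

import numpy as np

def pcnm_predictors(coords, truncation):
    """Orthogonal spatial predictors from a truncated distance matrix (PCNM-style)."""
    coords = np.asarray(coords, dtype=float)
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    dt = np.where(d <= truncation, d, 4.0 * truncation)    # distances beyond the threshold set large
    a = -0.5 * dt ** 2
    n = len(coords)
    centre = np.eye(n) - np.ones((n, n)) / n
    g = centre @ a @ centre                                 # double-centred (principal coordinates) matrix
    eigval, eigvec = np.linalg.eigh(g)
    keep = eigval > 1e-8                                    # retain axes with positive eigenvalues
    return eigvec[:, keep][:, ::-1]                         # broadest-scale axes first, usable as predictors

Each retained column represents a distinct spatial scale and can be entered as a predictor in the regression or canonical model, as described above.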

Figure 6.6 Flow chart to decide which spatial statistics to use. (Decision nodes: is there one process? is the process stationary? if several processes, should spatial analysis continue? Outcomes: global spatial analysis; partition into spatially homogeneous regions and perform global spatial analysis in each region; local spatial analysis; or revise the question(s), change the location, or change the size of the extent.)

6.7. NEW ERA OF SPATIAL ANALYSIS: CATEGORICAL DATA

Spatial analysis of data requires a priori knowledge about the data and the underlying processes. It requires as well a good understanding of the possibilities and limitations of the various spatial statistics available (Figure 6.6; see also Csillag and Boots, 2005). The issues presented in this chapter deal mostly with the context of spatial analysis of quantitative data. Over larger study areas, however, it is rare that quantitative data are available, and it is more likely that we need to rely only on qualitative data. The spatial analysis of categorical data often requires that the questions be revised (Figure 6.6), as well as the type of spatial statistical tools. GIS packages offer a series of simple spatial descriptions of qualitative data (e.g., area, number of patches), and several landscape metrics are available to refine the spatial characterization of categorical data (Gustafson, 1998). More work is still needed, however, to be able to determine the significance of these metrics so that they can be compared through time


and between sites (Fortin et al., 2003; Remmel and Csillag, 2003, 2006). As for spatial statistics for categorical data per se, recent methods have been developed at the global level, assessing spatial variance using a transiogram (Weidong, 2006), and at the local level, developing new local measures of spatial association (Boots, 2003). The use of mark connection functions (Stoyan and Penttinen, 2000) is also a promising area of further investigation, perhaps where the mosaic of patches is converted into a network of points with marks which identify connections to first-order neighbors. Finally, there remains the large problem of incorporating time, creating a spatio-temporal analysis to assess the changes in spatial characteristics.

REFERENCES

Alpargu, G. and Dutilleul, P. (2003). To be or not to be valid in testing the significance of the slope in simple quantitative linear models with autocorrelated errors. Journal of Statistical Computation and Simulation, 73: 165-180.

Alpargu, G. and Dutilleul, P. (2006). Stepwise regression in mixed quantitative linear models with autocorrelated errors. Communications in Statistics - Simulation and Computation, 35: 79-104.

Anselin, L. (1995). Local indicators of spatial association - LISA. Geographical Analysis, 27: 93-115.

Atkinson and Lloyd (Chapter 9: Geostatistics, this volume).

Bjørnstad, O.N. and Falck, W. (2001). Nonparametric spatial covariance functions: estimation and testing. Environmental and Ecological Statistics, 8: 53-70.

Boots, B. (2002). Local measures of spatial association. Écoscience, 9: 168-176.

Boots, B. (2003). Developing local measures of spatial association for categorical data. Journal of Geographical Systems, 5: 139-160.

Borcard, D. and Legendre, P. (2002). All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling, 153: 51-68.

Cerioli, A. (1997). Modified tests of independence in 2 × 2 tables with spatial data. Biometrics, 53: 619-628.

Cerioli, A. (2002). Testing mutual independence between two discrete-valued spatial processes: a correction to Pearson chi-squared. Biometrics, 58: 888-897.

Cerioli, A. (2003). The Cochran-Armitage trend test under spatial autocorrelation. Proceedings of the Conference Complex Models and Computational Methods for Estimation and Prediction. Treviso, Italy, September 2003.

Cliff, A.D. and Ord, J.K. (1981). Spatial Processes: Models and Applications. London: Pion.

Clifford, P., Richardson, S. and Hémon, D. (1989). Assessing the significance of correlation between two spatial processes. Biometrics, 45: 123-134.

Cressie, N.A.C. (1993). Statistics for Spatial Data, Revised Edition. New York: Wiley.

Csillag, F. and Boots, B. (2005). A framework for statistical inferential decisions in spatial pattern analysis. The Canadian Geographer, 49: 172-179.

Dale, M.R.T. and Fortin, M.-J. (2002). Spatial autocorrelation and statistical tests in ecology. Écoscience, 9: 162-167.

Dale, M.R.T. and Powell, R.D. (2001). A new method for characterizing point patterns in plant ecology. Journal of Vegetation Science, 12: 597-608.

Dale, M.R.T., Dixon, P., Fortin, M.-J., Legendre, P., Myers, D.E. and Rosenberg, M. (2002). The conceptual and mathematical relationships among methods for spatial analysis. Ecography, 25: 558-577.

Delmelle (Chapter 10: Spatial sampling, this volume).

Diggle, P.J. (1983). Statistical Analysis of Spatial Point Patterns. London: Academic Press.

Dixon, P.M. (2002). Nearest-neighbor contingency table analysis of spatial segregation for several species. Écoscience, 9: 142-151.

Dubin (Chapter 8: Spatial weights, this volume).

Dubin, R.A. (1998). Spatial autocorrelation: a primer. Journal of Housing Economics, 7: 304-327.

Dungan, J.L., Perry, J.N., Dale, M.R.T., Legendre, P., Citron-Pousty, S., Fortin, M.-J., Jakomulska, A., Miriti, M. and Rosenberg, M.S. (2002). A balanced view of scale in spatial statistical analysis. Ecography, 25: 626-640.
Dutilleul, P. (1993). Modifying the t test for assessing the correlation between two spatial processes. Biometrics, 49: 305-314.

Epperson, B.K. (2003). Covariances among join-count spatial autocorrelation measures. Theoretical Population Biology, 64: 81-87.

Fortin, M.-J. and Dale, M.R.T. (2005). Spatial Analysis. A Guide for Ecologists. Cambridge: Cambridge University Press.

Fortin, M.-J. and Jacquez, G.M. (2000). Randomization tests and spatially autocorrelated data. Bulletin of the Ecological Society of America, 81: 201-205.

Fortin, M.-J., Boots, B., Csillag, F. and Remmel, T.K. (2003). On the role of spatial stochastic models in understanding landscape indices in ecology. Oikos, 102: 203-212.

Fortin, M.-J., Drapeau, P. and Legendre, P. (1989). Spatial autocorrelation and sampling design. Vegetatio, 83: 209-222.

Fotheringham (Chapter 13: GWR, this volume).

Fotheringham, A.S. (1997). Trends in quantitative methods I: stressing the local. Progress in Human Geography, 21: 88-96.

Fotheringham, A.S. and Brunsdon, C. (1999). Local forms of spatial analysis. Geographical Analysis, 31: 340-358.

Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. London: Sage Publications.

Getis, A. (1991). Spatial interaction and spatial autocorrelation: a cross product approach. Environment and Planning A, 23: 1269-1277.

Getis, A. and Ord, J.K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24: 189-206.

Getis, A. and Ord, J.K. (1996). Local spatial statistics: an overview. In: Longley, P. and Batty, M. (eds), Spatial Analysis: Modelling in a GIS Environment, pp. 261-277. Cambridge: GeoInformation International.

Good, P. (2000). Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses, 2nd edn. New York: Springer-Verlag.

Goovaerts, P. and Jacquez, G.M. (2004). Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York. International Journal of Health Geographics, 3: 14.

Goreaud, F. and Pélissier, R. (1999). On explicit formulas of edge effect correction for Ripley's K-function. Journal of Vegetation Science, 10: 433-438.

Green, J.L., Hastings, A., Arzberger, P., Ayala, F.J., Cottingham, K.L., Cuddington, K., Davis, F.D., Dunne, J.A., Fortin, M.-J., Gerber, L. and Neubert, M. (2005). Complexity in ecology and conservation: mathematical, statistical, and computational challenges. BioScience, 55: 501-510.

Gustafson, E.J. (1998). Quantifying landscape spatial pattern: What is the state of the art? Ecosystems, 1: 143-156.

Haase, P. (1995). Spatial pattern analysis in ecology based on Ripley's K-function: Introduction and methods of edge correction. Journal of Vegetation Science, 6: 575-582.

Haining (Chapter 2: Nature of Spatial Data, this volume).

Haining, R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge: Cambridge University Press.

Journel, A.G. and Huijbregts, C. (1978). Mining Geostatistics. London: Academic Press.

Kabos, S. and Csillag, F. (2002). The analysis of spatial association on a regular lattice by join-count statistics without the assumption of first-order homogeneity. Computers and Geosciences, 28: 901-910.

Keitt, T.H. and Urban, D.L. (2005). Scale-specific inference using wavelets. Ecology, 86: 2497-2504.

Legendre, P., Dale, M.R.T., Fortin, M.-J., Gurevitch, J., Hohn, M. and Myers, D.E. (2002). The consequences of spatial structure for the design and analysis of ecological field surveys. Ecography, 25: 601-615.

Lichstein, J.W., Simons, T.R., Shriner, S.A. and Franzreb, K.E. (2002). Spatial autocorrelation and autoregressive models in ecology. Ecological Monographs, 72: 445-463.

Mizon, G.E. (1995). A simple message for autocorrelation correctors: don't. Journal of Econometrics, 69: 267-289.

Oden, N.L. and Sokal, R.R. (1986). Directional autocorrelation: an extension of spatial correlograms to two dimensions. Systematic Zoology, 35: 608-617.
Okabe, A. and Yamada, I. (2001). The K-function method on a network and its computational implementation. Geographical Analysis, 33: 270-290.

Okabe, A., Boots, B., Sugihara, K. and Chiu, S.N. (2000). Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 2nd edn. Chichester: John Wiley.

Ord, J.K. and Getis, A. (1995). Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27: 286-306.

Ord, J.K. and Getis, A. (2001). Testing for local spatial autocorrelation in the presence of global autocorrelation. Journal of Regional Science, 41: 411-432.

Pearson, D.M. (2002). The application of local measures of spatial autocorrelation for describing pattern in north Australian landscapes. Journal of Environmental Management, 64: 85-95.

Perry, J.N., Liebhold, A.M., Rosenberg, M.S., Dungan, J., Miriti, M., Jakomulska, A. and Citron-Pousty, S. (2002). Illustrations and guidelines for selecting statistical methods for quantifying spatial pattern in ecological data. Ecography, 25: 578-600.

Peterson, G.D. (2002). Contagious disturbance, ecological memory, and the emergence of landscape pattern. Ecosystems, 5: 329-338.

Plante, M., Lowell, L., Potvin, F., Boots, B. and Fortin, M.-J. (2004). Studying deer habitat on Anticosti Island, Québec: Relating animal occurrences and forest map information. Ecological Modelling, 174: 387-399.

Remmel, T.K. and Csillag, F. (2003). When are two landscape pattern indices significantly different? Journal of Geographical Systems, 5: 331-351.

Remmel, T.K. and Csillag, F. (2006). Mutual information spectra for comparing categorical maps. International Journal of Remote Sensing, 27: 1425-1452.

Ripley, B.D. (1981). Spatial Processes. New York: John Wiley.

Sokal, R.R., Oden, N.L. and Thomson, B.A. (1998). Local spatial autocorrelation in biological variables. Biological Journal of the Linnean Society, 65: 41-62.

Sokal, R.R. and Wartenberg, D.E. (1983). A test of spatial autocorrelation using an isolation-by-distance model. Genetics, 105: 219-237.

Stoyan, D. and Penttinen, A. (2000). Recent applications of point process methods in forestry statistics. Statistical Science, 15: 61-78.

Wagner, H.H. and Fortin, M.-J. (2005). Spatial analysis of landscapes: concepts and statistics. Ecology, 86: 1975-1987.

Weidong, L. (2006). Transiogram: A spatial relationship measure for categorical data. International Journal of Geographical Information Science, 20: 693-699.

Wiegand, T. and Moloney, K.A. (2004). Rings, circles and null-models for point pattern analysis in ecology. Oikos, 104: 209-229.

Wong (Chapter 7: MAUP, this volume).

Wulder, M. and Boots, B. (1998). Local spatial autocorrelation characteristics of remotely sensed imagery assessed with the Getis statistic. International Journal of Remote Sensing, 19: 2223-2331.
7
The Modifiable Areal Unit
Problem (MAUP)
David Wong

7.1. INTRODUCTION

Geographical space is continuous in nature; there is no perfect discontinuity on the Earth's surface. Therefore, in modeling geographical space, the raster model, which depicts the Earth's surface with small grid cells, is often used to mimic the continuous nature of space as closely as possible. But the geographical space is also occupied by locations of identical characteristics (e.g., locations along a concrete pedestrian walkway) and objects or features. In the latter case, we will use geometric primitives of points, lines or arcs, and polygons to represent those objects or features either in drawings (maps) or data (vector data). In the former situation, where areas have homogeneous characteristics, we often want to demarcate the areas by discrete boundaries to define homogeneous or formal regions. In both situations, boundaries are drawn, but the former situation, i.e., defining a region, is the essence of the analytical issue known as the modifiable areal unit problem (MAUP).

The term MAUP was coined by Openshaw and Taylor (1979) when they experimented with how correlation coefficient values changed when smaller areal units were aggregated to form larger areal units either hierarchically or non-hierarchically. They concluded that the correlation coefficient could carry a range of values over different levels of spatial aggregation. The source of the problem is that boundaries of areal units are often created artificially or in an ad hoc manner and thus can be changed. When boundaries are drawn in a different manner, analyses of data tabulated according to different boundaries will provide different results. Therefore, Openshaw and Taylor referred to this inconsistency of analytical
results due to different spatial configuration or partitioning schemes as the modifiable areal unit problem.

This chapter is intended to provide an overview of the MAUP. Although several overviews of the MAUP exist, they are dated (e.g., Openshaw, 1984; Wong, 1995). I explain the MAUP and its two sub-problems in more detail in section 7.2. While existing literature has already elaborated upon the impacts and scope of the MAUP, I provide an overview of some of its fundamental impacts in section 7.3. In section 7.4, I use simulated data and empirical datasets to illustrate the processes creating the two MAUP sub-problems. In section 7.5, I summarize the research developments pertaining to the MAUP, with emphases upon the most recent decades. Different directions in handling and searching for solutions for the MAUP are reviewed in section 7.6, and this is followed by a concluding remark.

7.2. WHAT IS THE MAUP?

The essence of the MAUP is that there are many ways to draw boundaries to demarcate space into discrete units to form multiple spatial partitioning systems. These units may serve administrative purposes, such as the counties in the U.S., or statistical or data gathering purposes, such as the census enumeration units of tracts, block groups and blocks below the county level. Although these boundaries are often drawn along some physical features (such as rivers or roads) that may serve as physical barriers separating areal units, there are multiple ways to draw those boundaries. Thus multiple datasets of the same area can be created, and they will offer different descriptions of the areas and different analytical results.

But the process of drawing boundaries should be interpreted beyond the literal sense. In remote sensing or raster modeling, the basic areal units are pixels or grid cells. Each pixel or cell can be regarded as a spatially discrete unit. These units can be of different sizes or resolutions. Where the edges of the pixels or cells are located is somewhat arbitrary. By shifting the grid system slightly over space or changing the size of the pixels or cells, a new dataset can be created. Thus, numerous raster-based datasets can be created and they will give us different results. Therefore, the MAUP is not limited to polygon or vector data, but also exists in raster data.

There are two dimensions through which we can partition space or draw boundaries. One is to focus on the spatial dimension by using different configurations to partition space and fixing the number of areal units to be derived in the study region. As discussed earlier, there are many ways to partition a region even if the number of areal units is kept constant. In reality, we often encounter this in the form of re-partitioning or rezoning processes. A common example is the rezoning of school districts at the local scale. In some cases, the number of schools or districts does not change. But because of changes in population distribution across the districts and/or in the capacities of school facilities (such as through school renovation or addition of structures), the school district boundaries have to be redrawn to accommodate the change. With new school boundaries, the student compositions of some schools according to the new boundaries may be different from the original ones. Therefore, data tabulated according to the old and new school boundary systems will give different results. Note that the number of districts and the population could be the same before and after the rezoning. Changes in the data occur when the population is spatially regrouped into different sub-units in the region.
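The rezoning example can be mimicked with a small simulation; the Python sketch below (hypothetical data and breakpoints, not drawn from the chapter) tabulates the same individual-level values under two partitions with the same number of zones and shows that the zonal summaries differ.

import numpy as np

rng = np.random.default_rng(42)
pos = rng.uniform(0, 12, size=200)                 # locations along a one-dimensional study region
val = rng.normal(50, 10, size=200) + 2 * pos       # an attribute with a spatial trend

def tabulate(values, positions, breaks):
    """Mean of the attribute within each zone defined by the boundary positions."""
    zone = np.digitize(positions, breaks)
    return np.array([values[zone == z].mean() for z in np.unique(zone)])

zoning_a = tabulate(val, pos, breaks=[3, 6, 9])    # four zones with one set of boundaries
zoning_b = tabulate(val, pos, breaks=[2, 5, 10])   # four zones with shifted boundaries
print(zoning_a.round(1))
print(zoning_b.round(1))                           # same data, different zonal means and ranges

Although nothing about the underlying observations changes, descriptive statistics and any analysis based on the zonal values will differ between the two partitioning systems, which is exactly the zoning effect discussed here.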
Another common example is congressional redistricting. Although redistricting may not
Figure 7.1 (a) 107th Congressional Districts for Georgia; (b) 109th Congressional Districts for Georgia; (c) 109th Congressional Districts around the Atlanta region, Georgia; (d) Census tracts, block groups and blocks of Washington, DC, 2000 Census.

be a perfect example, since the number of congressional districts is likely different after the redistricting process due to population growth, it provides a good example to illustrate the significance and magnitude of this dimension of the MAUP. Figure 7.1(a) and (b) show two maps of the 107th and 109th congressional districts (CDs) in Georgia. Figure 7.1(a) is the map of the 107th CDs and Figure 7.1(b) is for the 109th CDs.

It is obvious that the two partitioning systems have very different spatial patterns, although only two districts were added in the 109th Congress for Georgia. No old district in the 107th Congress in Georgia maintained its territory in the 109th. Another obvious change is that the area around the Atlanta metropolitan area has become much more spatially fragmented to accommodate more CDs. Because of the spatial
complexity within that region, Figure 7.1(c) was provided to show the details. It is impressive how some districts have such an irregular or non-compact shape. For instance, CD 8 essentially has two sectors, very much breaking up CD 11. CD 13 seems to have several pieces scattered around the city of Atlanta and stretched outward narrowly from the city. The case of the Atlanta CDs is a possible case of gerrymandering, and is a good example to illustrate how space can be partitioned in a seemingly infinite number of ways (Openshaw, 1996; Fotheringham, 2000).

When the number of areal units is fixed or relatively stable, but boundaries are redrawn to accommodate changes, this is basically a zoning process. Data gathered according to different zoning systems of the same region will give us different depictions of the region and different analytical results when the data are analyzed. The inconsistency of the results based upon data from different zoning systems is known as the zoning problem, one of the two sub-problems of the MAUP.

Another dimension through which we can partition space is the scale dimension. Given a study region, we can partition the region to different levels of detail. For instance, the U.S. is divided into four census regions, and each region is further subdivided into divisions, giving the entire U.S. nine divisions. Under each division are states, which are essentially political and administrative units (Wong and Lee, 2005, p. 8). Under each state, we have counties and then census tracts, block groups and census blocks. Under counties, those census units are enumeration or statistical units created for census data gathering, tabulation and dissemination purposes. But they provide information about the region at a more geographically detailed level than the state or county levels.

Note that when the U.S. is partitioned according to the levels of census geography described above (region-division-state-county-tract-block group-block), they form a geographical hierarchy such that subdivisions at the more detailed level are found only within, but not across, the larger units involved. When other census units, such as metropolitan areas, are involved, the situation will not conform to a geographical hierarchy. Still, the general idea is that the region can be subdivided to different levels of detail or spatial resolution, as in raster data. Figure 7.1(d) offers such an example using Washington, DC, while only tracts, block groups, and blocks are shown here.

Data are available at all of these census geography levels. Census tract and census block group data are commonly used in demographic and socioeconomic analyses. But one cannot assume that analysis results from the census tract data will be consistent with the results based on the block group data. This inconsistency due to the use of data at different geographical scales or spatial resolutions is known as the scale problem, the second sub-problem of the MAUP. In the next section, I will use simple examples to illustrate some fundamental inconsistencies of analytical results due to the zoning and scale problems.

7.3. FUNDAMENTAL IMPACTS OF THE MAUP

To date, most of the literature on the MAUP has been focused on the impacts of the problems. Before I provide a review of the literature, I will illustrate some simple effects of the MAUP using the Congressional Districts data of Georgia and the census data of Washington, DC.

In the Georgia example, the boundaries of
THE MODIFIABLE AREAL UNIT PROBLEM (MAUP) 109

significantly between the 107th and 109th that in the 107th. On the other hand, CDs
Congresses. The maps in Figure 7.1 show along the southern side of the city of Atlanta
only the boundary changes without demon- became more populated by blacks when
strating the potential impacts on analysis due the boundaries changed from the 107th to
to this zoning effect. The redistricting of the the 109th, while whites tended to be more
109th Congress was based upon the 2000 numerous in CDs surrounding the outskirts
Census data. The 2000 Census data can also of Atlanta and the northeast part of the state.
be tabulated according to the boundaries of When one examines the legends of the
the 107th Congressional Districts in order two maps in Figure 7.2, it is easy to note
to assess how the rezoning affected the that: (1) the different visual patterns of the
characteristics of the CDs. Using simple GIS two maps are not due to using different
procedures, some population variables of the classification values; and (2) data tabulated
2000 Census were tabulated according to according to the two CDs have different
the 107th CD boundaries. Figure 7.2 shows statistical distributions such as minimum and
the percent black in each congressional maximum values. Table 7.1 shows some
district in Georgia in the 2000 Census, of the statistics in detail. Numerically, the
according to the boundaries of the two means from the two Congresses are different,
Congresses. although they are quite close. The 109th
The two maps show very different spatial CDs have a smaller range than the 107th
patterns of the AfricanAmerican population. CDs, but the standard deviation is slightly
The congressional district in southeast larger. When the correlation of percent
Georgia has a lower black concentration white and percent black is evaluated for the
according to the 109th when compared with two Congresses, the correlation for the 107th

Percent black by 107th CDs Percent black by 109th CDs


3.21 15 3.4 15
15 27 15 27
27 38 27 38
38 51 38 51
51 62.62 51 56.1

Figure 7.2 Percent blacks in 2000 Census according to the boundaries of the 107th and
109th Congressional Districts (CDs).
110 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Table 7.1 Selected statistics for the variable percent black for the 107th and 109th CDs
Variable: percent black Mean Minimum Maximum Standard deviation
107th CDs 30.19 3.21 62.62 17.60
109th CDs 28.70 3.40 56.10 18.70

Congress was 0.987 ( p < 0.001) and for When a small number of low value areas
the 109th Congress it was 0.9898 ( p < are surrounded by a large number of high
0.001), too close to tell that they are different. value areas, the scale effect tends to inflate
Although statistically the two means of the low value areas. On the other hand,
percent black for the two Congresses are not when a small number of high value areas are
significantly different, numerical difference surrounded by a large number of low value
in statistical values does raise some concerns areas, the scale effect tends to deflate the
about the consistency of analysis results high value areas. To summarize, a general
using data tabulated according to different characteristic of the scale effect is to smooth
spatial partitioning systems. This inconsis- out extreme values so that the range of the
tency attributable to zonal differences is part values is narrower. To verify this general
of the impact of the zoning effect. impact, Table 7.2 shows selected statistics
The most effective way to illustrate the of the variable at the census tract and block
scale effect of the MAUP is to use data at group levels. Although the means for the two
different levels of a geographical hierarchy. levels are not dramatically different, their
Figure 7.1(d) shows three levels of the maximum values and standard deviations are
census geography of the Washington, DC quite different, supporting the argument that
area. The lowest level, census block, has more aggregated data (tract) tend to have less
limited socioeconomic variables. Therefore, variation, since the aggregation process over
only census tract and block group data are scale smooths the variability.
used here. Figure 7.3 shows the variable If one follows the logic that more spatially
per capita income (PCI) for blacks at aggregated data are less variable, and this
the two census levels. The overall income logic is extended to analyze correlation
distributions depicted by the two maps are between variables, it is not difficult to come
quite similar higher levels in the northwest to the conclusion that data at the higher
and lower levels in the southeast. But the aggregation levels will likely have higher
block group map provides refined details correlation than more spatially disaggregated
that are otherwise concealed at the census data. By picking two variables, per capita
tract level. Some of the block groups in the income for black and median house value,
western part of the region have relatively low we can evaluate their correlation at the tract
PCI values. Because their neighboring block and block group levels. At the tract level,
groups had reasonable PCI levels for blacks, Pearsons correlation coefficient for the two
the overall tract level PCI values are medium variables is 0.6806 ( p < 0.001). At the block
to high. Similarly, there is one small block group level, the correlation is only 0.3867
group on the southeastern side that had ( p < 0.001). Apparently, the correlation at
a moderately high value. But because all the block group level was much lower than
neighboring block groups had lower PCI that at the census tract level. This impact
values, the aggregated value for that tract was of scale effect has long been recognized in
relatively low. the literature for many decades (Gehlke and
THE MODIFIABLE AREAL UNIT PROBLEM (MAUP) 111

PCI for blacks, tracts


0 12369
12370 20883
20884 32176
32177 56964
56965 104731

PCI for blacks, block groups


0 12369
12370 20884
20884 32177
32177 56965
56965 217910

Figure 7.3 Per capita income of blacks, census tract and block group levels.

Table 7.2 Selected statistics for the variable per capita income for blacks at the census
tract and block group levels
Variable: per capita Mean Minimum Maximum Standard deviation
income for blacks
Census tracts 21879 0 104731 15053
Block groups 23390 0 217910 20073
112 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Biehl, 1934; Openshaw and Taylor, 1979; to form larger units (such as tracts), original
Robinson, 1956). Fotheringham and Wong values of the smaller units with some level
(1991) provided a more detailed statistical of variability are summarized or replaced by
explanation to the spatial scale-variant nature a representative value, which, in most cases,
of the correlation coefficient. is a measure of central tendency such as the
mean or median. Extreme values among the
smaller units are now removed and therefore
data more aggregated are becoming less
varied or more similar. Thus, the correlations
7.4. THE MAUP PROCESSES among variables tend to be higher with higher
levels of spatial aggregation. This nature of
7.4.1. The scale effect
the scale effect was best exemplified by the
The above analyses have demonstrated that work of Openshaw and Taylor (1979), which
the MAUP is relevant to even simple shows that the correlation coefficient could
mapping. Maps are often used to explore carry a wide range from a moderate level for
and visualize spatial patterns. The Georgia relatively disaggregated data, to a very high
example shows that to a large extent, the correlation level for highly aggregated data.
spatial pattern is a function of the partitioning Sometimes, slightly negative relationships at
system. Adopting different partitioning sys- the individual or disaggregated level can
tems can generate different patterns for the turn into moderate positive correlations when
same area, despite using the same variable. data are aggregated into larger areal units
The impacts on mapping for the scale effect, (Fotheringham and Wong, 1991).
in the specific example of Washington, DC, Although these correlation analyses are
are not very dramatic. The overall pattern quite straightforward, their results and
is quite persistent across different scales. general patterns have significant implications
However, it is dangerous to assume that the for conducting general statistical and spatial
scale effect has minimal impacts on mapping. analyses on data that may be tabulated and
In fact, many experiments and studies have disseminated at different spatial aggregation
shown that using data from different scale levels. With most multivariate statistics, the
levels can portray very different spatial relationships between variables are often
patterns. summarized by the correlation matrix, or
The impacts of the MAUP on mapping the variancecovariance matrix, and these
are quite obvious, but its impacts on serve as the foundation for analysis (Griffith
statistical analysis seem to be quite difficult and Amrhein, 1997). Data at higher levels
to comprehend and generalize. That is why of aggregation tend to inflate correlation
most of the literature on the MAUP has as compared to the disaggregated levels.
been on assessing its impacts on different Therefore, we can expect that analyses
subject areas (population, urban, vegetation, using more aggregated data will likely show
soil, etc.) and on different techniques (general stronger relationships than analyses using
statistics, spatial statistics, and mathematical more disaggregated data. To some extent, the
models). But the simple correlation analysis process of the scale effect and its general
above using Washington, DC census tract and impacts are quite predictable. Also, because
block group level data offers some insights the variancecovariance matrix is the core of
on the processes related to the scale effect almost all multivariate analyses, the impacts
and its potential impacts. As smaller areal of the MAUP on this matrix are propagated
units (such as block groups) are aggregated to various multivariate statistical techniques
THE MODIFIABLE AREAL UNIT PROBLEM (MAUP) 113

(e.g., Perle (1977) on factor analysis; Hunt Figure 7.4(a), similar values (variable 1) tend
and Boots (1996) on principal component to locate close to each other, exhibiting strong
analysis). positive spatial autocorrelation, a situation
quite common in reality (Odland, 1988;
Griffith, 1987). Figure 7.4(b) was created by
randomly shuffling the original 100 values
7.4.2. The zoning effect
and assigning them to different cells to
For the zoning effect, its process and its create variable 2. As a result, the pattern
general impacts seem to be more difficult is somewhat random. In addition, I have
to assess and comprehend. There are several created two zoning patterns: the first pattern,
variables acting both independently and Configuration 1 in Figure 7.4, follows closely
together to determine the impacts of the the patterns of Variable 1 in Figure 7.4(a); the
zoning effect. To illustrate the roles of second pattern, Configuration 2 in Figure 7.4,
some of these variables, I have created two cuts through different zones of Variable 1.
artificial landscapes (Figure 7.4a and b). Both When Configuration 1 is applied to Vari-
landscapes have the same number of areal able 1, we expected that the general spatial
units (100) and the same set of values. For trend will likely be preserved, while this

Configuration 1

Variable 1
2 23
24 40
41 57
58 75
76 96

(a)

Configuration 2

Variable 2
2 23
24 40
41 57
58 75
76 96

(b)

Figure 7.4 100 units with (a) positive spatial autocorrelation and (b) random pattern; and
two hypothetical zoning systems: Conguration 1 and Conguration 2.
114 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Configuration 1 Configuration 2

Variable 1 Variable 1
15 42.8
15 37 42.8 47.28
37 57.76 47.28 49.56
57.76 84.64 49.56 54.76

Variable 2 Variable 2
40.68 43.24
40.68 49.56 43.24 45.12
49.56 50.56 45.12 46.44
50.56 53.5 46.44 59.6

Figure 7.5 Congurations 1 and 2 applied to Variables 1 and 2.

will not be the case when Configuration 2 the spatial pattern. When Configuration 2
is applied to Variable 1. On the other was applied to Variable 1, the minimum
hand, because the spatial distribution of value was inflated, but the standard deviation
Variable 2 is somewhat random, imposing was greatly suppressed. For Variable 2, we
Configurations 1 or 2 will unlikely create see no obvious differences in statistics when
major differences. Figure 7.5 shows the different configurations were applied, as the
results; besides Variable 1Configuration 1, values are spatially random. In other words,
we cannot identify any pattern. Note that the zoning effect will be minimal if the
the ranges of values in other situations phenomenon exhibits a somewhat random
are much smaller than that in Variable pattern. But if the phenomenon exhibits
1Configuration 1. strong positive spatial autocorrelation, then
Table 7.3 shows the details for some of the we should expect some significant impacts
statistics. The first row in Table 7.3 (V1 and due to the zoning effect.
V2) lists the statistics of the original values. Besides the spatial distribution of the
Assuming that we aggregate the original 100 data, another major factor in determining
areal units into four larger units by taking the impacts of the MAUP is the spatial
the averages of the original values, the first aggregation mechanism, or the process used
batch of rows shows the results from the to derive a representative value for the
averaging process. The only situation that aggregated units. The above example used
can preserve some of the statistics (standard averaging as the process, i.e., the average
deviation and to some extent maximum) of value of the original data will be used for the
the original values reasonably well is V1C1, aggregated unit. But there are other possible
when the spatial configuration coincides with choices for the representative values, such
THE MODIFIABLE AREAL UNIT PROBLEM (MAUP) 115

Table 7.3 Selected statistics for using the two hypothetical congurations (1 and 2) to
aggregate Variable 1 and Variable 2 in Figure 5
Variables (V1 and V2) and Mean Minimum Maximum Standard deviation
congurations (C1 and C2)
V1 and V2 49.00 2.00 96.00 27.00

V1C1 (Average) 48.60 15.00 84.00 29.70


V1C2 (Average) 48.60 42.80 54.76 4.98
V2C1 (Average) 48.60 40.68 53.60 5.55
V2C2 (Average) 48.60 43.24 59.60 7.45

V1C1 (Minimum) 37.00 2.00 71.00 29.00


V1C2 (Minimum) 7.00 2.00 11.00 4.00
V2C1 (Minimum) 5.00 2.00 10.00 3.00
V2C2 (Minimum) 7.00 2.00 12.00 5.00

as median, minimum, maximum and others. One needs to recognize that this division
The second batch of rows in Table 7.3 shows is somewhat artificial and not exclusive in
the aggregation results when the minimum nature, since their labels simply reflect the
values are taken as the representative values. dominant types of research during those
Again, applying Configuration 1 to periods.
Variable 1 best preserves the original
information, but still the results are quite
different from using the averaging process
7.5.1. Discovery and impact
and the original values. Therefore, how
assessment
values of sub-units are aggregated to larger
units will also affect the magnitude of the The impacts of MAUP have been
MAUP. Although our discussion focuses on documented thoroughly. Given that changing
the zoning effect, both the spatial distribution the correlation among variables is a typical
of the data and the aggregation mechanism and fundamental impact of the MAUP, it
are also applicable to explain the scale effect. is not surprising to find that most statistical
analyses are subject to the MAUP. In
addition, non-statistical-based mathematical
models or quantitative methods are also
7.5. STAGES OF THE MAUP likely impacted by the MAUP. Although
RESEARCH Openshaw and Taylor coined the term, many
researchers prior to them had documented
In the following section, I attempt to provide some aspects of the MAUP. The earliest
an account of MAUP research over the past seems to be the work by Gehlke and Biehl
several decades. To facilitate the discussion, (1934), who reported patterns of correlation
I divide the history into two periods charac- coefficient changes when census tracts were
terized by the major types of MAUP research grouped differently. Another early work by
appearing during those periods: discovering Robinson (1956) moved a step forward by
and assessing the impacts of the problem; and arguing that a weighting scheme was neces-
conceptualizing and formulating solutions. sary to correct the correlation coefficient to
116 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

account for different numbers of observations his entropy-based approach to deal with
among areal units. While not targeted at the the aggregation problem in the context
MAUP specifically, Moellering and Tobler of developing gravity-based models (Batty
(1972) offered a better understanding of the and Sikdar, 1982ad). Putman and Chung
smoothing process of the scale effect by (1989) also joined the British geographers
explaining how variance changes over scale to address zonal design issues for spatial
levels. Sawicki (1973), and later Clark and interaction models. Blair and Miller (1983)
Avery (1976), launched among the earliest demonstrated the impacts of MAUP on
attempts to assess the MAUP effects on inputoutput models.
general statistical analyses. Perle (1977) The formation of the NCGIA and the
explicitly links the MAUP to the issue of launching of the spatial data accuracy
ecological fallacy (Wong, 2007), although research initiative created a boost for the
the potential problems of using ecological MAUP research since 1989. Fotheringham
correlation to infer individual behavior (1989) called for the recognition of scale
were well documented (Robinson, 1950). sensitivity issues in spatial analysis, as well
Parallel to these developments, some British as the need to perform multi-scale analyses.
geographers, including Openshaw, focused In the same volume, Tobler (1989) argued
on a related issue of developing optimal zonal that the MAUP is a spatial problem and
systems, partly for regionalization purposes therefore the solution has to be spatial
and partly to deal with the MAUP problem. in nature. Subsequently, he proposed a
As Batty (1976) adopted the information migration modeling framework that was
approach to handle spatial aggregation, not sensitive to scale changes, probably
others aimed at designing the best zonal the first scale-independent spatial analytical
systems to support spatial interaction technique to be introduced. Unrelated to
modeling (Masser and Brown, 1975; the development of NCGIA, Arbia (1989)
Openshaw, 1977a, b, 1978a, b). Creating published a highly in-depth monograph
zones or regions is often needed in regional addressing the MAUP.
analysis, and these zones or regions provide
the basis for locationallocation models.
Goodchild (1979) first recognized the MAUP
7.5.2. From conceptualization
effect on locationallocation modeling.
to problem solving
Mathematical modelers occasionally picked
up this topic (Fotheringham et al., 1995; With the NCGIA research initiative on spatial
Hodgson et al., 1997; Murray and Gottsegen, data accuracy as the platform, a new wave
1997; Horner and Murray, 2002), but these of research activities on the MAUP began in
studies were limited to assessing the impacts the early 1990s, starting with the paper by
of the MAUP. Fotheringham and Wong (1991), a frequently
After Openshaw and Taylor coined the cited paper, systematically addressing the
term MAUP in 1979, the next major impacts of the MAUP on correlation analysis
concerted effort in addressing the MAUP and regression models. While researchers
started around 1989, partly due to the were still interested in, and to some extent
research initiative of the National Center for obsessed with the impacts of the MAUP,
Geographic Information Analysis (NCGIA) the community had gradually moved toward
on data accuracy. In between, there were finding solutions to the MAUP. This search
intermittent developments in identifying dif- for solutions was in parallel to the effort
ferent aspects of the MAUP. Batty continued of several researchers who had provided
THE MODIFIABLE AREAL UNIT PROBLEM (MAUP) 117

evidence that the MAUP effects may not be optimal zoning was firmly believed to be a
as pervasive as some others claimed (e.g., potential solution to the MAUP in the past
Fotheringham and Wong, 1991). Amrhein (Openshaw, 1977a), and this direction was
and Flowerdew (1989) show that the MAUP still an interest at this stage of the research
has limited impact on Poisson regressions. (Openshaw and Schmidt, 1996).
Trying to identify when MAUP will be Most of the research on the MAUP
significant, Amrhein (1995) and Amrhein mentioned above focused on aggregating
and Reynolds (1996) conducted a series of polygon feature data, a popular operation
simulation, controlling for various statistical in manipulating vector format data in GIS
properties of the data, including various and frequently used in the handling of
levels of spatial autocorrelation. They con- socioeconomic phenomena. However, the
cluded that the MAUP effects may not be impacts of the MAUP are also present in
significant given certain levels of aspatial and physical geography, environmental modeling
spatial correlation among variables, but their and in general, the analysis of raster format
relationships are extremely complex. While data. Outside of human geography, some
most impact analyses of the MAUP focused landscape ecologists and physical geogra-
on statistical or mathematical modeling, phers started developing an appreciation
some analyses were more narrowly focused of the MAUP problems (Jelinski and Wu,
on index formulations, particularly using 1996), and a series of research followed this
indices to measure segregation (Wong, 1997; direction. While Arbia et al. (1996) might
Wong et al., 1999). Besides conceptualizing have been the first linking the scale effect
the scale effect on measuring segregation, in raster or remote sensing data analysis
this line of research also shows that spa- to the MAUP explicitly, the scale effect
tial measures are likely more sensitive to or scale dependency issue was definitely
changing scale than aspatial measures (Wong, not new to remote sensing scientists (e.g.,
2004). Bian and Walsh, 1993) since remote sens-
A coordinated effort during this phase ing data are often available and can be
of the MAUP research was the publishing tabulated easily to multiple scale levels
of a special issue of Geographical Systems (Bian, 1997). Part of the issue, which
(Wong and Amrhein, 1996). In this special has historically been a problem in remote
issue, some researchers still focused on the sensing analysis, is to select the resolution
MAUP effects (e.g., Okabe and Tagashira, appropriate for the analysis (e.g., Townshend
1996; Hunt and Boots, 1996), but others and Justice, 1988). Lam and Quattrochi
delved deeper into the sources of the MAUP (1992) reviewed several concepts related to
(e.g., Amrhein and Reynolds), including the scale and resolution, attempting to address
change-of-support concept in geostatistics the issue of choosing the optimal scale
(Cressie, 1996). A clear direction was to or resolution to analyze a particular phe-
develop solutions. Holt et al. (1996) argued nomenon. Some researchers also recognized
that the source of the scale effect was the that the scale effect is essentially a change-
changes in correlation between variables and of-support problem in geostatistics (Atkinson
thus they proposed a framework to model and Curran, 1995). The edited volume by
the changes of correlation over scale by Quattrochi and Goodchild (1997) collected
taking into account spatial autocorrelation papers partly focusing on the impacts of
implicitly. Unfortunately, the complexity of the MAUP on remote sensing, and also
the computational method was beyond a on modeling the scale effect and develop-
practical solution to the problem. Creating ing solutions (e.g., Bian, 1997; Xia and
118 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Clarke, 1997). Still, no clear solutions have spatial in nature. Thus, he called for the
been identified. development of scale-insensitive or frame-
Outside of the geographical literature, the independent spatial analytical techniques to
MAUP attracted additional attention after deal with the MAUP and he employed a
the appearance of Kings monograph (1997), population migration model that was rela-
which focused on ecological inference issues tively insensitive to scale changes (Tobler,
across social science disciplines, but also 1989). Toblers migration model is one
addressed the related MAUP. He made a bold of the very few analytical tools that are
claim that an error-bound approach can solve relatively scale-insensitive. Another one that
the scale effect, part of the MAUP and is has demonstrated some level of stability
conceptually related to the ecological fallacy in correlation over different scale levels is
problem. His claim triggered reactions from location-specific correlation analysis (Wong,
the geographic realm, and some of these 2001). But all of these potentially scale-
reactions were aired through a series of coor- insensitive tools have limited applications.
dinated comments (Sui, 2000), although the A popular spatial solution to the MAUP
focus was still on the ecological fallacy issue. even before Openshaw and Taylor coined the
But geographers responses (Fotheringham, term was to create optimal zoning systems
2000; Anselin, 2000; OLoughlin, 2000) (Openshaw, 1977a, b, 1978a, b; Openshaw
were not too optimistic that Kings solutions and Baxter, 1977; Openshaw and Rao, 1995;
can solve the ecological fallacy issue and Openshaw and Schmidt, 1996). Given that
specifically the MAUP. On the other hand, most aggregation problems involve multiple
Johnston and Pattie (2001) rebuffed the claim variables, derivations of zonal systems have
that geographers have not spent adequate to be based upon multiple variables and
effort in dealing with the ecological fallacy multiple objectives. In general, the principle
by citing previous research on entropy is to create zonal systems to minimize the
maximization, which offers promising results intra-zonal variances and to maximize inter-
in dealing with the ecological inference zonal variances. But often there is no unique
problem. solution and therefore, heuristic processes
seem to be quite promising (Bong and Wang,
2004).
Recently, the edited volume by Tate and
7.6. POTENTIAL SOLUTIONS Atkinson (2001) pointed to three directions of
MAUP research: impacts of the scale effects,
The recent exchanges between geographers the potential of fractal analysis in dealing
and King raise doubt that Kings solutions the scale issue and the use of geostatistics,
can solve the MAUP. Even though the early specifically kriging and related methods such
phase of the MAUP research was fascinated as variograms to handle and model the scale
by the pervasiveness of the MAUP effects effect. Although the intended coverage of
and overwhelmed by impact-analysis type the volume included both vector and raster
of studies, researchers have never stopped data, the impact assessment tended to focus
searching for solutions since the very more on vector data while the modeling
beginning. Robinson (1956) suggested sim- and solutions were geared more toward
plistic weighting methods to overcome some raster data. Fractals have a strong relationship
of the MAUP effects on correlation analysis. historically with the scale effect as remote
Tobler (1989) argued that because the MAUP sensing data can be tabulated and analyzed
is a spatial problem, solutions have to be at multiple scale levels and fractal geometry
THE MODIFIABLE AREAL UNIT PROBLEM (MAUP) 119

is a powerful mathematical tool to handle internal homogeneity (low variance), and the
multiscale phenomena (Lam and Quattrochi, magnitude of the scale effect will be partly
1992; Lam and De Cola, 1993; Pecknold a function of the internal homogeneity. As
et al., 1997; Quattrochi et al., 1997). The a result, one may model the scale effect
volume by Tate and Atkinson (2001) includes or statistics describing the data at different
several papers on using fractals to handle scale levels as long as we can establish
the scale problem. But so far, although the rules of aggregation and how the scale
fractal analysis has been demonstrated to effect is related to the level of internal
be effective in describing and modeling homogeneity. Since the foundation of most
phenomena at multiple scales, it has not yet classical statistics is the variancecovariance
been proven to be a viable solution to the matrix, this group of researchers proposed
MAUP, or more specifically the scale effect. using the correlation at the individual level
Tate and Atkinson (2001) also suggested to estimate the correlation at the aggregated
geostatistical analysis as a potential solution level and thus can estimate the variance
to the scale problem. Geostatistical tools, covariance matrix at the aggregate level.
especially variograms, can identify the geo- The statistical derivations involved were very
graphical range of spatial autocorrelation. sophisticated and the computation was very
This is an important piece of information demanding. As a result, this has not been a
to understand and model the scale effect. practical solution to the MAUP.
They claimed that geostatistical tools are Although tremendous efforts have been
not used to rescale the data themselves, spent to deal with the scale problem, to many
but to rescale statistics describing the data researchers, the zoning problem seems to
(Atkinson and Tate, 2000). This is an inter- be easier to handle. Flowerdew and Green
esting idea, but has not been fully validated (1989, 1992) treated the zoning problem in
or operationalized. More recently, following the same way as resolving incompatible zonal
the introduction of Geographically Weighted systems. The general approach is to use spa-
Regression (GWR), the potential for using tial interpolation methods to transform data
GWR to depict spatial heterogeneity related gathered according to one zonal pattern to
to the MAUP was alluded to (Fotheringham another pattern. Fisher and Langford (1995,
et al., 2000). Because a major source of the 1996) have evaluated the reliability of this
scale effect is spatial heterogeneity and GWR technique in handling the zoning problem.
can model local variability reasonably well, A related technique, dasymmetric mapping,
it is believed that GWR may be more robust was also shown to be effective to handle
than other global models and less sensitive to incompatible zonal patterns from a carto-
the scale effect (Fotheringham et al., 2002, graphic perspective (Fisher and Langford,
pp. 144158). Still GWR cannot really be 1996; Mennis, 2003). An older smoothing or
regarded as a solution to the scale effect or interpolation technique, the smooth pycno-
the MAUP. phylactic interpolation introduced by Tobler
Somewhat similar to the geostatistical (1979), has also been revisited and is believed
approach to rescale statistics over multiple to be a solution candidate for the MAUP,
scale levels was the direction taken by a specifically in addressing the problem from
group of social statisticians (Holt et al., the change-of-support perspective (Gotway
1996; Steel and Holt, 1996; Tranmer and and Young, 2002).
Steel, 1998). They realized that the scale To summarize, the MAUP effects can
effect can be kept to a minimum when possibly be tackled by sophisticated models
the aggregated areas have a high degree of and computationally intensive techniques,
120 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

while their practical and operational poten- forward to the stage that some very promising
tials are yet to be affirmed. Relatively simple and operational modeling techniques are
techniques can handle the zoning problem, available to handle spatial autocorrelation
but not the scale problem. Thus, without quite effectively (e.g., Griffith, 2003). For
generally feasible methods to handle the the MAUP, we have accumulated pieces of
MAUP, the old call for recognizing the knowledge and developed some comprehen-
MAUP is still the most affordable approach sive understanding and conceptualizations
to deal with this long-term stubborn problem of the problems. But a systematic research
(Fotheringham, 1989). Given the advances agenda seems to be needed in order to
in GIS technology and computational tools, bring significant advancements along this
and the availability of digital data at various direction. Assessing the impacts of the
scales, repeating the same analysis but MAUP should be a topic confined to the past,
using different scales or partitioning schemes and the future should focus on developing
is within reach of most researchers. This operational solutions.
approach is probably the minimum standard
in handling the MAUP given where we are
on this topic.
Taking one step further, using segregation REFERENCES
indices as examples, Wong (2003) disaggre-
gated segregation at different geographical Amrhein, C.G. (1995). Searching for the elusive
aggregation effect: evidence from statistical
levels to demonstrate that one can document simulations. Environment and Planning A, 27:
the sources of the MAUP effects. This 105119.
accounting framework is to identify and
Amrhein, C.G. and Flowerdew, R. (1989). The effect
quantify the amount of the MAUP effects of data aggregation on a Poisson regression model
contributed by different locations at different of Canadian migration. In: Goodchild, M.F. and
scale levels. This detailed mapping of the Gopal, S. (eds), Accuracy of Spatial Databases,
MAUP effects by scale and space is not just pp. 229238. London: Taylor and Francis.
informative, but also sheds light on where the Amrhein, C.G. and Reynolds, H. (1996). Using spatial
MAUP effects may be the most acute in the statistics to assess aggregation effects. Geographical
geographic hierarchy and highlights locations Systems, 3(2/3): 143158.
that deserve more attention. Anselin, L. (2000). The alchemy of statistics, or creating
data where no data exist. Annals, Association of
American Geographers, 90(3): 586592.
Arbia, G. (1989). Spatial Data Conguration in
7.7. CONCLUDING REMARK Statistical Analysis of Regional Economic and Related
Problems. Dordrecht: Kluwer.
Many methodological or technical problems
Arbia, G., Benedetti, R. and Espa, G. (1996). Effects
can be found in the geographical literature.
of the MAUP on image classication. Geographical
Some have broad impacts and are very Systems, 3(2/3): 123141.
complex, while some are confined to certain
Atkinson, P.M. and Curran, P.J. (1995). Dening
areas and are more manageable. Two very an optimal size of support for remote sensing
stubborn but pervasive problems in statistical investigations. IEEE Transactions on Geosciences and
analysis of spatial data are spatial auto- Remote Sensing, 33(3): 768776.
correlation and the MAUP. The past two Atkinson, P.M. and Tate, N.J. (2000). Spatial scale
decades of research in spatial statistics and problems and geostatistical solutions: a review. The
spatial econometrics have moved the field Professional Geographer, 52(4): 607623.
THE MODIFIABLE AREAL UNIT PROBLEM (MAUP) 121

Batty, M. (1976). Entropy in spatial aggregation. Flowerdew, R. and Green, M. (1989). Statistical
Geographical Analysis, 8: 121. methods for inference between incompatible zonal
systems. In: Goodchild, M. and Gopal, S. (eds),
Batty, M. and Sikdar, P.K. (1982a). Spatial aggregation
The Accuracy of Spatial Data Bases, pp. 239247.
in gravity models: 1. An information-theoretic
London: Taylor and Francis.
framework. Environment and Planning A, 14:
377405. Flowerdew, R. and Green, M. (1992). Developments
in areal interpolating methods and GIS. Annals of
Batty, M. and Sikdar, P.K. (1982b). Spatial aggregation Regional Science, 26: 6778.
in gravity models: 2. One-dimensional population
density models. Environment and Planning A, 14: Fotheringham, A.S. (1989). Scale-independent spatial
525553. analysis. In: Goodchild, M. and Gopal, S. (eds),
Accuracy of Spatial Databases. pp. 221228.
Batty, M. and Sikdar, P.K. (1982c). Spatial aggre- London: Taylor and Francis.
gation in gravity models: 3. Two-dimensional trip
distribution and location models. Environment and Fotheringham, A.S. (2000). A bluffers guide to
Planning A, 14: 629658. a solution to the ecological inference problem.
Annals, Association of American Geographers,
Batty, M. and Sikdar, P.K. (1982d). Spatial aggregation 90(3): 582586.
in gravity models: 4. Generalisations and large-
scale applications. Environment and Planning A, 14: Fotheringham, A.S., Brunsdon, C. and Charlton, M.E.
795822. (2000). Quantitative Geography: Perspectives on
Spatial Analysis. London: Sage.
Blair, P. and Miller, R.E. (1983). Spatial aggregation
in multiregional inputoutput models. Environment Fotheringham, A.S., Brunsdon, C. and Charlton, M.E.
and Planning A, 15: 187206. (2002). Geographically Weighted Regression.
England: Wiley & Sons.
Bian, L. (1997). Multiscale nature of spatial data in
scalling up environment models. In: Quattrochi, D.A. Fotheringham A.S., Densham P.J. and Curtis A.
and Goodchild, M.F. (eds). Scale in Remote Sensing (1995). The zone denition problem in location-
and GIS. pp. 1327. Lewis Publishers. allocation modeling. Geographical Analysis, 27:
6077.
Bian, L. and Walsh, S. (1993). Scale dependencies of
vegetation and topography in a mountainous envi- Fotheringham A.S. and Wong, D.W.S. (1991). The
ronment of Montana. The Professional Geographer, modiable areal unit problem in multivariate
45(1): 111. statistical analysis. Environment and Planning A, 23:
10251044.
Bong, C.W. and Wang, Y.C. (2004). A multiobjective
hybrid metaheuristic approach for GIS-based spatial Gehlke, C.E. and Biehl, K. (1934). Certain effects of
zoning model. Journal of Mathematical Modelling grouping upon the size of the correlation coefcient
and Algorithms, 3: 245261. in census tract material. Journal of the American
Statistical Association, 29: 169170.
Clark, W.A.V. and Avery, K.L. (1976). The effects of
Gotway, C.A. and Young, L.J. (2002). Combining
data aggregation in statistical analysis. Geographical
incompatible spatial data. Journal of the American
Analysis, 8: 428438.
Statistical Association, 97: 632648.
Cressie, N. (1996). Change of support and the mod-
Grifth, D.A. (1987). Spatial Autocorrelation: a Primer.
iable areal unit problem. Geographical Systems,
Washington, D.C.: Association of American
3(2/3): 159180.
Geographers.
Fisher, P.F. and Langford, M. (1995). Modelling
Grifth, D.A. (2003). Spatial Autocorrelation and
the errors in areal interpolation between zonal
Spatial Filtering. Berlin: Springer-Verlag.
systems by Monte Carlo simulation. Environment and
Planning A, 27(2): 211224. Grifth, D.A. and Amrhein, C.G. (1997). Multivariate
Statistical Analysis for Geographers. Upper Saddle
Fisher, P.F. and Langford, M. (1996). Modeling
River, NJ: Prentice Hall.
sensitivity to accuracy in classied imagery: a study
of areal interpolation by dasymetric mapping. The Goodchild, M.F. (1979). Aggregation problem in location-
Professional Geographer, 48(3): 299309. allocation. Geographical Analysis, 11: 240255.
122 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Hodgson, M.J., Shmulevitz, F. and Krkel, M. (1997). OLoughlin, J. (2000). Can Kings ecological inference
Aggregation error effects on the discrete-space method answer a social scientic puzzle: who
p-median model: the case of Edmonton, Canada. voted for the Nazi party in Weimar Germany?
Canadian Geographer, 41: 415429. Annals, Association of American Geographers,
90(3): 592601.
Holt, D., Steel, D.G. and Tranmer, M. (1996). Area
homogeneity and the Modiable Areal Unit Problem. Openshaw, S. (1977a). Geographical solution to
Geographical Systems, 3(2/3): 181200. scale and aggregation problems in region-building,
partitioning and spatial modeling. Transactions of
Horner, M.W. and Murray, A.T. (2002). Excess
the Institute of British Geographers, 2: 459472.
commuting and the modiable areal unit problem.
Urban Studies, 39: 131139. Openshaw, S. (1977b). Optimal zoning systems
Hunt, L. and Boots, B.N. (1996). MAUP effects in for spatial interaction models. Environment and
the principal axis factoring technique. Geographical Planning A, 9: 169184.
Systems, 3(2/3): 101122. Openshaw, S. (1978a). An optimal zoning approach to
Jelinski, D.E. and Wu, J. (1996). The modiable areal the study of spatially aggregated data. In: Masser, I.
unit problem and implications for landscape ecology. and Brown, P.J.B. (eds), Spatial Representation and
Landscape Ecology, 11: 129140. Spatial Interaction, pp. 93113. Leiden: Martinus
Nijhoff.
Johnston, R. and Pattie, C. (2001). On geographers
and ecological inference. Annals, Association of Openshaw, S. (1978b). Empirical-study of some zone-
American Geographers, 91(2): 281282. design criteria. Environment and Planning A, 10:
781794.
King, G. (1997). A Solution to the Ecological Inference
Problem: Reconstructing Individual Behavior from Openshaw, S. (1984). The Modiable Areal Unit Prob-
Aggregate Data. Princeton: Princeton University lem. CATMOG, 38. Norwich, England: Geobooks.
Press. Openshaw, S. (1996). Developing GIS-relevant zone-
Lam, N.S.-N. and De Cola, L. (eds) (1993). Fractals in based spatial analysis methods. In Longley, P. and
Geography. Englewood Cliffs, NJ: Prentice-Hall. Batty, M. (eds), Spatial Analysis: Modelling in a
GIS Environment, pp. 5573. Cambridge, U.K.:
Lam, N.S.-N. and Quattrochi, D.A. (1992). On the GeoInformation International.
issues of scale, resolution, and fractal analysis in
the mapping sciences. The Professional Geographer, Openshaw, S. and Baxter, R.S. (1977). Algorithm 3
44(1): 8898. procedure to generate pseudo-random aggregations
of n-zones into m-zones, where m is less than n,
Masser, I. and Brown P.J.B. (1975). Hierarchical aggre- Environment and Planning A, 9: 14231428.
gation procedures for interaction data. Environment
and Planning A, 7: 509523. Openshaw, S. and Rao L. (1995). Algorithms for
reengineering 1991 census geography. Environment
Mennis, J. (2003). Generating surface models of popu- and Planning A, 27: 425446.
lation using dasymetric mapping. The Professional
Geographer, 55(1): 3142. Openshaw, S. and Schmidt, J. (1996). Parallel simulated
annealing and genetic algorithms for re-engineering
Moellering, H. and Tobler, W.R. (1972). Geographical zoning systems. Geographical Systems, 3(2/3):
variances. Geographical Analysis, 4: 3442. 201220.
Murray, A. and Gottsegen, J. (1997). The inuence Openshaw, S. and Taylor, P.J. (1979). A million or
of data aggregation on the stability of p-median so correlation coefcients: three experiments on
location model solutions. Geographical Analysis, 29: the modiable areal unit problem. In: Wrigley, N.
200213. (ed.), Statistical Applications in the Spatial Sciences,
Odland, J. (1988). Spatial Autocorrelation. London: pp. 127144. London: Pion.
Sage.
Pecknold, S., Lovejoy, S., Schertzer, D. and Hooge, C.
Okabe, A. and Tagashira, N. (1996). Spatial aggre- (1997). Multifractals and resolution dependence of
gation bias in a regression model containing a remotely sensed data: GSI to GIS. In: Quattrochi, D.A.
distance variable. Geographical Systems, 3(2/3): and Goodchild, M.F. (eds), Scale in Remote Sensing
7799. and GIS, pp. 361394. Lewis Publishers.
THE MODIFIABLE AREAL UNIT PROBLEM (MAUP) 123

Perle, E.D. (1977). Scale changes and impacts Townshend, J.R.G. and Justice, C.O. (1988). Selecting
on factorial ecology structures. Environment and the spatial resolution of satellite sensors required for
Planning A, 9: 549558. global monitoring of land transformation. Interna-
tional Journal of Remote Sensing, 9: 187236.
Putman, S.H. and Chung, S.H. (1989). Effects of spatial
system-design on spatial interaction models.1. the Tranmer, M. and Steel, D.G. (1998). Using census data
spatial system denition problem. Environment and to investigate the causes of the ecological fallacy.
Planning A, 21: 2746. Environment and Planning A, 30: 817831.
Quattrochi, D.A. and Goodchild, M.F. (eds) (1997). Wong, D.W.S. (1995). Aggregation effects in geo-
Scale in Remote Sensing and GIS. Lewis Publishers. referenced data. In: Arlinghaus, S.L. and Grifth,
D.A. (eds), Practical Handbook of Spatial Statistics,
Quattrochi, D.A., Lam, N.S.-N., Qiu, H-L. and Zhao, W.
pp. 83106. Boca Raton, FL: CRC Press.
(1997). ICAMS: A geographic information system
for the characterization and modeling of multiscale Wong, D.W.S. (1997). Spatial dependency of segre-
remote sensing data. In: Quattrochi, D.A. and gation indices. The Canadian Geographer, 41(2):
Goodchild, M.F. (eds), Scale in Remote Sensing and 128136.
GIS, pp. 295308. Lewis Publishers.
Wong, D.W.S. (2001). Location-specic cumulative
Robinson, A.H. (1956). The necessity of weighting distribution function (LSCDF): an alternative to
values in correlation analysis of areal data. Annals, spatial correlation analysis. Geographical Analysis,
the Association of American Geographers, 46: 33(1): 7693.
233236.
Wong, D.W.S. (2003). Spatial decomposition of
Robinson, W.S. (1950). Ecological correlations and segregation indices: a framework toward measur-
the behavior of individuals. American Sociological ing segregation at multiple levels. Geographical
Review, 15: 351357. Analysis, 35(3): 179194.
Sawicki, D.S. (1973). Studies of aggregated areal data Wong, D.W.S. (2004). Comparing traditional and
problems of statistical inference. Land Economics, spatial segregation measures: a spatial scale
49: 109114. perspective, Urban Geography, 25(1): 6682.
Steel, D.G. and Holt, D. (1996). Rules for random Wong, D.W.S. (2007). Ecological fallacy, In: B. Warf
aggregation. Environment and Planning A, 28: (ed), Encyclopedia of Human Geography, Sage
957978. Publications, pp. 117118.
Sui, D. (2000). New directions in ecological inference: Wong, D.W.S. and Amrhein, C.G. (1996). Research on
an introduction. Annals, Association of American the MAUP: old wine in a new bottle or real break-
Geographers, 90(3): 579582. through? Geographical Systems, 3(2/3): 7377.
Tate, N.J. and Atkinson, P.M. (eds) (2001). Modelling Wong, D.W.S., Lasus, H. and Falk, R.F. (1999).
Scale in Geographical Information Sciences. London: Exploring the variability of segregation index D with
Wiley & Sons. scale and zonal systems: an analysis of thirty US
cities. Environment and Planning A, 31: 507522.
Tobler, W. (1979). Smooth pycnophylactic interpolation
for geographical regions. Journal of the American Wong, D.W.S. and Lee, J. (2005). Statistical Analysis of
Statistical Association, 74: 519536. Geographic Information. New York: Wiley and Sons.
Tobler, W. (1989). Frame independent spatial analysis. Xia, Z-G. and Clarke, K.C. (1997). Approches to
In: Goodchild, M. and Gopal, S. (eds), The Accuracy scaling of geo-spatial data, In: Quattrochi, D.A. and
of Spatial Data Bases, pp. 115122. London: Taylor Goodchild, M.F. (eds), Scale in Remote Sensing and
and Francis. GIS, pp. 309360. Lewis Publishers.
8
Spatial Weights
Robin Dubin

8.1. WEIGHT MATRICES on a regular grid or lattice, but this is not


necessary. The numbers in the weight matrix
A weight matrix summarizes the spatial can indicate whether or not a relationship is
relationships in the data. In particular, the ith present or they can indicate the strength of the
row of a weight matrix shows observation is relationship. The former weighting schemes
relationship to all of the other observations. are called discrete, and the latter, continuous.
By convention, the main diagonal of this It is common, but not necessary, to row
matrix consists of zeros. Because the weight normalize the weight matrix. This means
matrix shows the relationships between that the matrix is transformed so that each
all of the observations, its dimension is of the rows sums to one. Row normalizing
always N N, where N is the number gives the weight matrix some nice theoretical
of observations. In most applications, the properties. For example, row normalizing
weight matrix itself is treated as exogenous; allows the weight matrices from different
that is, it is assumed that the researcher weighting schemes to be compared, since
knows how the observations are related to all elements must lie between 0 and 1
each other. Note that the space in which (inclusive). Row normalizing also allows l
the observations are located need not be (a parameter discussed later in the chapter)
geographic; any type of space is acceptable, to be bounded by 1 and 1. All of the
as long as the researcher can specify the weight matrices presented in this chapter
spatial interactions. will be row normalized. The cost of row
Spatial data can appear in many forms. The normalizing is that it may interfere with the
data can come from regions (e.g., counties) or interpretation of the weights. For example,
points (e.g., houses). The data may be located in the case of inverse distance weighting,
126 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

row normalizing will change the weights so The most natural way to represent the
that they sum to one. Thus pairs with the spatial relationships with areal data is through
same separation distance can have different the concept of contiguity. That is, regions
weights, depending on the number of nearby will be considered to be related if their
observations. boundaries share common points. There are
In the remainder of this chapter, I will three types of contiguity that are commonly
explore weight matrices for the following considered: rook contiguity, bishop conti-
cases: regular lattice data for points, regular guity, and queen contiguity. Contiguity is
lattice data for areas, irregularly located data determined by imagining that the regions
for points, and irregularly located data for form a chess board; neighbors are determined
areas. by the regions that the appropriate chess piece
Consider the data presented in Figure 8.1. could reach.
This is a map of 25 regions arranged on a
regular lattice. The borders of the regions
are shown with solid lines, the centroids
8.1.1. Rook contiguity
are shown with heavy black points, and the
lattice itself is shown with dashed lines. Each With rook contiguity, the neighbors are due
region is identified by a number between north, south, east and west. Region 7s
1 and 25. neighbors are regions 2, 6, 8 and 12 and

5 10 15 20 25
5

4 9 14 19 24
4

3 8 13 18 23
3

2 7 12 17 22
2

1 6 11 16 21
1

1 2 3 4 5

Figure 8.1 Map of regular lattice areas.


SPATIAL WEIGHTS 127

5 10 15 20 25
5

4 9 14 19 24
4

3 8 13 18 23
3

2 7 12 17 22
2

1 6 11 16 21
1

1 2 3 4 5

Figure 8.2 Neighbors in rook contiguity.

are indicated with stars in Figure 8.2. The To obtain the row normalized version of
weight matrix for this data will have 25 rows this weight matrix, divide each row by the
and columns. The first 10 rows and columns number of neighbors (ones). Thus in rows
are of the unstandardized weight matrix are with 4 neighbors, the entries will be 0.25,
shown in Figure 8.3. and in rows with only two, the entries will
This symmetric matrix has zeros on its be 0.5. This is a common occurrence: row
main diagonal. A one indicates that regions normalizing often makes symmetric weight
i and j are neighbors. Regions in the interior matrices asymmetric.
of the study area will have four ones in their
rows. For example, the seventh row of the
weight matrix contains four ones, because
8.1.2. Bishop contiguity
region 7 has four neighbors (only three
are shown in Figure 8.3 because the fourth In bishop contiguity, region is neighbors are
neighbor is region 12). Regions on the located at its corners. Figure 8.4 shows the
periphery will have fewer neighbors. For neighbors for region 7 under this scheme. The
example, the first row (representing region 1) neighbors are regions 1, 3, 11 and 13.
has only two ones. These are in the second Again, regions in the interior will have four
and sixth cells, indicating that region 1 has neighbors, while those on the periphery will
only two neighbors: region 2 and region 6. have fewer. Figure 8.5 shows the first 10 rows
128 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
0.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00
0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00
1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00
0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00

Figure 8.3 Subset of unstandardized weight matrix for rook contiguity.

5 10 15 20 25
5

4 9 14 19 24
4

3 8 13 18 23
3

2 7 12 17 22
2

1 11
6 16 21
1

1 2 3 4 5

Figure 8.4 Neighbors in bishop contiguity.

and columns of the unstandardized weight


8.1.3. Queen contiguity
matrix for this case.
Examination of the first row of Figure 8.5 In queen contiguity, any region that touches
shows a 1 in position 7, indicating that the boundary of region i, whether on a side
region 7 is region 1s (only) neighbor. The or a corner, is considered to be a neighbor.
second row shows that region 2 has two The maximum number of neighbors for this
neighbors: regions 6 and 8. case is eight. In Figure 8.6, stars indicate
SPATIAL WEIGHTS 129

0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00
0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00

Figure 8.5 Subset of unstandardized weight matrix for bishop contiguity.

5 10 15 20 25
5

4 9 14 19 24
4

3 8 13 18 23
3

2 7 12 17 22
2

1 6 11
16 21
1

1 2 3 4 5

Figure 8.6 Neighbors in queen contiguity.

the neighbors for region 7 under queen columns of the unstandardized weight matrix
contiguity. are shown in Figure 8.7.
The weight matrix for queen contiguity Comparing Figure 8.7 to Figures 8.5 and
is the sum of the weight matrices for rook 8.6 shows that Figure 8.7 can be obtained
and bishop contiguity. The first 10 rows and by summing the other two weight matrices.
130 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

0.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00
0.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00
0.00 0.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 1.00
1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
1.00 1.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00
0.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 1.00
0.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00 1.00 0.00

Figure 8.7 Subset of unstandardized weight matrix for queen contiguity.

For example, the first row now has three ones, in positions 2, 6 and 7, showing that these three regions are neighbors of region 1.

8.2. CORRELATION MATRICES

Suppose that the data has been generated by the following process:

Y = μ + ε   (8.1)

ε = λWε + e   (8.2)

where W is the weight matrix, and e is a white noise error term (that is, the elements of e are assumed to be independent and have zero mean and constant variance, σ²). In this system, Y is a random variable with constant mean, μ, and a spatially correlated error term, ε. The error term in the first equation is spatially correlated because the error term for one observation depends on the value of the errors of its neighbors, as shown in equation 8.2. The parameter λ shows the strength of the spatial autocorrelation and must lie between −1 and 1 for a normalized weight matrix.

The variance-covariance matrix for the data is given by the following formula:

Σ = σ²[(I − λW)′(I − λW)]⁻¹   (8.3)

where I is the identity matrix. The variance-covariance matrix can be converted into a correlation matrix, K, as shown in equation 8.4.

Kij = Σij / √(Σii Σjj)   (8.4)

As equations (8.3) and (8.4) show, the correlation matrix depends on the choice of W. In what follows, I examine the correlation matrices that result from using the three types of contiguity: bishop, rook and queen. In the examples λ is set to 0.67, σ² to 1, and all weight matrices are row normalized.

The correlation matrices for the example set of regions will be 25 × 25. Because it is difficult to look at so many numbers at one time, I will present the correlation matrices using symbols, rather than numbers. A diamond will represent correlations that are equal to or greater than 0.8. A cross will indicate that the correlation is less than 0.8 but greater than or equal to 0.6. A triangle shows that the correlation is between 0.4 and 0.6. A square shows that the correlation is between 0.2 and 0.4, while a dot shows that the correlation is less than 0.2. Figures 8.8, 8.9 and 8.10 show the correlation matrices for bishop, rook and queen contiguity, respectively.
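Equations (8.3) and (8.4) can be evaluated directly once a contiguity matrix has been chosen. The sketch below is an added illustration (not from the original text); it assumes Python with NumPy and reuses the hypothetical contiguity() helper from the earlier sketch. It row-normalizes W, forms the variance-covariance matrix with λ = 0.67 and σ² = 1, and converts it to the correlation matrix K.

```python
import numpy as np

def correlation_matrix(W, lam=0.67, sigma2=1.0):
    """Correlation matrix implied by the error process of equations (8.1)-(8.4)."""
    # Row-normalize the weight matrix (rows with no neighbors are left as zeros)
    row_sums = W.sum(axis=1, keepdims=True)
    W_std = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    I = np.eye(W.shape[0])
    A = I - lam * W_std
    Sigma = sigma2 * np.linalg.inv(A.T @ A)   # equation (8.3)
    d = np.sqrt(np.diag(Sigma))
    K = Sigma / np.outer(d, d)                # equation (8.4)
    return K

# Example usage (cf. Figure 8.11, row 7 of the bishop-contiguity correlation matrix):
# K = correlation_matrix(contiguity('bishop'))
# print(np.round(K[6], 2))
```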
[Figure: symbolic plot of the 25 × 25 correlation matrix; the symbols mark the ranges r ≥ 0.8, 0.6 ≤ r < 0.8, 0.4 ≤ r < 0.6, 0.2 ≤ r < 0.4 and r < 0.2.]

Figure 8.8 Correlation matrix for bishop contiguity.

8.2.1. Correlation matrix for bishop contiguity

Even a brief examination of the three correlation matrices shows that the three weighting schemes produce very different correlations in the data. The bishop's case produces a particularly interesting correlation matrix.
[Figure: symbolic plot of the 25 × 25 correlation matrix; legend as in Figure 8.8.]

Figure 8.9 Correlation matrix for rook contiguity.

In what follows, I will use region 7 for illustration purposes. Figure 8.11 shows the seventh row of the correlation matrix. Recall that under bishop contiguity the neighbors of region seven are regions 1, 3, 11 and 13. One might then reasonably expect that the correlation between all of the direct neighbors and region 7 would be the same. However, Figure 8.11 shows that the correlations are 0.76, 0.59, 0.59 and 0.48, respectively. The correlations differ because these regions have different numbers of neighbors themselves, as shown in Table 8.1. The general rule is that the more connected a region is, the lower its correlation with another region. This makes sense because the more neighbors a region has, the greater the different influences on it.
[Figure: symbolic plot of the 25 × 25 correlation matrix; legend as in Figure 8.8.]

Figure 8.10 Correlation matrix for queen contiguity.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

0.76 0.00 0.59 0.00 0.22 0.00 1.00 0.00 0.31 0.00 0.59 0.00 0.48 0.00 0.20 0.00 0.31 0.00 0.20 0.00 0.22 0.00 0.20 0.00 0.14

Figure 8.11 Row 7 from the bishop's correlation matrix.



Table 8.1 Neighbors of region 7 using bishop contiguity

Neighbors   Correlation with region 7   Number of neighbors of neighbors (excluding region 7)
1           0.76                        0
3           0.59                        1
11          0.59                        1
13          0.48                        3

Although sensible, this connectedness property will increase the severity of edge effects. Edge effects occur when the spatial processes continue outside of the study area. Regions with fewer neighbors are assigned higher correlations; however, these regions occur on the boundaries of the study area, where they will be influenced by regions not included in the study.

Also of note in the correlation matrix is the large number of zeros. These are shown as dots in Figure 8.8 and can be seen explicitly in Figure 8.11. This occurs because there are regions which are impossible to reach from region i using bishop's moves. For example, as Figure 8.12 shows, it is impossible to reach regions 8 or 10 from region 7, and so Figure 8.11 shows zeros for these cells.

Figure 8.12 also shows that, although not direct neighbors, regions 9 and 5 can be reached from region 7.

[Figure: the 5 × 5 regular lattice of regions numbered 1 to 25, illustrating the diagonal (bishop's-move) paths leading away from region 7.]

Figure 8.12 Relationship paths for bishop contiguity.



It takes two moves to reach region 9 and three to reach region 5. Therefore, we should see non-zero correlations in these cells, with a larger correlation in cell 9 than in cell 5, and this is confirmed by Figure 8.11.

8.2.2. Correlation matrix for rook contiguity

An examination of Figure 8.9 reveals a much different pattern of correlations for the rook's case. The 7th row of the correlation matrix is shown in Figure 8.13. The same principle of greater connectivity leading to smaller correlations continues to be demonstrated by the rook contiguity, as shown in Table 8.2.

Table 8.2 Neighbors of region 7 using rook contiguity

Neighbors   Correlation with region 7   Number of neighbors of neighbors (excluding region 7)
2           0.51                        2
6           0.51                        2
8           0.44                        3
12          0.44                        3

In the rook's case, it is possible to get from one region to any other region, although many moves may be required. This means that there are no unrelated regions, as in the bishop's case, and hence no zeros in the correlation matrix. There are, however, many small values, and these are shown as dots in Figure 8.9. The greater the number of moves required, the smaller the correlation. For example, starting from region 7, three moves are required to get to region 10 but only two to get to region 9. Figure 8.13 shows that the correlation between regions 7 and 9 is larger than that between regions 7 and 10. Likewise, it is also possible to get to region 25, but this requires 6 moves, and the correlation here is very small (0.02).

Finally, it is of interest to examine regions 9 and 13. Both are two moves away from region 7. However, two of region 13's neighbors are also neighbors of region 7. Region 9 has only one neighbor in common with region 7. This means that region 13 should have the stronger relationship with region 7, and this is borne out by Figure 8.13.

8.2.3. Correlation matrix for queen contiguity

Although the weight matrix for the queen's case is the sum of the rook's and the bishop's case, the same cannot be said for the correlation matrix. However, the same principles noted above apply: more connected neighbors have lower correlations and shared neighbors increase the correlations. The seventh row of the correlation matrix is shown in Figure 8.14.

Clearly, in the queen's case, it is possible to reach one region from any other region, and so there are no isolated regions. However, there are again some very small correlations, which appear as dots in Figure 8.10. Table 8.3 shows the relationship between connectedness and the correlations.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0.40 0.51 0.31 0.14 0.07 0.51 1.00 0.44 0.16 0.07 0.31 0.44 0.24 0.10 0.05 0.14 0.16 0.10 0.05 0.03 0.07 0.07 0.05 0.03 0.02

Figure 8.13 Row 7 from the rook's correlation matrix.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0.52 0.49 0.40 0.18 0.11 0.49 1.00 0.39 0.19 0.11 0.40 0.39 0.32 0.15 0.09 0.18 0.19 0.15 0.10 0.07 0.11 0.11 0.09 0.07 0.05

Figure 8.14 Row 7 from the queen's correlation matrix.

Table 8.3 Neighbors of region 7 using queen contiguity

Neighbors   Correlation with region 7   Number of neighbors   Number of shared neighbors
1           0.52                        2                     2
2           0.49                        4                     4
3           0.40                        4                     2
6           0.49                        4                     4
8           0.39                        7                     4
11          0.40                        4                     2
12          0.39                        7                     4
13          0.32                        7                     2

The queen's case is somewhat more complex than the rook and the bishop because now there are more neighbors. Examination of Table 8.3 shows that, as before, the correlations with region 7 fall as its neighboring regions have more neighbors themselves. For example, region 1 has the smallest number of neighbors (two) and the highest correlation (0.52). However, we see that the relationship is more complex than before, because regions 2, 3, 6, and 11 all have four neighbors but their correlations differ. The answer can be found in the last column of Table 8.3, which shows the number of shared neighbors. That is, these are the number of regions that are neighbors to both region 7 and the region in the first column. Thus, regions 3 and 11 have the same correlation because both the number of neighbors and the number of shared neighbors is the same. Similarly, region 6 has a higher correlation because all of its four neighbors are also neighbors to region 7.

8.3. CORRELOGRAMS

The final tool that I will use to analyze differences between these weighting schemes is the correlogram. A correlogram shows how the correlations change as the distance between the regions increases. Thus, the correlation is graphed on the vertical axis, and separation distance is graphed on the horizontal axis. In general, we expect that the correlations will fall as separation distance increases. Figure 8.15 shows the correlograms for the three cases under consideration.

Since the data is on a regular lattice, and hence the centroids of the regions are evenly spaced, one might think that the correlations would be the same for any given separation distance. However, Figure 8.15 shows that this is not the case: there is a range of correlations for each separation distance, although this range gets smaller as the separation distance increases. The range of correlations comes from the dependence of the correlations on the connectedness of the neighbors and the number of shared neighbors as discussed above.

The correlations for both the rook's and the queen's cases do tend to fall with separation distance. For the bishop's case, there is a tendency for the correlations to decline with separation distance, but this decline is not monotonic. Relatively large correlations are interrupted by the zero correlations of the isolated regions, as discussed previously.
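A correlogram of this kind can be assembled by pairing each entry of the correlation matrix with the distance between the corresponding region centroids. The sketch below is an added illustration (not from the original chapter), assuming Python with NumPy and reusing the hypothetical contiguity() and correlation_matrix() helpers sketched above; it simply collects the pairs that a plot such as Figure 8.15 would display.

```python
import numpy as np

def correlogram_points(K, centroids):
    """Return (separation distance, correlation) pairs for all distinct region pairs."""
    pts = []
    n = K.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d = np.hypot(*(centroids[i] - centroids[j]))
            pts.append((d, K[i, j]))
    return np.array(pts)

# Centroids of the 25 lattice cells (unit spacing), numbered column by column:
# centroids = np.array([divmod(r, 5) for r in range(25)], dtype=float)
# pts = correlogram_points(correlation_matrix(contiguity('rook')), centroids)
# For a given distance there is a range of correlations, as Figure 8.15 shows.
```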
[Figure: three correlogram panels (rook contiguity, bishop contiguity and queen contiguity), each plotting correlation against separation distance from 0 to 6.]

Figure 8.15 Correlograms for regular lattice data.


[Figure: scatter plot of the ten example points, labelled 1 to 10, at the coordinates listed in Table 8.4.]

Figure 8.16 Irregularly located point example data.

8.4. REGULAR LATTICE POINT DATA

This is point data that is located at the intersection points of a regular grid. The data of the previous section can be used here by simply considering the centroids of the regions to be the data points. This means that applicable weighting schemes include: rook, bishop and queen contiguity. Also, weighting schemes that are used primarily for irregularly located point data can be used here as well. These will be discussed in the next section.

8.5. IRREGULARLY LOCATED POINT DATA

The discussion thus far has pertained to data located at regular intervals along a grid. However, spatial data is not always located so conveniently, and it is to this case that we now turn. In what follows, I will use ten points, located as shown in Figure 8.16, for purposes of illustration. The coordinates of these points are given in Table 8.4.

Table 8.4 Coordinates for irregularly located point example data

Observation   X Coordinate   Y Coordinate
1             2              9
2             2              10
3             2              7
4             2              3
5             9              9
6             9              10
7             9.75           9
8             9              7.75
9             7.5            9
10            11             1

I have chosen to use a small number of points to keep the weight and correlation matrices small. Cluster 1 consists of observations 1 through 4. This cluster

is somewhat dispersed, with observation 4 having the weakest link. Cluster 2 consists of observations 5 through 9. Cluster 2 is much tighter than Cluster 1. Observation 10 is an isolated point and not part of any cluster.

The example data makes it clear that this type of data is very different from the regular lattice data, in which no clusters could appear. There are a number of weighting schemes that can be used for this type of data; the analyst must be skillful in choosing the weighting scheme that best represents the spatial interactions in the data.

Since the coordinates of the data points are known, the distances separating each pair of observations can be calculated. These distances can be stored in an N × N distance matrix. The distance matrix for the example data is shown in Table 8.5. All of the weighting schemes discussed in this chapter will be functions of separation distance.

The relatively small numbers in the shaded upper left portion of Table 8.5 reveal Cluster 1. The very small numbers in the shaded lower right portion reveal Cluster 2. The large numbers in the last row and column show that observation 10 is isolated from the rest of the data.

In what follows, I explore the properties of five different weighting schemes, which can be characterized as discrete or continuous. A discrete weighting scheme will have a non-normalized weight matrix consisting of ones and zeros, with the ones indicating the interactions. In the continuous weighting schemes, the cells will consist of numbers which indicate the strength of the interactions. Each of these weighting schemes has a parameter, the value of which must either be determined by the researcher or estimated. The weighting schemes are summarized in Table 8.6 and described below. Note that in most of the presented matrices, the

Table 8.5 Distance matrix for irregularly located point example data
0.00 1.00 2.00 6.00 7.00 7.07 7.75 7.11 5.50 12.04
1.00 0.00 3.00 7.00 7.07 7.00 7.81 7.35 5.59 12.73
2.00 3.00 0.00 4.00 7.28 7.62 8.00 7.04 5.85 10.82
6.00 7.00 4.00 0.00 9.22 9.90 9.80 8.46 8.14 9.22
7.00 7.07 7.28 9.22 0.00 1.00 0.75 1.25 1.50 8.25
7.07 7.00 7.62 9.90 1.00 0.00 1.25 2.25 1.80 9.22
7.75 7.81 8.00 9.80 0.75 1.25 0.00 1.46 2.25 8.10
7.11 7.35 7.04 8.46 1.25 2.25 1.46 0.00 1.95 7.04
5.50 5.59 5.85 8.14 1.50 1.80 2.25 1.95 0.00 8.73
12.04 12.73 10.82 9.22 8.25 9.22 8.10 7.04 8.73 0.00
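The distance matrix in Table 8.5 can be reproduced from the coordinates in Table 8.4. A minimal sketch, added here for illustration only (assuming Python with NumPy):

```python
import numpy as np

# Coordinates of the ten example points from Table 8.4
xy = np.array([[2, 9], [2, 10], [2, 7], [2, 3], [9, 9],
               [9, 10], [9.75, 9], [9, 7.75], [7.5, 9], [11, 1]], dtype=float)

# N x N matrix of Euclidean separation distances (cf. Table 8.5)
D = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
print(np.round(D, 2))
```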

Table 8.6 Weighting schemes

Scheme                                 Type         Parameter
Nearest neighbors                      Discrete     Number of neighbors (NN)
Limit                                  Discrete     Distance limit (L)
Pace and Gilley's nearest neighbors    Continuous   Exponent (α), maximum number of neighbors (k)
Inverse distance                       Continuous   Exponent (P)
Negative exponential                   Continuous   Denominator (A)

elements representing the pairs in the two clusters will be shaded. If the text refers to specific cells, these will be highlighted instead.

8.5.1. Nearest neighbors

A nearest neighbor weight matrix is defined so that:

Wij = 1 if j is i's nearest neighbor
    = 0 otherwise

A nearest neighbor is the observation that is the closest to observation i. Nearest neighbors can be generalized to include any number of neighbors. For example, if the number of nearest neighbors is set to five, then the non-normalized W will have five ones in each row, indicating the five closest observations to i. The number of neighbors (NN) is the parameter of this weighting scheme. Table 8.7 shows the weight matrix when NN is set to 1.

An examination of this table shows that the weight matrix is not symmetric. For example, (3, 1) = 1 but (1, 3) = 0. This is because observation 1 is observation 3's nearest neighbor, but the reverse is not true (observation 2 is 1's nearest neighbor). Also note that this weighting scheme gives observation 4 the same relationship with observation 3 that 3 has with 1, even though 3 is much closer to 1 than 4 is to 3. For one nearest neighbor, the unstandardized weight matrix will have one 1 in each row; in general, the number of ones per row will be equal to NN.

Figure 8.17 shows the correlation matrix for 1 nearest neighbor. The two clusters show up clearly. Note that observation 10 appears to be part of Cluster 2, even though it is distant from the other points in that cluster. Figure 8.18 provides a closer look at Cluster 2.

Let a doublet be a pair such that each is the other's nearest neighbor, and a singlet be a pair in which one member is the other's nearest neighbor, but not the reverse. The only doublet shown in Figure 8.18 is the pair (5, 7), and this pair has the highest correlation shown, at 0.92. Observations 6, 8 and 9 have only singlet connections to observation 5, and their correlations are lower at 0.83. Observation 10 has a singlet connection to 8, but this correlation is even lower at 0.76. This is because 8 (the nearest neighbor) is less well connected to the rest of the cluster than is point 5.

It may seem that there is a contradiction between the correlation patterns in the

Table 8.7 Weight matrix for example data with one nearest neighbor
0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
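The nearest-neighbor weight matrix in Table 8.7 follows directly from the distance matrix. The sketch below is an added illustration (Python/NumPy, reusing the distance matrix D computed after Table 8.5; the helper name is hypothetical). It builds the matrix for an arbitrary number of neighbors NN and, for NN = 1, shows how doublets and singlets can be identified.

```python
import numpy as np

def nearest_neighbor_weights(D, NN=1):
    """Unstandardized weight matrix with a 1 for each of the NN closest observations."""
    W = np.zeros_like(D)
    for i in range(D.shape[0]):
        order = np.argsort(D[i])          # closest first; position 0 is i itself (distance 0)
        W[i, order[1:NN + 1]] = 1.0
    return W

# W1 = nearest_neighbor_weights(D, NN=1)           # reproduces Table 8.7
# doublets = np.argwhere((W1 == 1) & (W1.T == 1))  # mutual nearest neighbors (0-based indices)
# singlets = np.argwhere((W1 == 1) & (W1.T == 0))  # one-way nearest-neighbor links
```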
[Figure: symbolic plot of the 10 × 10 correlation matrix; legend as in Figure 8.8.]

Figure 8.17 Correlation matrix for one nearest neighbor (λ = 0.67).

[Figure: points 5, 6, 7, 8, 9 and 10 with the pairwise correlations marked on the connecting lines; line style distinguishes doublets, singlets and pairs that are not neighbors.]

Figure 8.18 Correlations between selected points for cluster two, one nearest neighbor.

regular and irregular lattice cases: greater connectedness causes lower correlations in the regular lattice case and higher correlations here. The apparent contradiction is caused by differences in what is being held constant in each case. For the regular lattice, the number of neighbors can vary but the relationships between the data points is fixed. For the nearest neighbors weighting scheme with irregularly spaced data, the number of neighbors is fixed, but the spatial relationships can change. For the regular lattice, more neighbors means more influences and therefore lower correlations. Here, having a central neighbor (as indicated by a high correlation) causes point i to be more central as well.

With nearest neighbors, points can only be related through nearest neighbor pairs (whether doublet or singlet). Thus a move here is a step along a path connecting nearest neighbor pairs. As in the regular lattice case, as the number of moves increases, the correlations fall. Observation 9 has a correlation of 0.52 with point 10. This is because 9 is related to 10 through observation 5 (path: 9, 5, 8, 10). Observation 7 has a slightly higher correlation with 10 (0.58) even though the same number of moves are involved, because the path to 10 goes through the doublet (5, 7) (path: 7, 5, 8, 10).

8.5.2. Two nearest neighbors

Table 8.8 shows the standardized weight matrix and Figure 8.19 the symbolic correlation matrix for two nearest neighbors. Increasing the number of neighbors to two makes the weight matrix more symmetric in the clusters, which means that there will be more doublets. Since there are more doublets, the impact of the most central points (1 and 5) is reduced. This has the effect of making the clusters appear more compact, as shown in Figure 8.19.

Figure 8.20 explores Cluster 2 more closely. Since there are two neighbors, as opposed to one, Figure 8.20 shows more doublets and singlets than Figure 8.18. The general rules for the magnitudes of the correlations are that: (a) doublets have higher correlations than singlets and (b) the more connected the partners, the higher the correlations (this rule holds for both doublets and singlets). For example, the correlation between points 8 and 10 is smaller than that between points 8 and 5 because point 8 is better connected than point 10.

The pairs shown by the dotted lines are not nearest neighbor pairs, and therefore are reached by steps along a path. As before, if the path goes through a doublet, the correlations are higher than if it goes through a singlet. Also, the larger the number of steps, the lower the correlation.

Table 8.8 Standardized weight matrix for two nearest neighbors


0.00 0.50 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.50 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.50 0.00 0.50 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.50 0.00 0.50 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.50 0.50 0.00 0.00
[Figure: symbolic plot of the 10 × 10 correlation matrix for two nearest neighbors; legend as in Figure 8.8.]

Figure 8.19 Correlation matrix for two nearest neighbors (λ = 0.67).

[Figure: points 5, 6, 7, 8, 9 and 10 with the pairwise correlations marked on the connecting lines; line style distinguishes doublets, singlets and pairs that are not neighbors.]

Figure 8.20 Correlations between selected points for cluster two, two nearest neighbors.

8.5.3. Three nearest neighbors

Table 8.9 shows the standardized weight matrix for this case and Figure 8.21 shows the correlation matrix. Because there are more doublets and singlets, more pairs become well connected or central. However, the influence of any individual connection is diminished, since there are more connections. Thus the clusters look more diffused and the two clusters begin to affect each other, as shown in Figure 8.21.

Figure 8.22, which illustrates the correlations for Cluster 2, shows that pair (5, 7) has regained its dominant position, having the highest correlation at 0.74. This pair is the most connected because all of its connections are doublets. Even though this pair has the highest correlation, it had a much higher correlation in the one nearest neighbor case (0.92). This is because there are many more doublets here, so each has a smaller impact. Comparing Figure 8.22 to Figures 8.18

Table 8.9 Standardized weight matrix for three nearest neighbors


0.00 0.33 0.33 0.00 0.00 0.00 0.00 0.00 0.33 0.00
0.33 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.33 0.00
0.33 0.33 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.00
0.33 0.33 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.33 0.33 0.33 0.00 0.00
0.00 0.00 0.00 0.00 0.33 0.00 0.33 0.00 0.33 0.00
0.00 0.00 0.00 0.00 0.33 0.33 0.00 0.33 0.00 0.00
0.00 0.00 0.00 0.00 0.33 0.00 0.33 0.00 0.33 0.00
0.00 0.00 0.00 0.00 0.33 0.33 0.00 0.33 0.00 0.00
0.00 0.00 0.00 0.00 0.33 0.00 0.33 0.33 0.00 0.00
[Figure: symbolic plot of the 10 × 10 correlation matrix.]

Figure 8.21 Correlation matrix for three nearest neighbors.


[Figure: points 5, 6, 7, 8, 9 and 10 with the pairwise correlations marked on the connecting lines; line style distinguishes doublets, singlets and pairs that are not neighbors.]

Figure 8.22 Correlations between selected points for cluster two, three nearest neighbors.

and 8.20 shows that most of the dotted lines have been replaced by solid and dashed lines, because there are more neighbors. Thus, most of the points in this cluster are related, which leads to the diffused pattern of correlations shown in Figure 8.21.

8.5.4. Correlograms for nearest neighbors

The correlograms for nearest neighbors are shown in Figure 8.23. Examination of this figure shows that the correlations do not decline monotonically with separation distance. For one nearest neighbor, although there is a slight diminution with distance, the basic pattern is that the correlations are either very strong or zero. Furthermore, the strong correlations are interspersed with the zeros. This means that some pairs can be highly correlated, while others, that are closer together, are not. However, all of the small separation distance pairs are highly correlated and all of the large separation distance pairs are not related to each other.

Increasing the number of neighbors to two reduces the size of the large correlations, and hence diminishes the diminution with separation distance. The same pattern of large correlations interspersed with zeros persists. Finally, when the number of neighbors is increased to three, all points are related at least weakly (keep in mind that there are only 10 observations). The upper end of the strong (greater than 0.5) correlations has been reduced further. There is still a separation distance range in which strong correlations are interspersed with weak ones, and so there is no monotonic relationship between separation distance and correlation. It remains true, however, that the strongest correlations are associated with the smallest separation distances and the largest separation distances have the smallest correlations.

All of these results show that the number of neighbors is an important parameter for this weighting scheme; the number of neighbors
[Figure: two correlogram panels (one nearest neighbor and three nearest neighbors), each plotting correlation against separation distance from 0 to 14.]

Figure 8.23 Nearest neighbor correlograms, λ = 0.67.

has a sizeable impact on the weight matrix and the associated correlation matrix. It is traditional for researchers to pick the number of neighbors a priori. These results show that this should be done with care.

8.5.5. Pace and Gilley's continuous version of nearest neighbors (P&G)

In this model, described in Pace and Gilley (1998), the unstandardized weight matrix is given by:

W = Σ_{k=1}^{NN} α^k N(k)

where N(k) is an N × N matrix such that:

N(k)ij = 1 if j is i's kth nearest neighbor
       = 0 otherwise,

α is a parameter to be estimated, and NN is chosen by the researcher. This model was developed to finesse the fact that the number of neighbors is generally chosen by the researcher, rather than estimated. As the value of α increases, the influence of more distant neighbors increases. Thus, if the researcher does not know the number of neighbors, he can pick a number k (which is generally larger than the probable number of neighbors)

and then estimate α to find the optimal degree of influence.1 In the example, NN is set to 5.

This weighting scheme falls into the continuous category, because the unstandardized weight matrix does not consist of ones and zeros, as in nearest neighbors. The standardized weight matrices are shown in Tables 8.10 and 8.11, for α = 0.1 and α = 0.5. These weight matrices differ from the nearest neighbor weight matrices, particularly as α becomes larger. For example, a comparison of Tables 8.7 and 8.10 shows that α = 0.1 produces a weight matrix that is similar to that for one nearest neighbor. However, comparing Tables 8.9 and 8.11 shows that the α = 0.5 case is quite different from three nearest neighbors. This is because, in the nearest neighbors weighting scheme, all of the neighbors are assigned equal weight. In the P&G weighting scheme, the first nearest neighbor always has the greatest weight. As the value of α increases, the more distant neighbors are given greater weight, but the weights are always less than for the first nearest neighbor. For example, in Table 8.9 (three nearest neighbors) W has 0.33 in the second, third and ninth elements of the first row, while in Table 8.11 (α = 0.5) the corresponding elements are 0.52, 0.26, and 0.13.

Figure 8.24 shows the correlation matrices for α = 0.1 and 0.5. Examination of this figure shows that α = 0.1 corresponds well to one nearest neighbor. Not surprisingly however, α = 0.5 differs from three nearest neighbors in that there is less bleeding of the clusters with P&G.
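Pace and Gilley's scheme is a weighted sum of the individual kth-nearest-neighbor indicator matrices. The following sketch is added for illustration only; it is one reading of the formula above (not the authors' own code), written in Python/NumPy and reusing the distance matrix D from the earlier sketches.

```python
import numpy as np

def pace_gilley_weights(D, alpha=0.5, NN=5):
    """W = sum_{k=1}^{NN} alpha**k * N(k), where N(k) marks each point's kth nearest neighbor."""
    n = D.shape[0]
    W = np.zeros_like(D)
    for i in range(n):
        order = np.argsort(D[i])[1:NN + 1]      # the NN closest points, nearest first
        for k, j in enumerate(order, start=1):
            W[i, j] = alpha ** k                # N(k)[i, j] = 1 contributes alpha**k
    row_sums = W.sum(axis=1, keepdims=True)
    return W / row_sums                          # standardized weights, cf. Tables 8.10-8.11

# print(np.round(pace_gilley_weights(D, alpha=0.1), 2))   # close to one nearest neighbor
# print(np.round(pace_gilley_weights(D, alpha=0.5), 2))   # spreads weight to more neighbors
```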

Table 8.10 Standardized weight matrix for α = 0.1


0.00 0.90 0.09 0.00 0.00 0.00 0.00 0.00 0.01 0.00
0.90 0.00 0.09 0.00 0.00 0.00 0.00 0.00 0.01 0.00
0.90 0.09 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00
0.09 0.01 0.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.09 0.90 0.01 0.00 0.00
0.00 0.00 0.00 0.00 0.90 0.00 0.09 0.00 0.01 0.00
0.00 0.00 0.00 0.00 0.90 0.09 0.00 0.01 0.00 0.00
0.00 0.00 0.00 0.00 0.90 0.00 0.09 0.00 0.01 0.00
0.00 0.00 0.00 0.00 0.90 0.09 0.00 0.01 0.00 0.00
0.00 0.00 0.00 0.00 0.01 0.00 0.09 0.90 0.00 0.00

Table 8.11 Standardized weight matrix for α = 0.5


0.00 0.52 0.26 0.06 0.03 0.00 0.00 0.00 0.13 0.00
0.52 0.00 0.26 0.06 0.00 0.03 0.00 0.00 0.13 0.00
0.52 0.26 0.00 0.13 0.00 0.00 0.00 0.03 0.06 0.00
0.26 0.13 0.52 0.00 0.00 0.00 0.00 0.03 0.06 0.00
0.03 0.00 0.00 0.00 0.00 0.26 0.52 0.13 0.06 0.00
0.00 0.03 0.00 0.00 0.52 0.00 0.26 0.06 0.13 0.00
0.03 0.00 0.00 0.00 0.52 0.26 0.00 0.13 0.06 0.00
0.00 0.00 0.03 0.00 0.52 0.06 0.26 0.00 0.13 0.00
0.03 0.00 0.00 0.00 0.52 0.26 0.06 0.13 0.00 0.00
0.00 0.00 0.00 0.03 0.13 0.00 0.26 0.52 0.06 0.00
[Figure: two symbolic correlation matrices, for α = 0.1 and α = 0.5.]

Figure 8.24 Correlation matrices for Pace and Gilley model.

Figure 8.25 shows the correlograms for the P&G model. These should be compared to the nearest neighbors correlograms in Figure 8.23. Not surprisingly, the first panels of these two figures agree quite closely, while the second panels do not. In the P&G model, the attenuation of the strong correlations with separation is more pronounced than in nearest neighbors. Additionally, the range of the strong correlations does not become as compressed when α increases as it does when the number of neighbors increases. Finally, comparing the two panels in Figure 8.25, the bleeding of the clusters is shown by the zero correlations in the first panel (α = 0.1) becoming positive in the second panel (α = 0.5).

8.6. DISCUSSION

Pace's model is sufficiently different from nearest neighbors that I do not believe it should be considered as a replacement with an estimable parameter. Rather, I believe that it adds an additional feature to the nearest neighbors model. That is, it changes the weighting on the neighbors so that more distant (in terms of neighbors) neighbors have less weight. Further, the rate of decline of the weight is estimated. Thus it should be considered to be an important weighting scheme in its own right. The features of this weighting scheme may make sense in some situations. For example, if the data looks like cluster one, then it makes sense to weight the third neighbor less than the second. However, if the data looks like cluster two, it makes less sense. Note also that the researcher must choose the maximum number of neighbors, NN. As in all choices of parameters, the researcher must use his judgment as to what is best.

8.6.1. Limit models

Limit models use a weighting scheme such that:

Wij = 1 if Dij ≤ L
    = 0 otherwise
[Figure: two correlogram panels (α = 0.1 and α = 0.5), each plotting correlation against separation distance from 0 to 14.]

Figure 8.25 Correlograms for Pace and Gilley model.

where L is the distance limit. The parameter L is usually chosen by the researcher and its value can have a profound effect on both the weight and correlation matrices. The unstandardized version of W is symmetric because Dij = Dji. However, the standardized W is usually not symmetric because the number of points within the distance limit will vary by observation. This is a discrete weighting scheme because the unstandardized W consists entirely of ones and zeros.

Tables 8.12 and 8.13 show standardized weight matrices for distance limits of 1 and 3, respectively. When L = 1, W is very sparse, because there are very few pairs with separation distance less than one. The asymmetry is illustrated by the shaded cells in Table 8.12. Observation 5 has two other points located within 1 distance unit (points 6 and 7), and so these weights are standardized to 0.5. However, points 6 and 7 only have one neighbor each (point 5), and so the weight for point 5 in

Table 8.12 Standardized weight matrix, distance limit = 1


0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Table 8.13 Standardized weight matrix, distance limit = 3


0.00 0.50 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.50 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.25 0.25 0.25 0.25 0.00
0.00 0.00 0.00 0.00 0.25 0.00 0.25 0.25 0.25 0.00
0.00 0.00 0.00 0.00 0.25 0.25 0.00 0.25 0.25 0.00
0.00 0.00 0.00 0.00 0.25 0.25 0.25 0.00 0.25 0.00
0.00 0.00 0.00 0.00 0.25 0.25 0.25 0.25 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

rows 6 and 7 is 1. Table 8.13 shows that, when the distance limit is increased to 3, the weight matrix becomes symmetric. This distance limit reveals the two clusters, putting observations 1 through 3 in Cluster 1 and 5 through 9 in Cluster 2. Points 4 and 10 have no neighbors; these rows contain only zeros.

Figure 8.26 shows correlation matrices for the Limit Model, for L = 1 and L = 3. When L = 1, the correlation matrix is very sparse, and the correlations that are present are very high. Observations 1 and 5 are very dominant. The two clusters are apparent, but include too few observations. Expanding the distance limit to 3 reveals the two clusters more accurately, although observations 4 and 10 remain excluded from either. Although L = 3 seems to make the most sense for this data, the correlation matrix does not resemble those of the previously discussed weighting schemes. Also note that L is not required to be an integer.

Figure 8.27 shows the correlograms for L = 1 and L = 3. For the small distance limits shown here, there is no intermingling (with respect to distance) of correlated and uncorrelated points, as in nearest neighbors. Pairs are either correlated or not, and when they are, the correlation is high. The strictly positive correlations end at the distance limit. However, when the distance limit is larger (not shown), there is a range of separation distances in which positive and zero correlations are interspersed. This range occurs beyond the distance limit and shows the neighbors of neighbors effect.
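A sketch of the limit (distance-band) scheme, added here for illustration (Python/NumPy, reusing D; the helper name is hypothetical). It shows how row-standardization destroys the symmetry of the unstandardized matrix when points have different numbers of neighbors within the limit, as in Table 8.12.

```python
import numpy as np

def limit_weights(D, L):
    """Distance-limit weights: 1 if the separation is within L, 0 otherwise, then row-standardized."""
    W = ((D <= L) & (D > 0)).astype(float)       # exclude the diagonal (self-pairs)
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

W_L1 = limit_weights(D, L=1)    # very sparse; rows for isolated points are all zero
W_L3 = limit_weights(D, L=3)    # reveals the two clusters (cf. Table 8.13)
print(np.allclose(W_L1, W_L1.T))   # False: standardization breaks the symmetry
print(np.allclose(W_L3, W_L3.T))   # True for these data
```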
[Figure: two symbolic correlation matrices, for distance limits of 1 and 3.]

Figure 8.26 Correlation matrices for limit model.

[Figure: two correlogram panels (distance limit = 1 and distance limit = 3), each plotting correlation against separation distance from 0 to 14.]

Figure 8.27 Correlograms for limit model.



8.6.2. Inverse distance

In this weighting scheme, the weights are inversely related to separation distance as shown below:

Wij = 1 / Dij^P

where the exponent P is a parameter that is usually set by the researcher. This weighting scheme falls into the continuous category because the unstandardized weights are all between 1 and 0 (inclusive), rather than being restricted to 1 or 0. Tables 8.14 and 8.15 show the standardized weight matrices for P = 1 and P = 3.

When P = 1 (Table 8.14), the weights in the clusters are relatively low. For example, the weights associated with point 1 (the most central point in Cluster 1) are all less than 0.5, and the weights associated with point 5 are all less than 0.4. Additionally, the weights for pairs outside the clusters are relatively large, reaching values as high as 0.11.

When P = 3 the in-cluster weights are very large, while the out of cluster weights are close to zero. For example, the weights associated with point 1 are as high as 0.95 (since these weight matrices are standardized, all weights lie between 0 and 1). The weights associated with point 5 are as large as 0.72. Points 4 and 10 have relatively large weights.

Figure 8.28 shows the correlation matrices for P = 1 and P = 3. When P = 0.5

Table 8.14 Standardized weight matrix P = 1


0.00 0.40 0.20 0.07 0.06 0.06 0.05 0.06 0.07 0.03
0.44 0.00 0.15 0.06 0.06 0.06 0.06 0.06 0.08 0.03
0.27 0.18 0.00 0.13 0.07 0.07 0.07 0.08 0.09 0.05
0.14 0.12 0.20 0.00 0.09 0.08 0.08 0.10 0.10 0.09
0.03 0.03 0.03 0.02 0.00 0.22 0.30 0.18 0.15 0.03
0.04 0.04 0.04 0.03 0.29 0.00 0.23 0.13 0.16 0.03
0.03 0.03 0.03 0.03 0.34 0.21 0.00 0.18 0.11 0.03
0.05 0.04 0.05 0.04 0.26 0.14 0.22 0.00 0.16 0.05
0.06 0.06 0.06 0.04 0.23 0.19 0.15 0.17 0.00 0.04
0.09 0.08 0.10 0.11 0.12 0.11 0.13 0.15 0.12 0.00

Table 8.15 Standardized weight matrix P = 3


0.00 0.87 0.11 0.00 0.00 0.00 0.00 0.00 0.01 0.00
0.95 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.01 0.00
0.65 0.19 0.00 0.08 0.01 0.01 0.01 0.01 0.03 0.00
0.15 0.09 0.50 0.00 0.04 0.03 0.03 0.05 0.06 0.04
0.00 0.00 0.00 0.00 0.00 0.24 0.57 0.12 0.07 0.00
0.00 0.00 0.00 0.00 0.56 0.00 0.29 0.05 0.10 0.00
0.00 0.00 0.00 0.00 0.72 0.16 0.00 0.10 0.03 0.00
0.00 0.00 0.00 0.00 0.48 0.08 0.30 0.00 0.13 0.00
0.01 0.01 0.01 0.00 0.42 0.24 0.12 0.19 0.00 0.00
0.05 0.04 0.06 0.10 0.14 0.10 0.15 0.23 0.12 0.00
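A sketch of the inverse-distance scheme, added for illustration (Python/NumPy, reusing D; the helper name is hypothetical). The exponent P controls how quickly the weight falls off with distance, as the comparison of Tables 8.14 and 8.15 shows.

```python
import numpy as np

def inverse_distance_weights(D, P=1.0):
    """Row-standardized inverse-distance weights, W_ij proportional to 1 / D_ij**P."""
    with np.errstate(divide='ignore'):
        W = np.where(D > 0, 1.0 / D ** P, 0.0)    # zero on the diagonal
    return W / W.sum(axis=1, keepdims=True)

# print(np.round(inverse_distance_weights(D, P=1), 2))   # cf. Table 8.14
# print(np.round(inverse_distance_weights(D, P=3), 2))   # cf. Table 8.15
```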
[Figure: two symbolic correlation matrices, for exponents P = 1 and P = 3.]

Figure 8.28 Correlation matrices for inverse distance model.

(not shown), all points are correlated with all other points, and the correlations are roughly the same. When P = 1, we begin to see some stronger correlations associated with points 1 and 5, but non-zero correlations still exist between all pairs. When P = 3, the clusters appear very clearly, and points 1 and 5 appear to be influential. Note that points 4 and 10 are always included in the clusters.

Figure 8.29 shows the correlograms for P = 1 and P = 3. Examination of this figure shows that the correlations decline monotonically when P = 1. At P = 3, an intermixing of large and small correlations occurs when separation distance is in the range of 5 to 9.

8.6.3. Negative exponential model

This is another continuous weighting scheme. Here the weights decline exponentially with separation distance.

Wij = exp(−Dij / A)

where A is a parameter that is commonly chosen by the researcher.

Tables 8.16 and 8.17 show the standardized weight matrices for A = 0.5 and A = 2. When A = 0.5, the weights within the clusters are reasonably large, and the largest weights are associated with points 1 and 5. The weights for pairs outside of the clusters are all zero, except for point 10. When A = 2, the weights associated with points 1 and 5 get smaller, but there is an indeterminate effect on the weights on the other pairs in the cluster: some get smaller and some get larger. The weights on the pairs outside of the cluster become larger.

Figure 8.30 shows the correlation matrices for A = 0.5 and A = 2. When A = 0.25 (not shown), the clusters are clearly indicated, and correlations associated with points 1 and 5 are very large, indicating their centrality. When A is increased to 0.5, point 5 loses some of its centrality, remaining highly correlated only with point 7. When A is set to 1 (not shown), point 5 appears no different from the other points in Cluster 2, and the centrality of point 1 becomes weaker. Finally, when A = 2, all of the points become correlated, although the highest correlations remain in the clusters. Note that point 10 is always included in Cluster 2, regardless of the value of A.
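A corresponding sketch for the negative exponential scheme, added for illustration (Python/NumPy, reusing D; the helper name is hypothetical). Here A acts as a distance scale, so small values of A confine the weight to very close pairs.

```python
import numpy as np

def negative_exponential_weights(D, A=0.5):
    """Row-standardized weights W_ij proportional to exp(-D_ij / A), zero on the diagonal."""
    W = np.exp(-D / A)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

# print(np.round(negative_exponential_weights(D, A=0.5), 2))   # cf. Table 8.16
# print(np.round(negative_exponential_weights(D, A=2.0), 2))   # cf. Table 8.17
```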
[Figure: two correlogram panels (P = 1 and P = 3), each plotting correlation against separation distance from 0 to 14.]

Figure 8.29 Correlograms for inverse distance model.

Table 8.16 Standardized weight matrix for negative exponential model, A = 0.5
0.00 0.88 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.98 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.87 0.12 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00
0.02 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.28 0.46 0.17 0.10 0.00
0.00 0.00 0.00 0.00 0.53 0.00 0.32 0.04 0.11 0.00
0.00 0.00 0.00 0.00 0.60 0.22 0.00 0.15 0.03 0.00
0.00 0.00 0.00 0.00 0.49 0.07 0.32 0.00 0.12 0.00
0.00 0.00 0.00 0.00 0.46 0.25 0.10 0.19 0.00 0.00
0.00 0.00 0.00 0.01 0.07 0.01 0.10 0.79 0.03 0.00

Figure 8.31 shows correlograms for A = 0.5 and A = 2. When A = 0.5, the pattern is familiar: high correlations at small separation distances, a range between 5 and 9 where strong correlations are interspersed with zero correlations, and zero correlations at separation distances greater than 9. When A = 2, previously strong correlations are reduced somewhat, and the previously zero correlations become stronger.

Table 8.17 Standardized weight matrix for negative exponential model, A = 2


0.00 0.51 0.31 0.04 0.03 0.02 0.02 0.02 0.05 0.00
0.59 0.00 0.22 0.03 0.03 0.03 0.02 0.02 0.06 0.00
0.42 0.25 0.00 0.15 0.03 0.03 0.02 0.03 0.06 0.01
0.18 0.11 0.48 0.00 0.04 0.03 0.03 0.05 0.06 0.04
0.01 0.01 0.01 0.00 0.00 0.25 0.28 0.22 0.20 0.01
0.01 0.02 0.01 0.00 0.31 0.00 0.27 0.16 0.21 0.01
0.01 0.01 0.01 0.00 0.33 0.25 0.00 0.23 0.15 0.01
0.02 0.01 0.02 0.01 0.29 0.18 0.26 0.00 0.20 0.02
0.04 0.03 0.03 0.01 0.26 0.23 0.18 0.21 0.00 0.01
0.02 0.02 0.04 0.10 0.15 0.10 0.17 0.28 0.12 0.00

[Figure: two symbolic correlation matrices for selected values of A.]

Figure 8.30 Correlation matrices for negative exponential model, selected values of A.

8.7. IRREGULARLY LOCATED AREAS

All of the weighting schemes described in the previous section can be used for areas, if they are applied to the centroids of the regions. Other weighting schemes have been suggested for areas. For example, Cliff and Ord (1981) suggest using weights based on centroid separation distance and the length of the shared boundary:

Wij = βi(j)^b / Dij^a

where βi(j) is the proportion of the perimeter of area i that is shared with area j, and a and b are parameters. Dacey (1968) suggested taking the relative size of each area into consideration, and proposed the following weights:

Wij = dij αi βi(j)

where dij is one if the areas are contiguous and zero otherwise, and αi is the fraction of the study area that is contained in area i. Many other weighting schemes have been
[Figure: two correlogram panels for selected values of A, each plotting correlation against separation distance from 0 to 14.]

Figure 8.31 Correlograms for negative exponential model.

proposed. These will not be explored further in this chapter.

8.8. DISCUSSION

The weight matrix is a powerful tool for representing spatial relationships. There are many choices for the form that this matrix can take; only a few have been described in this chapter. The researcher will always have to specify a family of schemes (e.g., nearest neighbor, limit) and will often have to choose at least one parameter to complete the specification. Despite this, it is standard to treat the weight matrix as exogenous, which means that both the family and parameter value are known by the researcher. While this makes the estimation of other parameters easier, it is not very satisfying, since it implies that the researcher knows a great deal about the spatial interactions in the data. Furthermore, the remaining parameter estimates can be biased, since they will be conditional upon the specification of W. Given the impact of the choice of family and parameters upon the analysis, it is incumbent upon the researcher to choose carefully.

Estimation of W is appealing, although difficult. Maximum likelihood methods can be used, at the cost of assuming normality. Bhattacharjee and Jensen-Butler (2006) have

recently suggested an approach that is based on the eigenvalues and eigenvectors of the variance/covariance matrix estimated from a first stage OLS regression. Clearly, this is an area for future research. For further reading see Anselin (1988), Upton and Fingleton (1985), and Cliff and Ord (1973).

NOTE

1 It should be pointed out that it is probably as easy to estimate the number of neighbors as it is to estimate α.

REFERENCES

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer.
Bhattacharjee, A. and Jensen-Butler, C. (2006). Estimation of spatial weights matrix in a spatial error model, with an application to diffusion in housing demand. CRIEFF Discussion Paper 0519. This paper can be downloaded from http://ideas.repec.org/p/san/crieff/0519.html
Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. London: Pion.
Cliff, A.D. and Ord, J.K. (1981). Spatial Processes: Models and Applications. London: Pion.
Dacey, M. (1968). A review of measures of contiguity for two and k-color maps. In Spatial Analysis: A Reader in Statistical Geography, edited by B. Berry and D. Marble, pp. 479-495. Englewood Cliffs, N.J.: Prentice-Hall.
Pace, K. and Gilley, O. (1998). Generalizing the OLS and grid estimator. Real Estate Economics, pp. 331-347.
Upton, G. and Fingleton, B. (1985). Spatial Data Analysis by Example. New York: Wiley.
9
Geostatistics and Spatial
Interpolation
Peter M. Atkinson and Christopher D. Lloyd

9.1. INTRODUCTION

This chapter is concerned with geostatistics, a set of techniques for the analysis of spatial data (Journel and Huijbregts, 1978; Goovaerts, 1997; Chilès and Delfiner, 1999). Oliver and Webster (1990) and Burrough and McDonnell (1998) are two accessible introductions to geostatistics, the latter describing geostatistics within the context of geographical information systems. Geostatistics has its origins in mining but geostatistical approaches have been applied in many other disciplines including glaciology (Herzfeld and Holmlund, 1990), remote sensing (Curran and Atkinson, 1998) and archaeology (Lloyd and Atkinson, 2004). Geostatistics is characterized by the common dependence of its constituent techniques on the random function (RF) model, described below. Such techniques include those for spatial prediction, spatial simulation, regularization and spatial optimization. Commonly, the RF model is defined to be stationary in the sense that the parameters of the model are invariant through space. In this chapter, the focus of later discussion is on non-stationarity of parameters through space, in keeping with the local spatial analysis described in other chapters of this book.

A RF Z(x) may be defined as a random variable (RV), that is, a stochastic process Z that varies as a function of location x. The process of rolling a six-sided die is a commonly quoted example of a RV. The die can take any one of six possible outcomes (an integer between 1 and 6) where, for an unbiased die, each number has an equal chance of 1/6 of being rolled.

These possible outcomes define the discrete distribution function of the die. Rolling the die leads to a particular outcome, called a realization. For continuous variables, a continuous function (the probability density function, pdf, or cumulative distribution function, cdf) replaces the discrete definition of the distribution function. The cdf defines the probability of the outcome being less than a selected value (Goovaerts, 1997). See Isaaks and Srivastava (1989) for a discussion of RVs in a geostatistical context.

In defining a RF it is important to consider how the RV will be allowed to vary through space x. One simple possibility is to allocate to every position x in space its own cdf, with each independent of all other cdfs. A problem is that this model requires a large number of parameters; one set (e.g., mean and variance of the Gaussian model) for each possible location. Moreover, such a possibility is unlikely to be realistic in practice; we know that places close together tend to have similar characteristics. Therefore, this model is too loosely controlled and does not make use of our practical knowledge of spatially varying phenomena. For these reasons, we place some restrictions on the RF model. The most common set of restrictions are referred to as stationarity constraints, meaning that particular parameters are invariant with x. In the strictest sense, the mean and variance parameters can be held constant for all locations x. However, under this model each point is identically and independently distributed (iid), meaning that spatial inference is severely limited (we now have too tight a control over the possibilities).

In geostatistics, it is common to define a stationary mean parameter. Various alternative models have been proposed in which the mean is allowed to vary through space. Such a non-stationary mean parameter is generally referred to as a trend (see Goovaerts, 1997, and section 9.4.1 below). For the present purpose, a stationary mean provides a basic starting point. A second important restriction, which is not as restrictive as defining a stationary variance, is to define a stationary spatial covariance function (representing second-order stationarity) or semi-variogram (representing intrinsic stationarity, a weaker form of stationarity). Although much of the computation in geostatistics is based on the spatial covariance, the equations are often written in terms of semi-variograms and, thus, we shall focus on the semi-variogram from this point onwards.

The semi-variogram defines the relations between points and, thus, facilitates spatial statistical inference. It is usually estimated from empirical data as a plot of half the average squared difference between pairs of values (the semivariance) against the vector separation or lag. Then a mathematical model is commonly fitted to the empirical semi-variogram plot for use in geostatistical operations. Various methods may be employed in the fitting, although weighted least squares is a common basic starting point. Several important considerations should be taken into account during model fitting (see McBratney and Webster, 1986). Once the parameters are estimated (either with or without the uncertainty of estimation accounted for) the RF is defined and geostatistical operations can proceed. Variogram estimation and model fitting are described in section 9.2.

The mean and semi-variogram are, thus, the parameters that define the RF model, and that need to be estimated, effectively replacing the mean and variance of the RV model. It should be pointed out that the variogram may itself be comprised of several further parameters. For example, the spherical model is an example of a transitive variogram model (i.e., for which a positive finite maximum value is defined).

The spherical model has two parameters; the sill c and the non-linear parameter a usually referred to as the range. The sill defines the maximum value of semivariance while the range defines the lag at which the sill is reached.

Geostatistical operations include spatial prediction, spatial simulation, regularization and spatial optimization. In spatial prediction or kriging, the objective is to predict the value of z(x0) at some unobserved location x0 given a sample of data z(xi), i = 1, 2, . . ., n usually defined on point supports (the space on which each observation is defined) or quasi-points. The RF model helps because it is useful to base the prediction of z(x0) on a model that captures our knowledge of the underlying processes or form. In environmental science (in the broadest sense) process knowledge is often limited and the RF model provides a useful stochastic framework that builds on some general principles.

The RF model is useful for several reasons, but prime among them are:

1 the dependence of the prediction z(x0) on the data z(xi), i = 1, 2, . . . , n is estimated by the semi-variogram. In a general sense, the closer z(x0) and a given data point, the more similar the two values are likely to be. The semi-variogram quantifies this spatial dependence. Critically, this means that when a linear weighting of proximate data is used in spatial prediction the weights can be determined automatically through linear algebra. This process is referred to as kriging.

2 In kriging, the relations between the sample data themselves are accounted for so that, at a given separation, a cluster of data points will contribute less to the prediction than a dispersed set (Journel and Huijbregts, 1978).

3 The cdf of the predicted value (i.e., the set of possible values from which one realization is drawn) can be conditioned on the sample data. In particular, the variance of the conditional cdf (ccdf) is likely to be less than that of the original cdf. In general terms, this means that the range of possible values for the unknown value is restricted to be close to the neighbouring data by an amount determined by the spatial proximity of the prediction location to the neighbours. Such information can be used to extend the process of spatial prediction (in which the mean of the posterior or conditional cdf is drawn) to spatial simulation (in which a value is drawn from the ccdf randomly).

Geostatistics, as described above, has been used widely to characterize spatial variation (using the semi-variogram or other function) in relatively small data sets and to predict unobserved values using kriging informed by the modelled semi-variogram. In such circumstances, the decision to adopt a stationary model of the mean and semi-variogram makes sense. In fact, it is necessary for statistical inference. However, very large spatially-extensive and spatially-detailed data sets are increasingly readily available. Examples include digital elevation data and image-based data sets provided primarily through remote sensing (Atkinson, 2005). Researchers and practitioners are increasingly overwhelmed by the magnitudes of the datasets available for analysis. This has led to a realization that the biggest problem facing spatial analysts today is one of data richness rather than data sparsity. In these circumstances, a stationary RF model is not only inappropriate, but wasteful of data. More suitable solutions can be found by allowing the previously stationary parameters to vary across the region of interest (Atkinson, 2001).

The present chapter provides an introduction to linear geostatistics, but with a particular focus on models that include non-stationarity parameters, particularly of (a) the mean and (b) the semi-variogram. The next

section describes the process of fitting the RF model parameterized by a spatial covariance or semi-variogram, while section 9.3 describes geostatistical prediction (kriging). Section 9.4 considers non-stationary models and section 9.5 discusses a range of issues related to the use of geostatistics within GIS.

9.2. CHARACTERIZING SPATIAL VARIATION

9.2.1. Estimating the experimental semi-variogram

Much of the effort and time associated with geostatistical analysis is expended in analysis of the spatial structure of a variable. One simple way of examining spatial structure is through estimating the semi-variogram cloud. The semi-variogram cloud is a plot of the semivariances for paired data against the distances separating the paired data points in a given direction. The semivariance is half the squared difference between values at two locations, and can be thought of as a measure of dissimilarity. Thus, the semi-variogram cloud shows how dissimilar paired data points are as a function of their separation distance and direction (termed spatial lag, h). If data are spatially structured then pairs separated by small lags will tend to be less dissimilar than pairs separated by large lags.

A core idea in geostatistics is that the spatial structure in a variable should be characterized and used for spatial prediction and simulation. The objective of geostatistical prediction is to find optimal weights to assign to observations located around the prediction location. If information is available on how dissimilar two observations are likely to be for a given lag then this information can be used to determine these weights. The most commonly used approach is based on the estimated semi-variogram. The experimental semi-variogram is estimated by calculating the squared differences between all the available paired observations and obtaining half the average for all observations separated by a given lag (or within a lag tolerance where the observations are not on a regular grid). So, while the semi-variogram cloud provides semivariances as a function of a set of actual lags the experimental semi-variogram provides only a set of average semivariances at a set of discrete lags. Examination of the semi-variogram cloud provides a means of identifying heterogeneities in spatial variation within a variable (Webster and Oliver, 2000) that are obscured through the summation over lags that occurs with the experimental semi-variogram. Therefore, examination of the semi-variogram cloud is a sensible step prior to estimation of the experimental semi-variogram.

The experimental semi-variogram, γ̂(h), can be estimated from p(h) paired observations, z(xα), z(xα + h), α = 1, 2, . . ., p(h) using:

γ̂(h) = 1/(2p(h)) Σ_{α=1}^{p(h)} [z(xα) − z(xα + h)]²   (9.1)

The semi-variogram can be estimated for different directions to enable the identification of directional variation (termed anisotropy). Where a variable is preferentially sampled in areas with large or small values of the property of interest, the histogram will be unrepresentative and often a declustering algorithm is necessary to correct this. For example, values in areas or cells with more data may be given smaller weights than values in sparsely sampled areas (Deutsch and Journel, 1998). Preferential sampling
GEOSTATISTICS AND SPATIAL INTERPOLATION 163

of a variable also impacts on the form of a priori variance. The range, a, represents the
the experimental semi-variogram. Richmond scale of spatial variation (Atkinson and Tate,
(2002) shows that clustering can, in some 2000). For example, if a measured property
cases, alter drastically the form of the semi- varies markedly over small distances then the
variogram. Two methods of declustering for property can be said to exhibit short range
weighting paired data in estimation of the spatial variation.
experimental semi-variogram are given by Some of the most commonly used author-
Richmond (2002). ized models are detailed below. The nugget
In the presence of large-scale, low- effect model, defined above, is given by:
frequency variation (e.g., that would be fitted
well by a trend model), the form of the

semi-variogram will be affected. If the semi- 0 for h = 0
(h) = (9.2)
variogram increases more rapidly than a c0 for |h| > 0.
quadratic polynomial for large lags then a RF
which is non-stationary in the mean should
be adopted (Armstrong, 1998). This topic is Three of the most frequently used bounded
explored in greater depth in section 9.4.1. models are the spherical model, the expo-
nential model and the Gaussian model and
these are defined in turn. The spherical
model is perhaps the most widely used
9.2.2. Fitting a semi-variogram
semi-variogram model. Its form corresponds
model
closely with what is often observed in
A mathematical model may be fitted to many real world studies; almost linear
the experimental semi-variogram and the growth in semivariance with separation and
coefficients of this model can be used for then stabilization (Armstrong, 1998). It is
a range of geostatistical operations such as given by:
spatial prediction (kriging) and conditional
simulation. A model is usually selected from

one of a set of so-called authorized models. c[1.5(h/a)0.5(h/a)3 ] if h a
(h) =
McBratney and Webster (1986) provide a c if h > a
review of some of the most widely used (9.3)
authorized models. There are two principal
classes of semi-variogram model. Transitive
(bounded) models have a sill (finite variance), where c is the sill of the spherical model and
and indicate a second-order stationary pro- a is the non-linear parameter, known as the
cess. Unbounded models do not reach an range.
upper bound; they are intrinsically station- The exponential model is given by:
ary only (McBratney and Webster 1986).
Figure 9.1 shows the parameters of a bounded
  
semi-variogram model (the spherical model h
as defined below). The nugget effect, c0 , (h) = c 1 exp (9.4)
d
represents unresolved variation (a mixture
of spatial variation at a scale finer than
the sample spacing and measurement error). where d is the non-linear distance parameter.
The sill, c, represents the spatially correlated The exponential model reaches the sill
variation. The total sill, c0 + c, is the asymptotically and the practical range is 3d
164 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Range (a)

Sill (c)

Total sill (c0 + c)

Nugget (c0)

Lag (h)

Figure 9.1 Bounded semi-variogram.

(i.e., the separation at which approximately where is a power 0 < < 2 with
95% of the sill is reached). a positive slope, m (Deutsch and Journel,
The Gaussian model is given by: 1998). The linear model is a special case of
the power model.
One of the advantages of kriging is
  2  that it is often fairly straightforward to
h
(h) = c 1 exp 2 . (9.5) model anisotropic structure using the semi-
d
variogram. Two primary forms of anisotropy
have been outlined in the geostatistical
literature. If the sills for all directions are not
The Gaussian model does not reach a sill
significantly different and the same structural
at a finite
separation and the practical range components (for example, spherical or
is a 3 (Journel and Huijbregts, 1978).
Gaussian) are used then anisotropy can be
Semi-variograms with parabolic behaviour at
accounted for by a linear transformation of
the origin, as represented by the Gaussian
the co-ordinates: this is called geometric
model here, are indicative of very regular
or affine anisotropy (Webster and Oliver,
spatial variation (Journel and Huijbregts,
1990). Where the sill changes with direction
1978). Authorized models may be used in
but the range is similar for all directions
positive linear combination where a single
the anisotropy is called zonal (Isaaks and
model is insufficient to represent well the
Srivastava, 1989). However, the modelling of
form of the semi-variogram.
zonal anisotropy is much more problematic
Where the semi-variogram appears to
than the modelling of geometric anisotropy.
increase indefinitely with separation the most
In practice, a mixture of geometric and zonal
widely used model is the power model:
anisotropy has been found to be common
(Isaaks and Srivastava, 1989).
There are various approaches for fitting mod-
(h) = mh (9.6) els to semi-variograms. Some geostatisticians
GEOSTATISTICS AND SPATIAL INTERPOLATION 165

prefer fitting semi-variogram models by used in combination in this way to model


eye on the grounds that it enables one to nested spatial structures. In Figure 9.3, the
use personal experience and to account for directional semi-variogram, estimated from
features or variation that may be difficult the same data, is shown. It indicates that
to quantify (Christakos, 1984; Journel and the scale of spatial variation is similar
Huijbregts, 1978). Weighted least squares in all directions while the magnitude of
(WLS) has been proposed as a suitable the variation (the semivariance) is clearly
means of fitting models to semi-variograms different for different directions.
(Cressie, 1985; Pardo-Igzquiza, 1999)
and the approach has been used by many
geostatisticians. The technique is preferred
to unweighted ordinary least squares (OLS) 9.3. SPATIAL PREDICTION AND
as in WLS the weights can be used to SIMULATION
reflect the uncertainty in the individual
semivariance estimates or the desire to 9.3.1. Ordinary kriging
fit at certain lags more accurately than at There are many varieties of kriging. Its sim-
others. For example, the weights are often plest form is called simple kriging (SK). To
selected to be proportional to the number use SK it is necessary to know the mean of the
of pairs at each lag (Cressie, 1985), such property of interest and this must be modelled
that lags with many pairs have greater as constant across the region of interest. In
influence in the fitting of the model. The practice, this model is often unsuitable. The
use of generalized least squares (GLS) has most widely used variant of kriging, ordinary
also been demonstrated in a geostatistical kriging (OK), allows the mean to vary
context (Cressie, 1985; McBratney and spatially: the mean is modelled as constant
Webster, 1986). Use of maximum likelihood within each prediction neighbourhood only.
(ML) estimation (McBratney and Webster, For each point to be predicted a new
1986) has become widespread amongst neighbourhood is defined and so effectively
geostatisticians and has been used for the mean is allowed to vary locally.
WLS. The goodness of fit of models to OK predictions are weighted averages
the semi-variogram, and of the relative of the n available data. The OK weights
improvement or otherwise in using different define the best linear unbiased predictor
numbers of parameters, may be compared (BLUP). The OK prediction, zOK (x0 ), is
through the examination of the sum of defined as:
squares of the residuals or through the
use of the Akaike Information Criterion
(McBratney and Webster, 1986; Webster and
n

McBratney, 1989). zOK (x0 ) = lOK


z(x ) (9.7)
=1
Figure 9.2 shows an experimental semi-
variogram estimated from precipitation data
acquired in Great Britain in January 1999.
with the constraint that the weights, lOK
, sum
The semi-variogram was estimated using
to 1 to ensure an unbiased prediction:
the Gstat software (Pebesma and Wesseling,
1998). The data are described by Lloyd
(2002, 2005). The semi-variogram was
n
fitted with a nugget and two spherical lOK
= 1. (9.8)
components. Authorized models are often =1
166 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

800

700

600
Semivariance (mm2)

500

400

300

200

100
Precipitation
50.34 Nug(0) + 229.939 Sph(14159.8) + 475.979 Sph(154817)
0
0 50000 100000 150000 200000
Lag (m)

Figure 9.2 Omnidirectional semi-variogram of precipitation.

1200 Semivariance
0 degrees
22.5 degrees
45 degrees
1000 67.5 degrees
90 degrees
112.5 degrees
135 degrees
Semivariance (mm2)

800 157.5 degrees

600

400

200

0
0 50000 100000 150000 200000
Lag (m)

Figure 9.3 Directional semi-variogram of precipitation.


GEOSTATISTICS AND SPATIAL INTERPOLATION 167

So, the objective of the kriging system is where OK is a Lagrange muliplier. Knowing
to find appropriate weights by which the OK , the kriging variance, an estimator of the
available observations will be multiplied prediction variance of OK, can be given as:
before summing them to obtain the predicted
value. These weights are determined using
the coefficients of a model fitted to the semi-
n
OK
2
= lOK
(x x0 ) + OK . (9.12)
variogram (or another function such as the
=1
covariance function).
The kriging prediction error must have an
expected value of 0: The kriging variance is a measure of
confidence in predictions and is a function of
the form of the semi-variogram, the sample
E{ZOK (x0 ) Z(x0 )} = 0. (9.9) configuration and the sample support (Journel
and Huijbregts, 1978). The kriging variance
is not conditional on the data values locally
The kriging (or prediction) variance, OK
2 , is
and this has led some researchers to use alter-
expressed as: native approaches such as conditional simu-
lation (discussed in the next section) to build
models of spatial uncertainty (Goovaerts,
OK
2
(x0 ) = E[{ZOK (x0 ) Z(x0 )}2 ] 1997).
There are two varieties of OK: punctual

n
OK and block OK. With punctual OK the pre-
=2 lOK
(x x0 )
dictions cover the same area (the support, v)
=1
as the observations. In block OK, the

n
n predictions are made to a larger support than
lOK OK
l (x x ). the observations. With punctual OK the data
=1 =1 are honoured. That is, they are retained in
(9.10) the output map. Block OK predictions are
averages over areas (that is, the support has
increased). Thus, at x0 the prediction is not
That is, we seek the values of l1 , . . . , ln the same as an observation and does not need
(the weights) that minimize this expression to honour it.
with the constraint that the weights sum to The choice of semi-variogram model
one (equation (9.8)). This minimization is affects the kriging weights and, therefore,
achieved through Lagrange multipliers. The the predictions. However, if the form
conditions for the minimization are given by of two models is similar at the origin
the OK system comprising n + 1 equations of the semi-variogram then the two sets of
and n + 1 unknowns: results may be similar (Armstrong, 1998).
The choice of nugget effect may have
n marked implications for both the predictions
OK

l (x x ) + OK = (x x0 ) and the kriging variance. As the nugget

=1 effect is increased, the predictions become


= 1, . . . , n closer to the global average (Isaaks and

lOK
=1
Srivastava, 1989).
=1 A map of precipitation in Britain in
(9.11) January 1999 generated using OK is shown
168 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

in Figure 9.4. It was generated using the the expected values (i.e., the mean) but
semi-variogram model given in Figure 9.2 are values drawn randomly from the
and the 16 nearest neighbours to each grid conditional cdf: a function of the available
cell were used in the prediction process. observations and the modelled spatial
The map is very smooth in appearance; variation (Dungan, 1999). The simulation
this is a common feature of maps derived is considered conditional if the simulated
using OK. values honour the observations at their
locations (Deutsch and Journel, 1998).
As noted above, simulated realizations
represent a possible reality whereas kriging
9.3.2. Cokriging does not. Simulation allows the generation
of many different possible realizations that
Where a secondary variable (or variable)
may be used as a guide to potential errors
is available that is cross-correlated with
in the construction of a map (Journel, 1996)
the primary variable both variables may
and multiple realizations encapsulate the
be used simultaneously in prediction using
uncertainty in spatial prediction. Arguably,
cokriging. To apply cokriging, the semi-
the most widely used form of conditional
variograms (that is, auto semi-variograms)
simulation is sequential Gaussian simulation
of both variables and the cross semi-
(SGS). With sequential simulation, simulated
variogram (describing the spatial dependence
values are conditional on the original data
between the two variables) are required.
and previously simulated values (Deutsch
The operation of cokriging is based on
and Journel, 1998). In SGS the ccdfs
the linear model of coregionalization (see
are all assumed to be Gaussian. SGS is
Webster and Oliver, 2000). For cokriging to
discussed in detail in several texts (for
be beneficial, the secondary variable should
example, Goovaerts, 1997; Deutsch and
be cheaper to obtain or more readily available
Journel, 1998; Chils and Delfiner, 1999;
to make the most of the technique. If the
Deutsch, 2002).
variables are clearly linearly related then
cokriging may estimate more accurately than,
for example, OK.

9.4. NON-STATIONARY MODELS


9.3.3. Conditional simulation
This section discusses non-stationarity in the
Kriging predictions are weighted moving mean and the semi-variogram. Approaches
averages of the available sample data. for dealing with non-stationarity in the mean
Kriging is, therefore, a smoothing are well developed and are the subject of
interpolator. Conditional simulation (also section 9.4.1. There is a variety of methods
called stochastic imaging) is not subject for estimating the local semi-variogram
to the smoothing associated with kriging where the spatial structure in the property of
(conceptually, the variation lost by kriging interest varies from place to place. However,
due to smoothing is added back) as such approaches are less widely used than
predictions are drawn from equally probable methods that allow for non-stationarity in the
joint realizations of the RVs which make mean. Some approaches for estimating the
up a RF model (Deutsch and Journel, non-stationary semi-variogram are discussed
1998). That is, simulated values are not in section 9.4.2.
GEOSTATISTICS AND SPATIAL INTERPOLATION 169

Precipitation (mm) N
value
High : 271

Low : 1

0 100 200 300


Kilometres

Figure 9.4 OK derived map of precipitation.


170 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

raw precipitation values and of residuals from


9.4.1. Non-stationary mean:
a first-order and a second-order polynomial
tting a trend and kriging
trend. In this case, the form of each of
with a trend model
the semi-variograms is similar although the
OK is robust but, in some cases, an even more variance decreases as a higher-order trend
general form of kriging may be appropriate. is removed. Another approach to obtaining
In cases where the mean of the variable the trend-free semi-variogram is to estimate
changes markedly over small distances a non- the semi-variogram for several directions and
stationary model of the mean may provide retain the semi-variogram for the direction
more accurate spatial prediction. While the that has least evidence of trend, that is,
mean varies from place to place with OK for which the variance is smallest. Figure 9.6
it does not vary within the search window. shows the semi-variogram of precipitation for
Several approaches exist that provide a non- the direction with the smallest variance.
stationary mean. The most widely used approach to
One approach is to fit a global polynomial prediction where the mean is non-stationary
trend model and estimate the semi-variogram is called kriging with a trend model (KT;
of the residuals. SK can then be used to sometimes termed universal kriging). In KT,
make predictions after which the trend can be the mean is modelled using a polynomial.
added back to the predicted values. Figure 9.5 The principal problem with KT is that the
shows the omnidirectional semi-variogram of underlying trend-free semi-variogram must

800

700

600
Semivariance (mm2)

500

400

300

200

Order 0
100
Order 1
Order 2
0
0 50000 100000 150000 200000
Lag (m)

Figure 9.5 Semi-variogram of precipitation: raw data (order 0) and residuals from a
polynomial trend of order 1 and 2.
GEOSTATISTICS AND SPATIAL INTERPOLATION 171

700
Semivariance (mm2, dir. <x, y > 45 +/ 22.5)

600

500

400

300

200

100
Semivariance
82.16 Nug(0) + 229.658 Sph(15581.7) + 337.892 Sph(131143)
0
0 20000 40000 60000 80000 100000 120000 140000
Lag (m)

Figure 9.6 Semi-variogram of precipitation: direction with the smallest variance.

be estimated yet the local trend (or drift) is approaches that make use of secondary
estimated as a part of the KT procedure which variables that describe the shape of the mean
itself requires the semi-variogram. Various in the primary variable. If some variable is
approaches for estimating the trend-free available that is linearly related to the primary
semi-variogram are described in the literature variable and varies smoothly (i.e., there are
and two approaches are summarized above. no marked local changes in values) it could
Figure 9.7 shows the KT predictions made be used to inform spatial prediction of values
using 16 nearest neighbours with the semi- of the primary variable. Two such approaches
variogram model given in Figure 9.6; the are described below.
semi-variogram for the direction with the With SK, the mean is assumed to be
least evidence of trend. An alternative constant (there is no systematic change in
approach is Intrinsic Random Functions of the mean of the property across the region of
Order k kriging whereby the generalized study) and known. If the mean is not constant,
covariance is used in place of the semi- but we can estimate the mean at locations in
variogram (Chils and Delfiner, 1999). the domain of interest, then this locally vary-
ing mean can be used to inform prediction.
That is, the local mean can be estimated prior
to kriging. The locally-varying mean can
Making use of secondary variables: be estimated in various different ways. One
KED and SKlm approach, termed simple kriging with locally
As well as estimating the form of the trend varying means (SKlm), is to use regres-
from the variable of interest, there are various sion to estimate the value of the primary
172 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Precipitation (mm) N
value

High : 408

Low : 0

0 100 200 300


Kilometres

Figure 9.7 KT derived map of precipitation.


GEOSTATISTICS AND SPATIAL INTERPOLATION 173

variable at (a) all observation locations and estimating directional semi-variograms and
(b) all locations where SKlm predictions retaining the semi-variogram for the direction
will be made. The semi-variogram is then that showed least evidence of trend. That is,
estimated using the residuals from the temperature values systematically increase or
regression predictions at the data locations. decrease in one direction (there is a trend
SKlm is conducted using the residuals and in the values), but values of temperature
the trend is added back after the prediction are more constant in the perpendicular
process is complete (an example is given by direction. In such cases, the concern is
Lloyd, 2005). to characterize spatial variation in the
An alternative approach is kriging with direction for which values of temperature
an external drift model (KED). In KED, are homogeneous. Hudson and Wackernagel
the secondary data act as a shape func- (1994) assumed that the trend-free semi-
tion (the external trend) and the function variogram was isotropic and the semi-
describes the average shape of the primary variogram for the direction selected was used
variable (Wackernagel, 2003). The local for kriging.
mean of the primary variable is derived as
a part of the kriging procedure using the
secondary information and SK is carried
out on the residuals from the local mean.
9.4.2. Non-stationary
So, the approach differs from SKlm in
semi-variogram
that the local mean is estimated as part
of the kriging procedure and not before In cases where the semi-variogram does
it, as is the case with SKlm (Goovaerts, not represent well spatial variation across
1997). Lloyd (2002, 2005) illustrates the use the whole of the region of interest some
of KED in mapping monthly precipitation approach may be necessary to account
whereby elevation is used as the external for the change in spatial variation locally.
trend. In the geostatistical literature, there are
As noted above, a major problem with KT several approaches presented for estimation
and KED is that the underlying (trend-free) of non-stationary semi-variograms. These
semi-variogram is assumed known. That is, vary from approaches that estimate and
if the mean changes from place to place the model automatically the semi-variogram in a
semi-variogram estimated from the raw data moving window (this approach is discussed
will be biased, so it is necessary to remove the below) to approaches that transform the
local mean and estimate the semi-variogram data so that the transformed data have
of the residuals. Since the trend (that is, a stationary semi-variogram. Reviews of
local mean) is estimated as a part of the some methods are provided by Sampson
KED (and KT) system, which requires the et al. (2001) and Schabenberger and
semi-variogram model coefficients as inputs, Gotway (2005).
we are faced with a circular problem. The estimation and automated modelling
A potential solution is to infer the trend- of local semi-variograms for kriging is one
free semi-variogram from paired data that are published approach that accounts for non-
largely unaffected by any trend (Goovaerts, stationarity in the semi-variogram (Haas,
1997; Wackernagel, 2003). Hudson and 1990). This approach is employed here.
Wackernagel (1994), in an application The WLS semi-variogram model fitting
concerned with mapping mean monthly routine presented by Pardo-Igzquiza (1999)
temperature in Scotland, achieved this by was used to fit models to semi-variograms
174 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

estimated in a moving window. Fortran 77 9.5. DISCUSSION


code was written to visit each observation
in the precipitation dataset and estimate 9.5.1. Automatic tting of
the semi-variogram using the n nearest variogram models
neighbours to each observation. The routine
Fitting semi-variogram models automatically
of Pardo-Igzquiza (1999) was then used
is not straightforward. In Figure 9.9, five
to fit a model to each semi-variogram
local semi-variograms are selected and
automatically with the result that there were
illustrated. In most of the selected cases,
3037 (equal to the number of observa-
the model appears to fit the experimental
tions in the precipitation dataset) sets of
semi-variogram well. However, in one case
semi-variogram model coefficients. The WLS
(the second semi-variogram from the bottom)
routine allows the fitting of several different
the form of the semi-variogram is not
models, but in this case a spherical model
well represented by a (single) spherical
was fitted to all of the semi-variograms.
structure. This problem could be at least
No nugget effect was fitted as it proved
partially resolved by fitting several models
problematic to fit a nugget effect while at
and selecting the best fitting model. However,
the same time obtaining a feasible range
as the complexity of the model fitting process
parameter.
increases further problems can arise with
In the example presented, the semi-
automatic fitting. Generally, we have found
variograms were estimated using the 1000
that the use of simple constraints to guide the
nearest neighbours to each observation. The
fitting (e.g., nugget variance is constrained to
variogram bin size was 5000 m and the
lie within a sensible range of between zero
number of bins was 14. In Figure 9.8,
and some positive value less than half of the
the values of the spherical model sills are
total sill, for very smooth variation) leads
mapped. There is a clear trend in values
to acceptable results in the vast majority of
from the south (small semivariances) to
cases.
the north (large semivariances) of Britain.
This corresponds with expectations: the
magnitude of variability in precipitation is
greater in the north and west than in the south
9.5.2. Non-stationary
and east of Britain. In Figure 9.9 the ranges
semi-variograms and kriging
are shown. As for the sills, there is spatial
variation. In the south of Britain the range It is easy to see how the local semi-
values tend to be large while in the north variograms estimated in section 9.4 could
they tend to be smaller. This suggests that be used in kriging. The parameters of
precipitation amount varies less over short the local semi-variogram are inserted
distances in the south than it does in the north into the local kriging equations instead
of Britain. of the global parameters (Haas, 1990).
It is clear that the spatial structure of There is little restriction on the variant
precipitation in Britain varies spatially of kriging to which this non-stationary
and, as such, a global semi-variogram set of semi-variogram parameters can be
model does not represent variability across applied. For example, in recent years,
Britain well. Use of locally-estimated local semi-variogram parameters have been
and modelled semi-variograms may used in local space-time kriging (Gething
increase the accuracy of predictions using et al., 2007) and local semi-variogram
kriging. and cross semi-variogram parameters have
GEOSTATISTICS AND SPATIAL INTERPOLATION 175

Str. comp. N
170328
329531
532776
7771044
10451193

0 100 200 300


Kilometres

Figure 9.8 Structured component of spherical model for a moving window.


176 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

1400
N
Range 1200

Semivariance (m2)
1136017396 1000

1739720373 800

2037423703 600

400
2370426995
200 Semivariance
2699630795 1168 Sph(18841)
0
0 10000 20000 30000 40000 50000 60000 70000
Lag (m)
1200

1000

Semivariance (m2)
800

600

400

200
Semivariance
973 Sph(21728)
0
0 10000 20000 30000 40000 50000 60000 70000
Lag (m)
700

600
Semivariance (m2)

500

400

300

200

100 Semivariance
571 Sph(16723)
0
0 10000 20000 30000 40000 50000 60000 70000
Lag (m)

200
Semivariance (m2)

150

100

50
Semivariance
172 Sph(21669)
0
0 10000 20000 30000 40000 50000 60000 70000
Lag (m)

200

150
Semivariance (m2)

0 100 200 300 100


Kilometres

50

Semivariance
180 Sph(28911)
0
0 10000 20000 30000 40000 50000 60000 70000
Lag (m)

Figure 9.9 Range of spherical model for a moving window, showing ve selected
semi-variograms with automatically tted models.
GEOSTATISTICS AND SPATIAL INTERPOLATION 177

been used in local downscaling cokriging Lloyd et al. (2005) have used the local
(Pardo-Igzquiza and Atkinson, 2007). range to show that the choice of optimum
spatial resolution for a given scene itself
varies locally. In the multivariate case, local
variation in the parameters of the linear
model of co-regionalization contains more
9.5.3. The objective of
information than the parameters mapped
non-stationary modelling
through GWR. The latter omits information
Several other chapters of this book have been on the spatial correlation in each variable,
concerned with geographically weighted as well as the cross-correlation between
regression (GWR). The non-stationary variables.
approaches presented in this chapter differ One of the reasons that local modelling
from GWR in their objective. For GWR is so important for remotely sensed images
the objective is to explore the spatially is that remotely sensed scenes rarely lend
varying parameters of a local regression themselves to description using the RF model
model; spatial variation in the estimated directly. Often, scenes are comprised of sev-
parameters is the primary interest. For the eral objects arranged on a background (e.g.,
non-stationary mean and semi-variogram buildings in a rural area) or comprised of a
modelling presented in this chapter the mosaic of objects (e.g., an agricultural scene).
objective is spatial prediction or some other In such circumstances, it is unreasonable to
geostatistical operation. Thus, non-stationary expect the RF model parameterized with a
modelling will be useful where it leads to an global semi-variogram function to capture
increase in the precision of prediction and the full range of variability in the image
where it leads to an increase in the precision locally. Non-stationary variogram modelling
of the estimation of the prediction variance. achieved by fitting within a moving window
While the objective is spatial prediction, it goes some way to addressing this problem,
is often informative to map the non-stationary but probably not far enough. It would be
parameters (in the sense of GWR). For preferable to define the objects of interest
GWR, the coefficients inform on local and then fit the RF model locally within the
relations between variables. For geostatistics, boundaries of those objects. For example,
the non-stationary mean and (especially) Berberoglu et al. (2000) and Lloyd et al.
semi-variogram parameters inform on the (2004) estimated semi-variograms on a per-
nature of local spatial structure. For example, field basis; semi-variograms were estimated
the local sill c is very much related to the using values within pre-defined boundaries.
magnitude of variation locally. The local The semivariances were then used as inputs,
sill parameter is related mathematically to along with spectral values, to maximum
the local variance (LV), which itself has likelihood and artificial neural network
been used repeatedly as a texture mea- (ANN) classifiers.
sure in describing remotely sensed images
(e.g., Bocher and McCloy, 2006). The local
range parameter is related to the scale of
spatial variation locally. The local range has
9.5.4. When is local, local enough?
also been mapped and used as a texture
measure in the classification of remotely The size of neighbourhood within which
sensed images (e.g., Ramstein and Raffy, the local variogram is estimated, whether
1989; Atkinson and Lewis, 2000). Recently, defined in terms of a search radius or the
178 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

nearest number of data points, represents a this chapter (see section 9.4), and such
compromise between two competing factors. approaches overcome the problem of nonsta-
The first is the desire to achieve suf- tionarity of the mean and variogram which
ficient data points (i.e., sufficiently large is likely to be encountered if the region of
neighbourhood) to reduce the uncertainty concern is large.
of variogram estimation to a tolerable Perhaps the biggest change in focus in the
level. McBratney and Webster (1986) and application of geostatistics in the last 20 years
Webster and Oliver (1992) provide excellent has been a shift from prediction (kriging)
discussions of the number of data required based analyses to those based on conditional
for reliable estimation of the variogram. simulation (see section 9.3.3). Simulation
The second is the desire to reduce the allows the generation of many equally-
neighbourhood such as to localize suf- probable realizations and the exploration
ficiently the variogram parameters. With of spatial uncertainty in the property of
regard to the latter point, it should be interest. In cases where extreme values are
remembered that since the objective is of interest kriging is problematic because
precise spatial prediction, what is actually of its smoothing properties. In such cases,
required is to represent accurately the local conditional simulation is more appropriate
variogram within the window used for (Goovaerts, 1997).
local kriging. So an extremely localized Another research focus has been on
variogram may be counter-productive. Ulti- the use and development of model-based
mately, a balance between these factors geostatistics (Diggle and Ribeiro, 2006).
should be achieved, potentially through The term was coined by Diggle et al.
calibration of the window size, although (1998) who introduced a body of approaches
this possibility is often too expensive that is applicable where Gaussian distribu-
computationally. tional assumptions, and therefore classical
geostatistics, are inappropriate. A Bayesian
approach is presented that the authors
argue enables uncertainty in the prediction
9.6. FUTURE TRENDS IN of model parameters to be accounted for
GEOSTATISTICS properly.
The advances in geostatistical methodol-
The availability of extensive data sets which ogy that have been made are limited in their
cover large areas and have a variety of application if extensive expert knowledge
supports poses problems for conventional is required to apply such models. In the
geostatistics, as this chapter indicates. Much last decade, the range of software packages
research is being conducted to develop with extensive geostatistical functionality has
solutions to the kinds of problems that grown markedly. Functions for estimating
have arisen. Gotway and Young (2002) variograms and for kriging and simulation
review a variety of approaches for area are now commonplace in GIS software.
to point interpolation while Kyriakidis Undoubtedly, with widespread access to
(2004) outlines one possible framework in often very sophisticated methods misuse and
the univariate (kriging) case and Pardo- misunderstanding are apparent (Atkinson,
Igzquiza and Atkinson (2007) a possible 2005). However, an increasingly well edu-
solution in the multivariate (cokriging) case. cated user base will hopefully contribute
There are various nonstationary geostatistical to more effective use of spatial data in all
models, as discussed at some length in application areas.
GEOSTATISTICS AND SPATIAL INTERPOLATION 179

9.7. SUMMARY Armstrong, M. (1998). Basic Linear Geostatistics. Berlin:


Springer.
Geostatistics represents a set of tools for Berberoglu, S., Lloyd, C.D., Atkinson, P.M. and
the analysis of spatial data. This set is Curran, P.J. (2000). The integration of spectral
characterized by its shared dependence on the and textural information using neural networks
RF model. Central to the RF is the notion of for land cover mapping in the Mediterranean.
Computers and Geosciences, 26: 385396.
parameter stationarity. For many data sets in
mining engineering and petroleum geology Bocher, P.K. and McCloy, K.R. (2006). The fun-
damentals of average local variance part I:
the decision to adopt a stationary model is a
detecting regular patterns. IEEE Transactions on
necessity due to sparcity of data. For many Image Processing, 15: 300310.
geographical data sets such as are provided
Burrough, P.A. and McDonnell, R.A. (1998). Principles
by remote sensing (e.g., LiDAR elevation of Geographical Information Systems. Oxford: Oxford
data) it is sensible to relax the constraint University Press.
of stationarity and estimate the parameters
Chils, J.-P. and Delner, P. (1999). Geostatistics:
of the RF model locally. This chapter has Modeling Uncertainty. New York: Wiley.
reviewed geostatistics with a particular focus
Christakos, G. (1984). On the problem of permissible
on non-stationary approaches. Readers are covariance and semi-variogram models. Water
now directed to Chils and Delfiner (1999) Resources Research, 20: 251265.
which is widely regarded as a standard
Cressie, N.A.C. (1985). Fitting semi-variogram models
reference on the subject. by weighted least squares. Mathematical Geology,
17: 563586.
Curran, P.J. and Atkinson, P.M. (1998). Geostatistics
and remote sensing. Progress in Physical Geography,
ACKNOWLEDGEMENTS 22: 6178.
Deutsch, C.V. (2002). Geostatistical Reservoir
The authors thank the British Atmospheric Modelling. New York: Oxford University Press.
Data Centre (BADC) for providing access to Deutsch, C.V. and Journel, A.G. (1998). GSLIB:
the United Kingdom Meteorological Office Geostatistical Software and Users Guide, 2nd edn.
(UKMO) Land Surface Observation Stations New York: Oxford University Press.
Data used in the case study. Diggle, P.J. and Ribeiro, P.J. (2006). Model-based
Geostatistics. New York: Springer.
Diggle, P.J., Tawn, J.A. and Moyeed, R.A. (1998).
Model-based geostatistics. Journal of the Royal
REFERENCES Statistical Society: Series C (Applied Statistics), 47:
299350.
Atkinson, P.M. (2001). Geographical information Dungan, J.L. (1999). Conditional simulation. In:
science: GeoComputation and non-stationarity. A. Stein, F. van der Meer and B. Gorte (eds),
Progress in Physical Geography, 25: 111122. Spatial Statistics for Remote Sensing, pp. 135152.
Dordrecht: Kluwer Academic Publishers.
Atkinson, P.M. (2005). Spatial prediction and surface
modelling. Geographical Analysis, 36: 113123. Gething, P.W., Atkinson, P.M., Noor, A.M., Gikandi,
P.W., Hay, S.I. and Nixon, M.S. (2007) A
Atkinson, P.M. and Lewis, P. (2000). Geostatistical local space-time kriging approach applied to a
classication for remote sensing: an introduction. national outpatient malaria dataset. Computers and
Computers and Geosciences, 26: 361371. Geosciences, 33: 13371350.
Atkinson, P.M. and Tate, N.J. (2000). Spatial scale Goovaerts, P. (1997). Geostatistics for Natural
problems and geostatistical solutions: a review. Resources Evaluation. New York: Oxford University
Professional Geographer, 52: 607623. Press.
180 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Gotway, C.A. and Young, J.J. (2002). Combining Lloyd, C.D., Berberoglu, S., Curran P.J. and Atkinson
incompatible spatial data. Journal of the American P.M. (2004). Per-eld mapping of Mediterranean
Statistical Association, 97: 632648. land cover: A comparison of texture measures.
International Journal of Remote Sensing, 15:
Haas, T.C. (1990). Lognormal and moving window
39433965.
methods of estimating acid deposition. Journal of
the American Statistical Association, 85: 950963. McBratney, A.B. and Webster, R. (1986). Choosing
Herzfeld, U.C. and Holmlund, P. (1990). Geostatistics functions for semi-variograms of soil properties and
in glaciology: implications of a study of tting them to sampling estimates. Journal of Soil
Scharffenbergbotnen, Dronning Maud Land, East Science, 37: 617639.
Antarctica. Annals of Glaciology, 14: 107110. Oliver, M.A. and Webster, R. (1990). Kriging: a
Hudson, G. and Wackernagel, H. (1994). Mapping method of interpolation for geographical information
temperature using kriging with external drift: theory systems. International Journal of Geographical
and an example from Scotland. International Journal Information Systems, 4: 313332.
of Climatology, 14: 7791.
Pardo-Igzquiza, E. (1999). VARFIT: a Fortran-77
Isaaks, E.H. and Srivastava, R.M. (1989). An Intro- program for tting semi-variogram models by
duction to Applied Geostatistics. New York: Oxford weighted least squares. Computers and Geosciences,
University Press. 25: 251261.

Journel, A.G. (1996). Modelling uncertainty and spatial Pardo-Igzquiza, E. and Atkinson, P.M. (2007).
dependence: stochastic imaging. International Automatic modelling of variograms and cross-
Journal of Geographical Information Systems, 10: variograms in downscaling cokriging by numerical
517522. convolution-deconvolution. Computers and Geo-
sciences, 33 12731284.
Journel, A.G. and Huijbregts, C.J. (1978). Mining
Geostatistics. London: Academic Press. Pebesma, E.J. and Wesseling, C.G. (1998). Gstat,
Kyriakidis, P.C. (2004). A geostatistical framework a program for geostatistical modelling, prediction
for area-to-point spatial interpolation. Geographical and simulation. Computers and Geosciences, 24:
Analysis, 36: 259289. 1731.

Lloyd, C.D. (2002). Increasing the accuracy of Ramstein, G. and Raffy, M. (1989). Analysis of the
predictions of monthly precipitation in Great Britain structure of radiometric remotely-sensed images.
using kriging with an external drift. In: Foody, G.M. International Journal of Remote Sensing, 10:
and Atkinson, P.M. (eds), Uncertainty in Remote 10491073.
Sensing and GIS, pp. 243267. Chichester: John
Richmond, A. (2002). Two-point declustering for
Wiley and Sons.
weighting data pairs in experimental semi-variogram
Lloyd, C.D. (2005). Assessing the effect of integrating calculations. Computers and Geosciences, 28:
elevation data into the estimation of monthly 231241.
precipitation in Great Britain. Journal of Hydrology,
308: 128150. Sampson, P.D., Damien, D. and Guttorp, P. (2001).
Advances in modelling and inference for envi-
Lloyd, C.D. and Atkinson P.M. (2004). Archaeology and ronmental processes with non-stationary spatial
geostatistics. Journal of Archaeological Science, 31: covariance. In: Monestiez, P., Allard, D. and
151165. Froidevaux, R. (eds), GeoENV III: Geostatistics for
Environmental Applications, pp. 1732. Dordrecht:
Lloyd, C.D., Atkinson, P.M. and Aplin, P. (2005).
Kluwer Academic Publishers.
Characterising local spatial variation in land cover
imagery using geostatistical functions and the dis- Schabenberger, O. and Gotway, C.A. (2005). Statistical
crete wavelet transform. In: Renard, P., Demougeot- Methods for Spatial Data Analysis. Boca Raton:
Renard, H. and Froidevaux, R. (eds), Geostatistics for Chapman and Hall/CRC.
Environmental Applications: Proceedings of the
Fifth European Conference on Geostatistics for Wackernagel, H. (2003). Multivariate Geostatistics.
Environmental Applications. pp. 391402. Berlin: An Introduction with Applications, 3rd edn. Berlin:
Springer. Springer.
GEOSTATISTICS AND SPATIAL INTERPOLATION 181

Webster, R. and Oliver, M.A. (1990). Statistical Webster, R. and Oliver, M.A. (2000). Geostatistics
Methods in Soil and Land Resource Survey. Oxford for Environmental Scientists. John Wiley and Sons:
University Press: Oxford. Chichester.

Webster, R. and Oliver, M.A. (1992). Sample Webster, R. and McBratney, A.B. (1989). On the Akaike
adequately to estimate variograms of soil information criterion for choosing models for semi-
properties. Journal of Soil Science, 43: variograms of soil properties. Journal of Soil Science,
177192. 40: 493496.
10
Spatial Sampling
Eric Delmelle

10.1. INTRODUCTION relatively inexpensive. As a rule of thumb,


it is generally desirable to have a higher
When trying to make inferences about concentration of samples where exhaustive
a phenomenon, we are forced to collect and accurate information is needed, keep-
a limited number of samples instead of ing in mind that the number of samples
trying to acquire information at every should always be as representative as pos-
possible location (see, e.g., Cochran, 1963; sible of the entire population (Berry and
Dalton et al., 1975; Hedayat and Sinha, Baker, 1968).
1991; and Thompson 2002 for various When surveying a phenomenon charac-
summaries). A full inventory would yield terized by spatial variation, it is necessary
a clear picture of the variability of the to find optimal sample locations in the
variable of interest, although this process study area D. This problem is referred to
is very time-consuming and expensive. spatial or two-dimensional sampling and
Haining (2003) underlines that the cost of has been applied to many disciplines such
acquiring information on each individual as mining, soil pollution, environmental
may rule out a complete census. Sparse monitoring, telecommunications, ecology,
sampling on the other hand is cheap, but geology, and geography, to cite a few.
misses important features. However, there Specific studies on spatial sampling can be
are instances where the level of precision found in Ripley (1981), Haining (2003),
may be the major motivation of the sampling Cressie (1991), Stehman and Overton (1996)
process, especially when sampling remains and Muller (1998). Spatial and non-spatial
184 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

sampling strategies share common characte- an optimal sampling arrangement, to obtain


ristics: a maximum amount of information. If we
undersample in some areas, the spatial vari-
ability will not be captured. Oversampling
1 the size m of the set of samples;
on the other hand can result in redundant
2 the selection of a sample design, limited by the data. Consequently, both the location and
available budget; quantity of the samples is important. This
chapter is concerned primarily with the
3 an estimator (e.g., the mean) for the population second category of sampling challenges,
characteristic; and i.e., capturing the spatial structure of the
primary variable.
4 an estimation of the sampling variance to
compute condence intervals.

Following Haining, spatial sampling 10.1.2. Structure of the chapter


challenges can be divided into three different
In this chapter, spatial sampling configura-
categories. The first pertains to problems
tions are reviewed along with their benefits
concerned with estimating some non-spatial
and drawbacks. Second, the influence of
characteristics of a spatial population; for
geostatistics on sampling schemes is dis-
example, the average income of households
cussed. Sampling schemes can be designed
in a state. The second category deals with
to capture the spatial variation of the variable
problems where the spatial variation of a
of interest. Two common objectives therein
variable needs to be known, in the form
are the estimation of the covariogram and
of a map, or as a summary measure that
the minimization of the kriging variance.
highlights scales of variation. The third
Third, methods of adaptive sampling and
category includes problems where the
second-phase sampling are presented. Such
objective is to obtain observations that
methods are of a nonlinear nature, and
are independent of each other, allowing
appropriate optimization techniques are nec-
classical statistical procedures to assist in
essary to solve such problems. Finally, salient
classifying data.
sampling problems such as sampling in the
presence of multivariate information, and the
use of heuristics are discussed.
10.1.1. Spatial structure
A common objective in both spatial and
non-spatial approaches is to design a sam-
pling configuration that minimizes the vari- 10.2. SPATIAL SAMPLING
ance associated with the estimation. In this CONFIGURATIONS
regard, the location of the samples is very
critical and depends heavily on the structure This section reviews significant sampling
of the variable. In non-spatial problems, schemes for the purpose of two-dimensional
it may be crucial to stratify the sampling sampling. In the following subsections
scheme according to important underlying I will assume that a limited number of
covariates. This holds for spatial phenomena samples m is collected within a study
as well. Unfortunately, this variation is often area denoted D. The variable of interest
unknown, and an objective is to design Z is sampled on m supports, generating
SPATIAL SAMPLING 185

observations {z(si ) | i = 1, 2, . . . m }. For ease the remaining m 1 elements are aligned


of illustration, a square study area is used. regularly by the size of the interval .
If the first sample is chosen at random, the
resulting scheme is called systematic random
10.2.1. Major spatial sampling. When the first sample point is not
sampling designs chosen at random, the resulting configura-
tion is called regular systematic sampling.
Random sampling A centric systematic sampling occurs when
A simple random sampling scheme consists the first point is chosen in the center
of choosing randomly a set of m sample of the first interval. The resulting scheme
points in D, where each location in D has an is a checkerboard configuration. The most
equal probability of being sampled (Ripley, common regular geometric configurations are
1981). The selection of a unit does not the equilateral triangular grid, the rectangular
influence the selection of any other one (square) grid, and the hexagonal one (Cressie,
(King, 1969). Figure 10.1(a) illustrates the 1991). Practically, consider the case where
random configuration. This type of design D is divided into a set of small, square
is also called uniform random sampling
cells of size  = L/ m. A first point
since each point is chosen independently s1 = {x1 , y1 } is selected within the first cell in
uniformly within D. Practically, two random the bottom left of D. The coordinates of s1 are
numbers Ki and Ki are drawn from the subsequently used to determine the following
interval [0, 1]. Then the point si , defined by point si = {xi , yi } (Aubry, 2000):
the pair {xi , yi } is selected such that:

xi = x1 + (i 1), yi = y1 + (j 1)
xi = Ki L, yi = Ki L, (10.1)

i, j = 1, . . ., m. (10.2)

where L denotes the length of the study area


D (Aubry, 2000). The process is repeated To locate sample points along the x- and
m-times. According to Griffith and Amrhein y-directions, it is imperative to have a
(1997), the distribution of the points may desired number of samples m for which

not be representative of the underlying m must be an integer value. The bene-
geographic surface, because for most samples fits of a systematic approach reside in a
drawn, some areas will be oversampled good spreading of observations across D,
while other will be undersampled. The guaranteeing a representative sampling cov-
advantages of this design however reside in erage. Additionally, the spreading of the
its operational simplicity, and its capacity to observations prevents sample clustering and
generate a wide variety of distances among redundancy. This design however presents
pairs of points in D. two inconveniences:

1 the distribution of distances between points of D


Systematic sampling
is not sampled adequately because many pairs of
The population of interest is divided into
points are separated by the same distance; and
m intervals of equal size. The first element
is randomly or purposively chosen within 2 there is a danger that the spatial process
the first interval, starting at the origin. shows evidence of recurring periodicities that
Depending on the location of the first sample, will remain uncaptured, because the systematic
186 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

0.9

0.8

0.7

0.6

y 0.5

0.4

0.3

0.2

0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
x
(a)

0.9

0.8

0.7

0.6

y 0.5

0.4

0.3

0.2

0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
x
(b)

Figure 10.1 From left to right, top to bottom: random, centric systematic, systematic
random, and systematic unaligned sampling schemes. Sampling size m = 100.
SPATIAL SAMPLING 187

0.9

0.8

0.7

0.6

y 0.5

0.4

0.3

0.2

0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
x
(c)

1
b e g
a d f
0.9
j
c k
0.8
l
h
0.7
i
0.6

y 0.5

0.4

0.3

0.2

0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
x
(d)

design coincides in frequency with a regular method that combines systematic and random
pattern in the landscape (Grifth and Amrhein, procedures (Dalton et al., 1975). One sample
1997; Overton and Stehman, 1993). point is randomly selected within each
cell. However, sample density needs to
The second drawback can be lessened be high enough to have some clustering
considerably by use of a systematic random of observations or the spatial relationship
188 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

between observations cannot be built. From Consider the estimation of the global
Figure 10.1(c), some patches of D remain mean zD :
undersampled, while others regions show
evidence of clustered observations. A system- 
1
atic unaligned scheme prevents this problem zD = z(s) ds. (10.3)
from occurring by imposing a stronger [D] D

restriction on the random allocation of


observations (King, 1969). It is desirable, from a statistical standpoint
to select a configuration that minimizes the
prediction error of zD for a given estimator,
Stratied sampling for instance the arithmetic mean:
According to Haining (2003), there are
cases when local-area estimates are to be
1
m
examined, causing stratification to be built
z = z(si ). (10.4)
into the sampling strategy. In stratified m
i=1
sampling, the survey area (or D) is par-
titioned into non-overlapping strata.1 For
each stratum, a set of samples is collected, Efficiency is calculated for all possible
  real-
where the sum of the samples over all izations of the variable Z by Var ZD ZD
strata must equal m. The knowledge of using k2 , which is the geostatistical pre-
the underlying process is a determining diction error, defined later. In terms of
factor in defining the shape and size of the sampling variance, stratified random
each stratum. Some subregions of D may sampling is at least always equally or more
exhibit stronger spatial variation, ultimately accurate than random sampling; its relative
affecting the configuration of each stra- efficiency is a monotone increasing function
tum (Cressie, 1991). Smaller strata are of sample size.
preferred in non-homogeneous subregions.
When points within each stratum are chosen
randomly, the resulting design is named Spatial autocorrelation
stratified random sampling. In Figure 10.2(a), Ideally, the density of sample points should
six strata are sampled in proportion to their increase in locations exhibiting greater spatial
size. For instance, stratum A represents 30% variability. Values of closely spaced samples
of D, therefore if m = 100, 30 sample points will show strong similarities and it may be
will be allocated within A. Figure 10.2(b) redundant to oversample in those areas. The
illustrates the allocation of one sample per spatial autocorrelation function summarizes
stratum (in casu the centroid), undersampling the similarity of the values of the variable
larger strata. of interest at different sample locations, as
a function of their distance (Gatrell, 1979;
Griffith, 1987). Morans I (Moran, 1948,
1950) is a measure of the degree of spatial
10.2.2. Efciency of spatial autocorrelation among data points:
sampling designs
The sampling efficiency is defined as the
m i, j w(sij )(z(si ) z)(z(sj ) z)
inverse of the sampling variance. According I= 
i (z(si ) z)
1 W1 2
to Aubry (2000), the most efficient design
leads to the most accurate estimation. (10.5)
SPATIAL SAMPLING 189

1
C
0.9 D
F
0.8
E
0.7

0.6

y 0.5

0.4

0.3
B A
0.2

0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
x
(a)
1
C D
0.9
F
0.8
E
0.7

0.6

y 0.5

0.4

0.3
B A
0.2

0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
x
(b)

Figure 10.2 Stratied sampling designs with six strata of different sizes (m = 6 on the right
gure and m = 100 to the left).

with W defined as a weight matrix w(sij ), of spatial proximity between points si and sj ;
m is the number of observations, the mean for example:
of the sampled values is denoted by z
and z(si ) is the measured attribute value at
location si . The weight w(sij ) is a measure w(sij ) = exp(d(sij )2 ) (10.6)
190 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

where d(sij )2 is the squared distance between these counties we might sample a number of
location si and point sj . Morans I is not quadrats, or say, townships and finally, within
implicitly constrained within the interval the latter, randomly select some farmsteads
[1, +1]. Spatial autocorrelation generally (King, 1969).
decreases as the distance between sample In the multivariate case, dependent and
points increases. A positive autocorrelation independent variables are hierarchically
occurs when values taken at nearby samples organized and are thus not collected at
are more alike than samples collected farther the same sampling frequency (Haining,
away. When the autocorrelation is a linearly 2003). The primary variable may exhibit
decreasing function of distance, stratified rapid change in spatial structure while the
random sampling has a smaller variance secondary variables are much more homo-
than a systematic design (Quenouille, 1949). geneous. A hierarchical sampling design
If the decrease in autocorrelation is not linear, captures such variation by collecting one
yet concave upwards, systematic sampling variable at points nested within larger sam-
is more accurate than stratified random pling units so that it can be collected more
sampling, and a centered systematic design, intensively than another variable.
where each point falls exactly in the middle
of each interval, is more efficient than a
random systematic sampling configuration Clustered sampling
(Madow, 1953; Zubrzycki, 1958; Dalenius This type of sampling consists of the
et al., 1960; Bellhouse, 1977; Iachan, 1985). random selection of groups of sites where
sites are spatially close within groups
(Cressie, 1991). Clusters of observations are
drawn independently with equal probability.
10.2.3. Other sampling designs
In the first stage, when the population
Nested or hierarchical sampling is grouped into clusters, the clusters are
Nested or hierarchical sampling designs first sampled (Haining, 2003). Either all of
require the study area D to be partitioned the observations in the clusters, or only
randomly into sample units (or blocks) a random selection from it, are included.
creating the first level in the hierarchy, Cluster sampling is essentially useful in the
and this is then further subdivided into discrete case, when a complete list of the
sample units nested within level 1, and members of a population cannot be obtained,
so forth (Haining, 2003). These units can yet a complete list of groups (i.e., clusters) of
be systematically or irregularly arranged. the variable is available. The method is also
As the process progresses, the distances useful in reducing sampling cost.
between observations decreases (Corsten and
Stein, 1994). One advantage of a nested
sampling design is that it allows for multiple
scale analysis and supports quadrat analysis. 10.3. SAMPLING RANDOM FIELDS
Spatially nested sampling designs may work USING GEOSTATISTICS
well for geographic phenomenon that are
naturally clustered and for exploring multiple Most classical statistical sampling methods
scale effects. Hierarchical sampling is also make no use of the spatial information
possible at the discrete level. In such cases, provided by nearby samples. Geostatistics
it is desirable to first select randomly one describes the spatial continuity that is an
or more counties in a state. Then within essential feature of many natural phenomena.
SPATIAL SAMPLING 191

It can be seen as a collection of statistical The corresponding covariogram C(h) that


methods, describing the spatial autocorre- summarizes the covariance between any two
lation among sample data. In geostatistics, points is:
multidimensional random fields are formal-
ized and modeled as stochastic processes
(see, e.g., Matrn, 1960; Whittle, 1963). C(h) = C(0) (h) = 2 (h). (10.9)
In other words, the variable of interest is
modeled as a random process that can take a
series of outcome values, according to some The interpolated, kriged value at a location
probability distribution (Goovaerts, 1997). s in D is a weighted mean of surrounding
Kriging is an interpolation technique that values; each value is weighted according to
estimates the value of the primary variable the covariogram model:
at unsampled locations
 (usually on
a set G
of grid points sg  g = 1, 2, . . ., G , while
minimizing the prediction error. Using data
I
values of Z, an empirical semivariogram (h) z(s) = wi (s)z(si ) (10.10)
summarizing the variance of values separated i=1
by a particular distance lag (h) is defined:

where I is the set of neighboring points


1  2 that are used to estimate the interpolated
(h) = z(si ) z(sj ) value at location s, and wi (s) is the
2d(h)
|si sj |=h weight associated with each surrounding
(10.7) point. The optimization of spatial sampling
in a geostatistical context first requires
the estimation of a model to express the
where d(h) is the number of pairs of points spatial dependence at different pairs of
for a given lag value, and z(si ) is the distances. This is summarized in the covar-
measured attribute value at location si . The iogram function. Secondly, such a model
semivariogram is characterized by a nugget is then used for optimal interpolation of
effect a, and a sill 2 where (h) levels out. the variable under study (Van Groenigen,
The nugget effect is the spatial dependence 1997).
at micro scales, caused by measurement
errors at distances smaller than the possible
sampling distances (Cressie, 1991). Once the
lag distance exceeds a value r, called the 10.3.1. Optimal geometric designs
range, there is no spatial dependence between for covariogram estimation
the sample sites. The variogram function
To compute the most representative covar-
(h) becomes constant at a value called
iogram and to capture the main features
the sill, 2 . A model (h) is fitted to the
of spatial variability, a good spread-
experimental variogram (e.g., an exponential
ing of sample points across the study
model). With the presence of a nugget
area is necessary (Van Groenigen et al.,
effect a:
1999). In that context, systematic sampling
(Figure 10.1(b)) performs well. However,
such a sampling design does not guarantee a
(h) = a + ( 2 a)(1 e3h/r ). (10.8) wide range of separating distances (which is
192 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

necessary to estimate the covariogram), the Warrick/Myers (WM) criterion tries


because: to reproduce an a priori defined ideal
distribution of pairs of points for estimating
the covariogram. The procedure allows one to
1 distances are not evenly distributed; and
account for the variation in distance. Follow-
2 there are few pairs of points at very small ing Van Groenigen (1997), the WM-criterion
distances to estimate the nugget effect. is defined as:

A systematic random or systematic


K
K

unaligned sample will generate a greater Jw/m (S) = a wi (i i )2 +b (mi )


i=1 i=1
variety of distance pairs. Another solution
(10.11)
consists of designing a sampling arrangement
where a subset of the m observations are

K
m(m 1)
evenly spread across the study area D and i = (10.12)
the remaining points are somewhat more 2
i=1
clustered (Figure 10.3), to capture the
covariance at very small distances.
where i denotes a given lag class of the
covariogram, K represents the total number
Sample size and sample of classes, and the parameters a, b, and
conguration issues wi are user-defined weights. The term i
Optimizing the sampling configuration to is a prespecified number of point-pairs for
estimate the parameters of the covariogram the ith class, i is the actual number of
is not an easy task. Webster and Oliver distances within that class, and (mi ) is the
(1993) suggested that a total of at least standard deviation from the median of the
m = 150 samples over the study area distance lag class (Warrick and Myers, 1987).
is necessary. Moreover, the reliability of Equation (10.12) expresses the total number
the covariogram is partly dependent on of possible distance pairs, given the number
the number of pairs of points available of samples. So for instance, when m = 4,
within each distance class. In this context, six pairs of points are generated.

(a) (b)

Figure 10.3 A systematic sampling scheme of m = 36 points in D is improved by the


introduction of n = 12 additional samples () clustered among the initial samples.
SPATIAL SAMPLING 193

Presence of anisotropy parts of the area. This in turn generates


Anisotropy (as opposed to isotropy) is a only a few distances for which covariogram
property of a natural process, where the values are available. Nested sampling designs
autocorrelation among points changes with are especially unsuitable when the observa-
distance and direction between two locations. tions collected according to such a design
In other words, spatial variability is direction- are used subsequently to estimate values
dependent. Spatial variables may exhibit at unvisited locations (Corsten and Stein,
linear continuity, such as in estimating 1994).
riparian habitat along rivers, aeolian deposits,
and soil permeability along prevailing wind
directions. We talk about an isotropic process
however when there is no effect of direction 10.3.2. Optimal designs to
in the spatial autocorrelation of the primary minimize the kriging
variable. It is generally desirable to aug- variance
ment the sampling frequency in the angle
of minimum continuity, since the spatial Kriging provides not only a least-squares
gradient of variation is maximum in that estimate of the attribute but also an error
direction. variance (Isaaks and Srivastava, 1989),
quantifying the prediction uncertainty at a
particular location in space. This uncertainty
Impact of the nugget effect is minimal, or zero when there is no
Bogaert and Russo (1999) made an attempt nugget effect, at existing sampling points
to understand how the covariogram param- and increases with the distance to the
eters are influenced by the choice of nearest samples. A major objective consists
particular sampling locations. Their objec- of designing a sampling configuration to
tive was to limit the variability of the minimize this uncertainty over the study
covariogram estimator. When the covari- area. This can be achieved when the covar-
ogram has no nugget effect, the benefits iogram, representing the spatial structure
of the optimization procedure are somewhat of the variable, is known a priori or
diminished. In the presence of a nugget has been estimated. In this regard, optimal
effect, a random sampling configuration sampling strategies have been suggested
will score poorly, because of the limited to reduce the prediction error associated
information offered by random sampling for with the interpolation process (Pettitt and
small distances. McBratney, 1993; Van Groenigen et al.,
1999). Equation (10.13) formulates the
kriging variance at a location s, where C1 M
Using nested designs is the inverse of the covariance matrix
A nested design allows good estimation of the CM based on the covariogram function
nugget effect at the origin. However, nested (Bailey and Gatrell, 1995). M denotes the
sampling configurations produce inaccurate set of initial samples and has cardinality m.
estimation of the covariogram in comparison The term c is a column vector and cT
to random and systematic sampling. This the corresponding row vector, as given in
occurs due to the rather limited area covered Equation (10.15):
by the sampling scheme, yielding a high
observation density in subregions of the area,
and a low observation density for other k2 (s) = 2 cT (s) C1
M c(s) (10.13)
194 THE SAGE HANDBOOK OF SPATIAL ANALYSIS


2 C1,2 . . . C1,m a higher interpolation uncertainty, which
C2,1 2 C2,m is increasing away from existing points.

CM = . .. .. (10.14)
.. . . The estimation error is low at visited
Cm,1 Cm,2 2 points.


2
C2,1   Distance-based criteria

c = . , cT = 2 C1,2 . . . C1,m . It is possible to design sampling config-
..
urations considering explicitly the spatial
Cm,1 correlation of the variable (Arbia, 1994).
(10.15) What would you do if you were in a dark
room with candles? You would probably
light the first candle at a random location
The total kriging variance TKV is obtained
or in the middle of the room. Then you
by integrating Equation (10.13) over D:
would find it convenient to light the second
candle somewhere further away from the

first. How far away will depend on the
TKV = k2 (s)ds. (10.16)
D luminosity of the first candle. The stronger
the light, the further it can be located from
the first candle. You would then light the
Computationally, it is easier to discretize D third candle far away from the two first ones.
and sum the kriging variance over all grid Such an approach known as Depending
points sg . The average kriging variance AKV Areal Units Sequential Technique (DUST)
over the study area is defined as: is an infill sampling algorithm, and very
suitable to locate points to minimize the
kriging variance over D. Another method,
AKV = k2 (sg ). (10.17)
known as the Minimization of the Mean
gG
of the Shortest Distances (MMSD) requires
all sampling points spread evenly over
The only requirement to calculate the kriging the study area, ensuring that unvisited
variance is to have an initial covariogram and locations are never far from a sampling
the locations of the m initial sample points. point. Both MMSD and DUST methods
It then depends solely on the spatial depen- assume:
dence and configuration of the observations
(Cressie, 1991).
1 prior knowledge of the spatial structure of the
variable; and
Illustration
2 a stationary variable an assumption violated in
Since continuous sampling is not feasible,
practice.
it is necessary to discretize the area into
a set of potential points. Seeking the best
sampling procedure becomes a combinatorial Both criteria are purely deterministic, result-
problem. Figure 10.4 illustrates the kriging ing in spreading pairs of points evenly across
variance associated with random sampling the study area, similar to the systematic
and systematic random sampling from an configuration. Van Groenigen (1997) notes
exponential model. Darker areas denote that the area D is a continuous, infinite plane.
SPATIAL SAMPLING 195

0.9

0.8

0.7

0.6

y 0.5

0.4

0.3

0.2

0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(a)
1

0.9

0.8

0.7

0.6

y 0.5

0.4

0.3

0.2

0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

(b)

Figure 10.4 The kriging variance of a systematic random pattern (right gure) reduces the
value of Equation 17 by 20% from a random pattern. Sample patterns are similar to those in
Figure 10.1.
196 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

In reality, it is not physically possible arrangement, although the improvement is


to sample everywhere; the presence of marginal (Olea, 1984).
spatial barriers such as roads, buildings or
mountains restricts the sampling process and
limits the number and location of potential
Choice of a covariogram tting model
points.
Does the choice of a covariogram fitting
model affect the value of equation (10.17)?
According to Van Groenigen (2000), an expo-
Impact of the nugget effect nential model generates a point-symmetric
What is the influence of the nugget effect sampling configuration that is identical to a
and sampling densities on the final sam- linear model. However, the use of a Gaussian
pling configuration? As the ratio nugget/sill model tends to locate sample points very
increases, a different sampling configuration close to the boundary of D. This is explained
is reached, placing more observations near by the large kriging weights assigned to small
the boundaries of the study area, because distance values (parabolic behavior at the
of the high variance at short distances. origin).
In that case, more samples are needed to
obtain the same level of objective function
(equation (10.17)) over D (Burgess et al.,
1981). When the nugget effect is maximum 10.3.3. Sampling reduction
( sill), the covariogram is pure noise, and
Sampling density reduction of an existing
the resulting optimal sampling scheme is
spatial network is a problem related to
purely random, because no spatial correlation
sampling designs and is relevant in many
is present. At maximum sampling density, the
regions of the world where funding for
estimation variance can never be less than the
environmental monitoring is decreasing. The
nugget effect. When the variance among pairs
process entails lowering the number of sam-
of points at very small distances ( nugget
ples to reach an effective level of accuracy.
effect) is very high, a hexagonal design will
Technically, it consists of selecting existing
perform best.
samples from the original data set that will,
in combination with a spatial interpolation
algorithm, produce the best possible estimate
Presence of anisotropy of the variable of interest, in comparison
Which type of sampling design performs with the results obtained if all sample
better in reducing the maximum kriging points were used (Olea, 1984). Usually, it
variance, when anisotropy is present? When is assumed that the residuals come from a
the process is isotropic, a systematic equi- stationary process, and that the covariogram
lateral triangle design will keep the variance is linearly decreasing, with no nugget effect,
to a minimum, because it reduces the and that the process is isotropic. In a
farthest distance from initial sample points study aimed at predicting soil water content,
to points that are not visited. A square Ferreyra et al., (2002) developed a similar
grid performs well, especially in the case sampling density reduction method, from
of isotropy (McBratney and Webster, 1981; 57 observations to 10 observations. With an
McBratney et al., 1981). When anisotropy optimal arrangement of 10 samples, over
is present on the other hand, a square 70% of the predicted water content had an
grid pattern is preferred to a hexagonal error within 10%, showing that a similar
SPATIAL SAMPLING 197

level of confidence is reached with a limited spacing of terrain elevation data points to
number of samples. produce a Digital Elevation Model (DEM).
The importance of evaluating the adequate
number of data points as well as the appropri-
ate sampling distribution of such points, that
10.4. SECOND-PHASE AND in turn constitute a good match to character-
ADAPTIVE SAMPLING ize a given terrain. Determining a sufficient
number of points is not straightforward, since
When there is a need or desire to gather it depends on terrain roughness in relation to
more information (i.e., additional samples) the size of the area occupied by the terrain.
about the variable of interest, we talk The ideas suggested in progressive sampling
about adaptive and second-phase sampling, were later carried over to the field of adaptive
depending on the study objective. In the sampling (see Thompson and Seber, 1996).
following subsections, both techniques are A major difference with conventional designs
discussed. lies in the selection of additional samples
in adaptive designs, because the location
of a new sample will depend upon the
value of the points observed in the field.
10.4.1. Adaptive sampling
In other words, the procedure for selecting
Adaptive sampling finds its roots in the additional samples depends on the outcome
concept of progressive sampling (Makarovic, of the variable of interest, as observed during
1973). It provides an objective and automatic the survey of an initial sampling phase.
method for sampling, for example, terrain of The addition of a new sample improves
varying complexity when sampling altitude confidence in the sampling distribution.
variation. As illustrated in Figure 10.5, Adaptive sampling is very efficient in the
progressive sampling involves a series of context of soil contamination (Cox, 1999).
successive runs, beginning with a coarse How should a risk manager decide where to
sampling grid and then proceeding to grids of re-sample in order to maximize information
higher densities. The grid density is doubled on contamination? In this particular context
on each successive sampling run and the it is generally recommended to sample in
points to be sampled are determined by a locations above a particular threshold and
computer analysis of the data obtained on draw a fixed number of additional samples
the preceding run. The analysis proceeds around them until subsequent measurement
as follows: a square patch of nine points values are below a pre-specified contami-
on the coarsest grid is selected and the nation threshold. Figure 10.6 illustrates the
height differences between each adjacent procedure for adaptive cluster sampling,
pair of points along the rows and columns where sample points represent measurement
are computed. The second differences are locations of hypothetical contamination rates.
then calculated. The latter carries information On the left, contamination rates have been
on the terrain curvature. If the estimated measured at seven locations. A geographic
curvature exceeds a certain threshold, it location is said to be at risk (and needs
becomes necessary on the next run to increase remediation) when its value is above 0.7
the sampling density and sample points at the or at 70% of the contamination threshold.
next level of grid density. Call a property fathomed if samples have
A similar study was carried out by Ayeni been taken from its immediate neighbors.
(1982) to determine the optimum number and A common choice is to define new neighbors
198 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

15
25
30

20
30
25

10
20

25
15
30
20

30
25

(a)

(b)

Figure 10.5 Initial systematic sampling of altitude is performed over the study in the top
gure. When strong variation in elevation is encountered, the sampling density is increased
until desirable threshold is met.

of a contaminated zone to the North, South, because there is little rationale in taking
East, and West: fathom each property on additional samples in areas where we know
the list by sampling and remove it from that the probability of exceeding a particular
the risk list when it has been fathomed. threshold is maximal.
In other words, the procedure re-samples
four neighboring locations of a contaminated
site. Once a site shows a contamination rate
10.4.2. Second-phase sampling
under the threshold value, it is fathomed.
Otherwise, the procedure continues until a In second-phase spatial sampling, a set
trigger condition is satisfied (e.g., a maximum M of m initial measurements has been
number of additional samples is reached). collected, and a covariogram C(h) has been
This approach has some limitations however, calculated. In the second-phase, the scientist
SPATIAL SAMPLING 199

.4

.5
.7

.4 .8

.4

(a)

.4
. 65

. 78
.5
. 74 . 72 . 7
. 78 . 59
. 76 . 71
. 75
. 71 . 89
. 73
.4
.8
. 69
. 72 . 89

. 71
.4

(b)

Figure 10.6 The cluster adaptive sampling procedure, illustrated in the context of toxic
waste remediation. A site is fathomed (+) when its toxicity rate does not exceed the
contamination value.

augments the set of observations, guided kriging variance of the augmented set M N
by the covariogram. The objective function containing [m + n] samples:
aims to collect new samples to reduce the
kriging variance or uncertainty by as much
as possible. Equation (10.18) formulates the  
change in kriging variance k2 over all k2 = TKV old TKV new
grid points sg , when a set N of size n
containing new sample points is added to 1
our initial sample set M. The change k2 is = k,old2 (sg ) k,new
2
(sg )
G
the difference between the kriging variance gG gG

calculated with initial sample points and the (10.18)


200 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

1 T
k,old
2
(sg ) = 2 c(sg ) C c (sg ) (10.19) containing m+n points that will maximize the
!"#$ !"#$ ! "# $
[1,m] [m] [m,1]
change in weighted kriging variance. From
equation (10.21):
k,new
2
C1 cT (sg ) .
(sg ) = 2 c(sg ) !"#$
!"#$ ! "# $
[1,m+n] [m+n] [m+n,1] 1
(10.20) MAX
! "# $ J(S) = w(sg )k2 (sg ; S).
G
{sm+1 ,...,sm+n } gG
(10.22)
The objective function (equation (10.21)) is
to find the optimal set S containing m + n
In an effort to detect contaminated zones in
points that will maximize this change in
the Rotterdam harbor, Van Groenigen et al.,
kriging variance (Christakos and Olea, 1992;
(2000) introduced the Weighted Means of
Van Groenigen et al., 1999), where S is a
Shortest Distance (WMSD) criterion, offer-
specific sampling scheme:
ing a flexible way of using prior knowledge
on the variable under study. However, the
1 weights do not reflect the spatial structure
MAX
! "# $ J(S) = k2 (sg ; S ).
G of the variable, but rather the scientists
{sm+1 , ...,sm+n } gG
perception of the risks of contamination. In
(10.21) the first sampling phase, sampling weights
are assigned to sub-areas based on their
For simplicity, the continuous region D is risks for contamination. In the second phase
usually approximated by a finite set P of however, a greater weight is assigned to
p points (Cressie, 1991). The set of new locations expected to exhibit a higher priority
points is selected from the set of potential for remediation. Four weighting factors are
  considered with weights w = 1, 1.5, 2, and
p
points P. Hence, there is a total of 3, leading to more intensive sampling where
n
possible sampling combinations and it is too the weight is higher. In a more recent study,
time-consuming to find the optimal set using Rogerson et al., (2004) have developed a
combinatorics. Figure 10.7 illustrates the case second-phase sampling technique, allowing
where 50 sample points have been collected re-sampling in areas where there is some
in the first stage, leading to an exponen- uncertainty associated with a variable of
tial covariogram, with the sequential addition interest, and hence not in areas where
of n = 10 new points and an improvement in the probability of an event occurring is
the objective function of nearly 20%. near 0 or 1. A greedy algorithm was proposed
to locate the points that would maximize the
change in weighted kriging variance.
Weighting the kriging variance?
The use of a weighting function w () for
the kriging variance was originally suggested Shortcomings of the use of the
by Cressie (1991) and has been applied kriging variance
by Van Groenigen et al., (2000), Rogerson Many authors have advocated the use of the
et al., (2004), and Delmelle (2005). The kriging variance as a measure of uncertainty.
importance of a location to be sampled is It is unfortunately misused as a measure
represented by a weight w(s). The objective of reliability of the kriging estimate, as
is to find the optimal sampling scheme S noted by several authors (Deutsch and
SPATIAL SAMPLING 201

x 106 Kriging variance


4.721

4.72

4.719

4.718

4.717
y

4.716

4.715

4.714

4.713

4.712
6.7 6.71 6.72 6.73 6.74 6.75 6.76 6.77 6.78 6.79
x x 105
(a)
Change in Kriging variance
0.18

0.16

0.14

0.12
Improvement

0.1

0.08

0.06

0.04

0.02

0
0 1 2 3 4 5 6 7 8 9 10
Additional points
(b)

Figure 10.7 An initial sampling network of m = 50 points (in white) has been augmented
with the addition of n = 10 new samples (in blue). The gure to the right displays the
improvement.
202 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

11 2

? ?
8 9 1 0

12 37

(a) (b)

Figure 10.8 Example of two-dimensional non-stationarity. Dark points are used as data
values to interpolate the center point (light gray). After Armstrong (1994).

Journel, 1997; Armstrong, 1994). It is solely left since there is less variation among the
a function of the sample pattern, sample neighbors. This illustrates that the prediction
density, the numbers of samples and their error is not suitable for setting up confidence
covariance structure. The kriging variance intervals and should not be used as an
assumes that the errors are independent of optimization criterion for additional sampling
each other. This means that the process is strategies.
stationary, an assumption usually violated
in practice. Stationarity entails that the
variation of the primary variable between
two points remains similar at different
locations in space, as long their separation 10.5. CURRENT RESEARCH
distance remains unchanged. Figure 10.8 DIRECTIONS
illustrates non-stationarity in two dimensions
(Armstrong, 1994). The objective in this
10.5.1. Incorporating multivariate
particular example is to interpolate the value
information
of the inner grid point, highlighted with a Sample data can be very difficult to
question mark. The interpolation depends on collect, and very expensive, especially in
the values of the four surrounding points. monitoring air or soil pollution for instance
Two scenarios are presented. The scenario (Haining, 2003). Secondary data can be a
in b shows three very similar values and valuable asset if they are available over
an extreme one. The scenario in a however an entire study area and combined within
shows four values in a very narrow range. the primary variable (Hengl et al., 2003).
Assuming the spatial structure is similar in Secondary spatial data sources include maps,
both cases, and since the configuration of socioeconomic, and demographic census
the data points is the same, the kriging data, but also data generated by public
variances are identical. However, we have sources (local and regional). This is very
more confidence in the scenario on the valuable and there has been a dramatic
SPATIAL SAMPLING 203

growth in the availability of secondary data


10.5.3. The use of heuristics in
associated with DEMs and satellites (for
sampling optimization
environmental data). Such secondary data is
easily integrated within a GIS framework In second-phase sampling, the set N of
(Haining, 2003). In multi-phase sampling, additional samples will be chosen from
for instance, research has been confined to a set P of candidate sampling locations.
the use of covariates in determining the This set is relatively large in practice, and
locations of initial measurements, whereby hence the number of possible solutions
sample concentration is increased where forbids an exhaustive search for the optimum
covariates exhibit substantial spatial variation (Michalewicz and Fogel, 2000). A total
(Makarovic, 1973). Ideally, secondary vari- enumeration of all potential solutions is
ables should be used to reduce the sampling not possible, because of the combinatorial
effort in areas where their local contribution explosion. (Goldberg, 1989; Grtschel and
in predicting the primary variable is maxi- Lovsz, 1995). The search for an approximate
mum (Delmelle, 2005). If a set of covariates solution for complex problems is conducted
predicts accurately the data value where using a suitable heuristic method H. The
no initial sample has been collected yet, use of a heuristic is necessary to assist
there is little incentive to perform sampling in the identification of an optimal sample
at that location. On the other hand, when set S (or near optimal set S + ) P. The
covariates perform poorly in estimating the heuristic controls a process that intends
primary variable, additional samples may to solve this optimization problem. The
be necessary. The general issue pertains to set S is optimal for the objective func-
quantifying the spatial contribution given by tion J defined in equation (10.22). The
covariates. efficiency of a heuristic depends on its
capacity to give as often as possible a
solution S + close to S (Grtschel and
Lovsz, 1995). In second-phase sampling,
there are two different ways of supplementing
an initial set. Either n points are selected
10.5.2. Weighting the kriging
at one time and added to the initial set
variance appropriately
or one point at a time is added n-times
Some current research has looked at ways to the initial set. The former is defined as
to weight the kriging variance. Intuitively, simultaneous addition and the latter is known
one would like to sample at unvisited as sequential addition and is suboptimal.
locations, far from existing ones. This is Note that a hybrid approach that would
accomplished using the kriging variance as combine both techniques is possible as well.
a sampling criterion. However, the spatial In spatial sampling, limited research has
variability of the primary variable is not been devoted to comparing the benefits and
accounted for. It is recommended to weight drawbacks of these heuristics. The greedy
the kriging variance where the gradient of (or myopic) algorithm has been used by
the primary variable is maximum, because Aspie and Barnes (1990), Christakos and
there is a rapid change at that location in Olea (1992) and Rogerson et al., (2004).
the variable (Delmelle, 2005). It is also Simulated annealing has been applied to
desirable to reduce sampling effort by using spatial sampling problems in Ferri and
information provided by auxiliary variables, Piccioni (1992), Van Groenigen and Stein
when available. (1998), and Pardo-Igzquiza (1998).
204 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

10.5.4. Spatio-temporal Bellhouse, D.R. (1977). Optimal designs for sampling


sampling issues in two dimensions. Biometrika, 64: 605611.
Berry, B.J.L. and Baker, A.M. (1968). Geographic
Spatial sampling optimization as discussed sampling. In: Berry B.J.L. and Marble D.F.
in this chapter is based on the assumption (eds), Spatial Analysis: a Reader in Statistical
of stationarity of the variable itself over Geography ; pp. 91100. Englewood Cliffs, N.J.:
time ( no temporal variation). Variables Prentice-Hall.
such as rainfall, temperature, and snowfall Bogaert, P. and Russo, D. (1999). Optimal sampling
vary over time and it is not possible to design for the estimation of the variogram based on
take a second set of samples to improve a least squares approach. Water Resources Research,
the prediction of these variables without 35(4): 12751289.
affecting the stability of the model. Work in Burgess, T.M., Webster, R. and McBratney, A.B. (1981).
this context has been carried out by Lajaunie Optimal interpolation and isarithmic mapping of soil
et al., (1999). properties: IV. Sampling strategy. Journal of Soil
Science, 32: 643659.
Christakos, G. and Olea, R.A. (1992). Sampling design
for spatially distributed hydrogeologic and environ-
mental processes. Advanced Water Resources, 15:
NOTE 219237.

1 Note that a systematic sampling scheme is a Cochran, W.G. (1963). Sampling Techniques. Second
special case of a stratied design in that the strata Edition. New York: Wiley. 413p.
are all squares of equal size. Corsten, L.C.A. and Stein, A. (1994). Nested sampling
for estimating spatial semivariograms compared to
other designs. Applied Stochastic Models and Data
Analysis, 10: 103122.
REFERENCES Cox, L.A. (1999). Adaptive spatial sampling of
contaminated soil. Risk Analysis, 19: 10591069.
Arbia, G. (1994). Selection techniques in sampling Cressie, N. (1991). Statistics for Spatial Data. New York:
spatial units. Quaderni di Statistica e Matematica Wiley. 900p.
Applicata Alle Scienze Economico-Sociali,
16: 8191. Dalenius, T., Hajek, J. and Zubrzycki, S. (1960). On
plane sampling and related geometrical problems.
Armstrong, M. (1994). Is research in mining geostats Proceedings of the Fourth Berkeley Symposium,
as dead as a dodo? In: Dimitrakopoulos R. (ed.). 1: 125150.
Geostatistics for the Next Century, pp. 303312.
Dordrecht: Kluwer Academic Publisher. Dalton, R., Garlick, J., Minshull, R. and Robinson, A.
(1975). Sampling Techniques in Geography. London:
Aubry, P. (2000). Le Traitement des Variables Rgion- Georges Philip and Son Limited. 95p.
alises en Ecologie: Apports de la Gomatique et
de la Gostatistique}. Thse de doctorat. Universit Delmelle, E.M. (2005). Optimization of Second-Phase
Claude Bernard Lyon 1. Spatial Sampling Using Auxiliary Information. Ph.D.
Dissertation, Department of Geography, SUNY at
Aspie, D. and Barnes, R.J. (1990). Inll-sampling design Buffalo.
and the cost of classication errors. Mathematical
Deutsch, C.V. and Journel, A.G. (1997) Gslib:
Geology, 22: 915932.
Geostatistical Software Library and Users
Ayeni, O. (1982). Optimum sampling for digital Guide. 2nd edition, 369p. Oxford University
terrain models: A trend towards automation. Press.
Photogrammetric Engineering and Remote Sensing,
Ferreya, R.A., Apeztegua, H.P., Sereno, R. and
48: 16871694.
Jones, J.W. (2002). Reduction of soil water sampling
Bailey, T.C. and Gatrell, A.C. (1995). Interactive Spatial density using scaled semivariograms and simulated
Data Analysis. Longman. 413p. annealing. Geoderma, 110: 265289.
SPATIAL SAMPLING 205

Ferri, M. and Piccioni, M. (1992). Optimal selection of Makarovic, B. (1973). Progressive sampling for digital
statistical units. Computational Statistics and Data terrain models. ITC Journal, 15: 397416.
Analysis, 13: 4761.
Matrn, B. (1960). Spatial variation. Berlin: Springer-
Goovaerts, P. (1997). Geostatistics for Natural Verlag. 151p.
Resources Evaluation. Oxford University Press. 483p.
McBratney, A.B. and Webster, R. (1981). The design of
Gatrell, A.C. (1979). Autocorrelation in spaces. optimal sampling schemes for local estimation and
Environmental and Planning A, 11: 507516. mapping of regionalized variables: II. Program and
examples. Computers and Geosciences, 7: 331334.
Goldberg, D.E. (1989). Genetic Algorithms in Search,
Optimization, and Machine Learning. Reading, MA: McBratney, A.B., Webster, R. and Burgess, T.M. (1981).
Addison-Wesley. The design of optimal sampling schemes for local
estimation and mapping of regionalized variables:
Grifth, D. (1987). Spatial Autocorrelation: A Primer.
I. Theory and method. Computers and Geosciences,
Washington, DC: AAG.
7: 335365.
Grifth, D. and Amrhein, C. (1997). Multivariate
McBratney, A.B. and Webster, R. (1986). Choosing
Statistical Analysis for Geographers. New Jersey:
functions for semi-variograms of soil properties and
Prentice Hall. 345p.
tting them to sampling estimates. Journal of Soil
Grtschel, M. and Lovsz, L. (1995). Combinatorial Science, 37: 617639.
optimization. In: Graham R.L., Grtschel and
Michalewicz, Z. and Fogel, D. (2000). How to Solve It:
Lovsz (eds), Handbook of Combinatorics, Vol. 2;
Modern Heuristics. Berlin: Springer. 467p.
pp. 15411579. Amsterdam, The Netherlands:
Elsevier. Moran, P.A.P. (1948). The interpretation of statistical
maps. Journal of the Royal Statistical Society, Series
Haining, R.P. (2003). Spatial Data Analysis: Theory and
B, 10: 245251.
Practice. Cambridge University Press. 452p.
Moran, P.A.P. (1950). Notes on continuous phenom-
Hedayat, A.S. and Sinha, B.K. (1991). Design and
ena. Biometrika, 37: 1723.
Inference in Finite Population Sampling. New York:
Wiley. 377p. Muller, W. (1998). Collecting Spatial Data: Opti-
mal Design of Experiments for Random Fields.
Hengl, T., Rossiter, D.G. and Stein, A. (2003).
Heidelberg: Physica-Verlag.
Soil sampling strategies for spatial prediction by
correlation with auxiliary maps. Australian Journal Olea, R.A. (1984). Sampling design optimization
of Soil Research, 41: 14031422. for spatial functions. Mathematical Geology, 16:
369392.
Iachan, R. (1985). Plane sampling. Statistics and
Probability Letters, 3: 151159. Oliver, M.A. and Webster, R. (1986). Combining nested
and linear sampling for determining the scale and
Isaaks, E.H. and Srivastava, R.M. (1989). An Intro-
form of spatial variation of regionalized variables.
duction to Applied Geostatistics. New York: Oxford
Geographical Analysis, 18: 227242.
University Press. 561p.
Overton, W.S. and Stehman, S.V. (1993). Properties
King, L.J. (1969). Statistical Analysis in Geography.
of designs for sampling continuous spatial resources
288p.
from a triangular grid. Communications in Statistics
Lajaunie, C., Wackernagel, H., Thiry, L. and Theory and Methods, 21: 26412660.
Grzebyk, M. (1999). Sampling multiphase noise
Pardo-Igzquiza, E. (1998). Optimal selection of
exposure time series. In: Soares A., Gomez-
number and location of rainfall gauges for areal
Hernandez J. and R. Froidevaux (eds), GeoENV II
rainfall estimation using geostatistics and simulated
Geostatistics for Environmental Applications;
annealing. Journal of Hydrology, 210: 206220.
pp. 101112. Dordrecht: Kluwer Academic
Publishers. Pettitt, A.N. and McBratney, A.B. (1993). Sampling
designs for estimating spatial variance components.
Madow, W.G. (1953). On the theory of systematic
Applied Statistics, 42: 185209.
sampling. III. Comparison of centered and random
start systematic sampling. Annals of Mathematical Quenouille, M.H. (1949). Problems in plane sampling.
Statistics, 24: 101106. Annals of Mathematical Statistics, 20: 355375.
206 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Rogerson, P.A., Delmelle, E.M., Batta, R., Akella, M.R., simulated annealing. Journal of Environmental
Blatt, A. and Wilson, G. (2004). Optimal sampling Quality, 27: 10781086.
design for variables with varying spatial importance.
Van Groenigen, J.W., Siderius, W. and Stein, A.
Geographical Analysis, 36: 177194.
(1999). Constrained optimisation of soil sampling for
Ripley, B.D. (1981). Spatial statistics. New York: minimization of the kriging variance. Geoderma, 87:
Wiley. 252p. 239259.
Stehman, S.V. and Overton, S.W. (1996). Spatial Van Groenigen, J.W., Pieters, G. and Stein, A.
sampling. In: Arlinghaus, S. (ed.) Practical Handbook (2000). Optimizing spatial sampling for multivariate
of Spatial Statistics; pp. 3164. Boca Raton, FL: CRC contamination in urban areas. Environmetrics, 11:
Press. 227244.
Thompson, S.K. and Seber, G.A.F. (1996). Adaptive Warrick, A.W. and Myers, D.E. (1987). Optimization of
Sampling. New York: Wiley. 288p. sampling locations for variogram calculations. Water
Resources Research, 23: 496500.
Van Groenigen, J.W. (1997). Spatial simulated
annealing for optimizing sampling different Webster, R. and Oliver, M.A. (1993). How large a
optimization criteria compared. In: Soares, A., sample is needed to estimate the regional variogram
Gmez-Hernndez, J. and Froidevaux, R. (eds). adequately. In: Soares, A. (ed.), Geostatistics Tria
GeoENV I Geostatistics for Environmental 92; pp. 155166. Dordrecht: Kluwer Academic
Applications. Dordrecht: Kluwer Academic Publishers.
Publishers.
Whittle, P. (1963). Stochastic processes in several
Van Groenigen, J.W. (2000). The inuence of variogram dimensions. Bulletin of the International Statistical
parameters on optimal sampling schemes for Institute, 40: 974994.
mapping by kriging. Geoderma, 97: 223236.
Zubrzycki, S. (1958). Remarks on random, stratied
Van Groenigen, J.W. and Stein, A. (1998). Constrained and systematic sampling in a plane. Colloquium
optimization of spatial sampling using continuous Mathematicum, 6: 251264.
11
Statistical Inference for
Geographical Processes
Chris Brunsdon

It is often necessary to make informed some form of statistical test appears, and
statements about something that cannot be in the wide range of software packages
observed or verified directly. It is equally use- (spreadsheets, statistical packages and others)
ful to assess how reliable these statements are in which code for carrying out such
likely to be. A great deal of research is based techniques appears.
on the collection of data, both qualitative and However, despite this clear recognition
quantitative in order to make such statements. of the importance of statistical inference,
For this reason, inference in science is a many commercial GIS packages claiming
fundamental topic, and the development of to offer spatial analysis facilities have no
theories of statistical inference should be procedures for this. The reasons for this are
seen as a cornerstone of any field of study complex, but one thing to note is that it
claiming to be based on scientific method. was the chi-squared test, and not statistical
Indeed, the American Association for the inference in general that was cited by the
Advancement of Science (AAAS) listed the AAAS as a key development. Chi-squared
development of the chi-squared test as one of tests are relatively simple computationally,
the twenty key scientific developments of the and make a number of assumptions about
twentieth century.1 the simplicity of the underlying processes
In general, the success of the statistical about which inferences are to be made. In
hypothesis testing methodology is reflected particular, they assume that each observation
in the vast number of publications in which is probabilistically independent, and drawn
208 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

from the same distribution. For spatial data The process model. This is a model, with a
this is unlikely to be the case recall number of unknown parameters, describing the
Toblers law stating that nearby things are process that generated the observations. This
likely to be more related than distant things. will take a mathematical form, describing the
In addition, the distributions of observations probability distribution of the observations. The
mathematical model can be very specic, so
may well depend on their geographical
that only a small number of parameters are
location. This violates the drawn from
unknown or quite broad so that for example a
the same distribution assumption. Thus, mathematical function of the general form f (x , y )
although tools of inference are just as is not known.
important for geographical data as for any
other kind of data, there are potential The inferential task. The task that the analyst
problems when borrowing standard statis- wishes to perform having obtained his or her
tical methods and applying them to spatial data. Typical tasks will be testing whether
phenomena. The aim of this chapter is a hypothesis about a given model is true,
to consider some fundamental ideas about estimating the value of a parameter in a given
inference, and then to discuss some of model, or deciding which model out of a set of
candidates is the most appropriate.
the difficulties of applying these ideas on
to spatial processes and hopefully offer The computational approach. Having chosen
a few constructive suggestions. It is also a process model, the inferential framework
important to note that although for some should determine what mathematical procedure
areas a degree of consensus has been reached, is necessary to carry out the inferential task. In
the subject of statistical inference is not many cases, the procedure is the relatively simple
without its controversies see Fotheringham application of a simple formula (for example a
and Brunsdon (2004) for example, and in chi-squared test). However, sometimes it is not.
particular there are unresolved issues in In such cases alternative strategies are needed.
inference applied to geographical data. Sometimes they involve numerical solution of
equations or optimizations. In other cases Monte
Carlo simulation-based approaches are used,
where characteristics of statistical distributions
are determined by simulating variables drawn
from those distributions. The strategy used to
11.1. BASIC CONCEPTS OF carry out the task is what will be termed the
STATISTICAL INFERENCE computational approach here.

To begin it is important to identify and


Probably the most fundamental of these
distinguish between some key concepts of
concepts is the inferential framework. This
statistical inference. These are:
is also the most invariant across different
kinds of statistical applications even if
geographers have special process models
The inferential framework. This is essentially
or computational approaches, or inferential
the model of how inferences are made. Examples
tasks, most of the time they are still appealing
of these are Bayesian inference (Bayes, 1763) and
classical inference. Each model provides a char- to the same fundamental principles when they
acteristic set of general principles underpinning draw inferences from their data. For example,
how some kind of decision related to a model one frequently sees geographers declare
(or set of models) can be made, given a set of parameters in models to be significantly
observations. different from zero, or quote confidence
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 209

intervals. When they do so, they are making in general, the particular nature of statis-
use of two key ideas from classical inference2 tical inference when spatial processes are
which may be applied to geographical and considered and the way in which these two
non-geographical problems alike. are related. This provides a broad frame-
The most geographically specific of the work for the chapter. First, a (very) brief
concepts is the process model. As stated overview of the key statistical inferential
earlier, many inferential tests are based on the frameworks will be outlined. Next, spatial
assumption that observations are independent process models and related inferential tasks
of one another in many geographi- will be considered, together with a discussion
cal processes (such as those influencing of how the inferential approaches may be
house prices) this is clearly not the case. applied in this context. Finally, a set of
In some cases, the geographical model is a suggested computational approaches will be
generalization of a simpler aspatial model considered.
perhaps the situation where geography plays
no role is a special case where some
parameter equals zero. In these situations,
one highly intuitive inferential task is to
11.2. AN OVERVIEW OF FORMAL
determine whether this parameter does equal
INFERENTIAL FRAMEWORKS
zero. In other cases, the task is to estimate
the parameters (and find confidence intervals)
The two most commonly encountered
that appear in both spatial and aspatial
inferential frameworks are Classical and
cases of the models (for example regression
Bayesian. Suppose we assume a model M
coefficients). In these cases, the spatial part
with some unobserved parameters , and
of the model is essentially a nuisance, making
some data x. Two kinds of tasks commonly
the inferential task related to another aspect
encountered are:
of the model more difficult.
The previous examples are relatively
simple from a geographical viewpoint, but 1 Given M and x , to infer whether some statement
more sophisticated geographical inferential about is likely to be true.
tasks can be undertaken. In particular, the
tasks above are related to what Openshaw 2 Given M and x , to estimate the value of or
(1984) terms whole-map statistics. That is, some function of , f ().
they consider single parameters (or sets of
parameters) that define the nature of spatial
Although both methods can address both
interaction at all locations, but supply no
types of question, they do so in quite
information about any specific locations. To
different ways.
the geographer, or GIS user, it is often more
important to identify which locations are in
some way different or anomalous. Arguably,
this is a uniquely geographical inferential
11.2.1. Classical inference
task. Although this inferential task can be
approached with standard inferential frame- The classical framework is most commonly
works, some careful thought is required. used, and will be defined first. The classical
Thus, to address the issue of statistical framework generally addresses two kinds of
inference for geographical data one must inferential tasks. The first task is dealt with
consider the nature of statistical inference using the significance test.
210 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Hypothesis testing where s12 and s22 are the respective sample
The statement about mentioned above variances for the two samples. A significance
is termed the null hypothesis. Next a test test is often performed by looking up the
statistic is defined. Of interest here is the critical value of t from a set of tables,
distribution of the test statistic if the null computing the observed t for the two
hypothesis is true. The significance (or samples and comparing this to the critical
p-value) of the test statistic is the probability value. In this case, the t statistic has
of obtaining a value at least as extreme = n1 + n2 2 degrees of freedom.
as the observed value of the test statistic The above outlines the procedure of a
if the null hypothesis is true. When the significance test, one of the two inferential
significance is very low, this suggests that tasks performed using classical inference.
the null hypothesis is unlikely to be true. Of course, such inference is probabilistic
To perform an % significance test one one cannot be certain if we reject the
calculates the value of the test statistic with null hypothesis that it really is untrue.
a significance of this is called the However, we do know what the probability
critical value. Typical values of are 0.05 of incorrectly rejecting the null hypothesis is.
and 0.01. If the observed value is more This kind of error is referred to as the type I
extreme than the critical value, then the null error. Another form of error results when we
hypothesis is rejected. Note that adopting incorrectly accept the null hypothesis this
the above procedure has a probability of is called a type II error. It is generally harder
of rejecting the null hypothesis when it is to compute the probability of committing a
actually true. type II error usually denoted as 1 .
This may seem rather abstract without The relationship between and is given
an example. One commonly used technique in Table 11.1.
based on these principles is the two-sample For the two-sample t-test, the null
t-test. Here = (1 , 2 ) where 1 and 2 hypothesis is 1 = 2 , and the alternatives to
are means of two normally distributed this take the form 1  = 2 , or equivalently
samples having the same variance 2 . The 1 2 = k for k  = 0, will depend
null hypothesis here is that 1 = 2 . on the value of k. In general, if k is
Here the test statistic is the well-known large then there is a stronger chance of
t-statistic: obtaining a significant t value, and so a
smaller chance of incorrectly failing to reject
the null hypothesis. also depends on the
x1 x2
t=%   (11.1) values of n1 and n2 the sizes of the two
1 1 samples. The larger these quantities are, the
s 2 +
n1 n2 smaller the probability of incorrectly failing
to reject the null hypothesis. Given any

where x1 and x2 are the sample means


from the two samples, n1 and n2 are
the respective sample sizes, and s2 is
defined by: Table 11.1 Relationship between and
Probability Reject null hypothesis
Yes No
(n1 1)s12 + (n2 1)s22 Null hypothesis true 1
s2 = (11.2) 1
n1 + n 2 2 Null hypothesis false
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 211

three of k, , n1 and n2 one can compute


11.2.2. Other issues for classical
the fourth (although the computation is not
inference
always simple).
In the section on Hypothesis testing it
was assumed that the quantity could be
easily calculated. In some situations this
Estimating parameters is not the case, because the probability of
The other inferential task is that of estimating the test statistic, although known, cannot
or f ( ). As with hypothesis testing, we be manipulated analytically making
cannot be sure that our estimate of or impossible to compute directly. In such
f () is exact indeed given the fact that situations, a Monte Carlo (Metroplois and
it is estimated from a sample we can Ulam, 1949) approach may be more helpful.
be almost certain that it is not. Thus, in In this approach, a large number of random
classical inference the key method provides numbers are drawn from the probability
upper and lower bounds the so-called distribution of the test statistic that would
confidence interval for or f (). Note apply under the null hypothesis, and the
that this assumes that or f ( ) are scalar observed value of the statistic is compared
quantities. The situation when they are against this list (see Manly (1991) for some
not will be discussed later. A confidence examples). It may be checked that the
interval is a pair of numbers a and b percentage rank of the observed test statistic
computed from the sample data, such that when it is merged with the list of randomly
the probability that the interval (a, b) contains generated test statistics is itself a significance
is 1 . This probability is computed on level. Thus, provided we may generate
the assumption that the model M is known random numbers from the distribution of
in advance, up to the specification of . the test statistic, this provides an alternative
A very important distinguishing characteristic approach to the classical significance test
of this approach is that the probability albeit one with a very different computational
quoted for a confidence interval is not the approach. This approach may also be used to
probability that a random lies within the generate confidence intervals.
deterministic interval (a, b) rather it is Another important observation is that
the other way round is not a random the derivation of the test statistics hinges
variable under classical inference it is a on the model for the distribution of the
fixed but unobservable quantity. It is the observational data being known at least
variables a and b that are the random up to the parameters being estimated.
variables, since they are computed from the Sometimes this is not the case. Attempts
random sample of observations and so to draw inference from data when this
the probability statement is made about the is the case are known as non-parametric
random quantities a and b. statistics (see, for example, Siegel (1957)).
In situations where is not a scalar, one One particular non-parametric approach is
may specify confidence regions from the the so-called permutation test. This is a
data. For example, if is two-dimensional, technique used to test relationships between
we could represent it as a point in the plane. pairs of variables, or more generally data sets
A confidence region is some sub-region of in which the order of the observations is of
the plane determined from the sample data some consequence. For example, if we have
that has a 1 probability of containing the data taken from two samples, say S1 and S2
true . with respective sizes n1 and n2 then we
212 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

could write the data as one long list, with distribution generating the data. A price
all of the observations in S1 followed by paid for this is that the computational
those in S2 : overhead is much higher and typically
nonparametric tests are not as powerful as

the simpler parametric equivalents, provided
x1 , x2 , . . ., xn1 , xn1+1 , . . ., xn1+n2 . (11.3) the assumptions underlying the parametric
tests hold. A final point is that there is
a subtle difference between randomization
In this case the ordering of the observations tests and standard classical tests, in that they
is of some consequence in the sense that are conditional on the exact set of observed x
an observation with an index greater than values, i.e., the null hypothesis only considers
n1 must have come from S2 . Now suppose the same values of xi in different orders
we wish to test the hypothesis that both sets unlike, for example, a t-test which considers
of observations come from distributions with a sampling frame that could generate any real
the same mean. Consider the quantity: values of xi .

1 1 1 2
i=n i=n
d= xi xi . (11.4) 11.2.3. Simple classical inference
n1 n2
i=1 i=n1 +1 in action
To illustrate some of the above ideas a simple
Suppose a null hypothesis that S1 and example is given. Here, the data consists
S2 come from the same distribution. Then, of a number of sale prices of houses from
there is no difference between the processes two adjacent districts in the greater London
generating the observations in {x1 , . . . , xn1 } area in 1991. The location of the districts in
and {xn1+1 , . . . , xn1+n2 } so that in fact any the context of greater London as a whole is
ordering of {x1 , . . . , xn2 } is equally likely. shown in Figure 11.1, as are the locations
Then, regardless of the distributions of S1 of the houses in the sample. There are
and S2 we would expect sample mean of d 220 houses in district 1 and 249 in district 2
to be zero. We could use this quantity as a (the district to the west).
test statistic, although we do not know its If we assume that house prices in both
distribution. However, if the null hypothesis districts have independent normal distribu-
were true, we may make use of Monte Carlo tions with equal variances, we may test the
methods. We simply randomly permute the hypothesis that the mean house price is the
ordering of the data set a large number same in each district. This null hypothesis,
of times, and obtain a corresponding set together with the assumptions set out above,
of values of d. We then compare the observed lead to the use of t-test as set out in
value of d against this set, to obtain a equation (11.1). The values of the relevant
value of as before. This in essence is quantities are set out in Table 11.2.
the randomization test. Here, it was shown Since we are interested in detecting
in the context of a test of difference of differences in the mean value of either
means, although it may be used to test any sign, we use the absolute value of t which
kind of statistic dependent on the ordering is 2.37. However, from tables, the critical
of the observations. The advantage of this value of t for (two-tailed) = 0.05 is
approach is that it allows tests to be made 1.96 suggesting we should reject the null
when one has no strong evidence of the hypothesis at the 5% level. Thus, with a
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 213

Location of study area Location of houses in sample

Figure 11.1 The location of study area (LHS) and the houses in the samples (RHS).

Table 11.2 Two sample t -test data, we can consider the joint probability
District 1 District 2 density of the two items given model M, say
n1 220 n2 249 f (x, | M). Standard probability theory tells
x1 77.7 x2 86.4 us that:
s1 37.3 s2 41.5

s2 39.6
467 f (x, | M) = f (x | , M) f ( | M)
t 2.37
= f ( | x, M) f (x | M) (11.5)

where f (x |.) and f ( |.) denote marginal dis-


5% chance of making an incorrect statement tributions of x and , respectively. Dropping
if the null hypothesis is true, we reject the the M from the notation as is conventional
null hypothesis and state that there is a because everything is conditional on model
difference in average house price between M applying we may write:
two zones.
f ( | x) = f (x | ) f ( )/f (x). (11.6)

11.2.4. Bayesian Inference


Assuming we have a given observed data
The Bayesian approach views in a very set x, we may regard f (x)1 as a normalizing
different way. Whereas classical inference constant and write:
regarded as a deterministic but unknown
quantity, Bayesian inference regards it as
a random variable. The idea is that the f (x | ) f (x | ) f ( ). (11.7)
probability distribution of represents the
analysts knowledge about so that,
for example, a distribution with very little This is essentially Bayes theorem, and is
variance suggests a great deal of confidence the key to the inferential model here. If we
in knowing the value of . If we accept that regard f () as the analysts knowledge about
is a random quantity, as is x, the observed regardless of x, then multiplying this by the
214 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

probability of observing x given theta (that sequence of posteriors derived from well-
is, f (x | ), gives an expression proportional defined priors for example if a sequence
to f ( | x). Note that in this framework, of priors with variances increasing without
f (x | ) is our process model, as set out bound were supplied). A prior such as this is
in section 11.1. We can interpret this last termed an improper prior.
expression as the knowledge the analyst has Having arrived at a posterior distribution
about given the observational data x. Thus, f (x | ) we may begin to address the two key
we have updated knowledge about in the inferential questions:
light of the observations x this is essentially
the inferential step.
(1) Estimate the value of or some function of
In standard Bayesian terminology f ( ) is , f(). Since we have a posterior distribution
referred to as the prior or prior distribution for we can obtain point estimates of using
for and f (x | ) is referred to as the estimates of location for the distribution
posterior or posterior distribution for . such as the mean or median. Alternatively,
Thus, starting out with a prior belief in the we can obtain interval estimates such as
value of , the analyst obtains observational the inter-quartile range derived from this
data x and modifies his or her belief in the distribution. Typically, one would compute
light of these data to obtain the posterior an interval [1 , 2 ] between which has a
distribution. The approach has a number of 0.95 probability of lying. Note that this is
elegant properties for example, if individual subtly different from the condence interval of
classical inference. The 95% in a condence
data items are uncorrelated and if data
interval refers to the probability that the
is collected sequentially, one can use the
randomly sampled data provides a number
posterior obtained from an earlier subset of pair that contains the unobserved, but non-
the data as a prior to be input to a later set of random . Here we treat as a random
data. However, the approach does require a variable distributed according to the posterior
major change in world view. The requirement distribution obtained from equation (11.7). To
of a prior distribution for from an analyst emphasize that these Bayesian intervals differ
could be regarded as removing objectivity from condence intervals, they are referred to
from the study. Where does the knowledge as credibility intervals.
to derive this prior come from?
(2) Infer whether some statement about
One way of overcoming this is the use
is likely to be true. If our statement is
of non-informative priors which represent
of the form a < < b where either a or
no knowledge of the value of prior to
b are innite, then this may be answered
analysis. For example, if were a parameter by computing areas underneath the posterior
between 0 and 1, then f ( ) = 1 a uniform density function. For example, to answer the
distribution would be a non-informative question is positive? one computes:
prior since no value of has a greater prior
probability density than any other. Some-
times this leads to problems for example 
if is variable taking any real value. In this f (|x ) d
case, f ( ) = const. is not a well-defined 0
probability density function. However, this
shortcoming is usually ignored provided the and obtains the probability that the statement
posterior probability thus created is valid is true. However, questions of the form
(typically the posterior in this case could be addressed by classical inference such as
regarded as a limiting value of an infinite is zero? where typically one is concerned
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 215

with exact values of present more difculties. essentially stems from the fact that this is a
Since the output is a probability density, the scale parameter, rather than one of location
probability attached to any point value is zero. see Lee (1997) for example. In this case, it
There are a number of workarounds to this. can be shown that the posterior distribution
One quite sensible approach is to decide how for the quantity = 1 2 is that of the
far from zero could be for the difference to
expression:
be unimportant, and term this . If this is done,
we may then test the statement  < < 
using the above approach. Other approaches do  1/2
attempt to tackle the exact value test directly 1 1
(x1 x2 ) s t (11.8)
see Lee (1997) for further discussion. n1 n2

Some nal notes where all variables are as defined in


Note that in the above sections is regarded equation (11.1) except for t, which is a
as a univariate and continuous variable; random variable with a t distribution with
however, the arguments may be extended to degrees of freedom (again is as defined
multivariate and discrete . In the discrete earlier). The posterior distribution for is
case, integrals are replaced by sums and shown in Figure 11.2.
point hypothesis testing is no longer an issue. Here, the hypothesis under test differs from
In the multivariate case, single integrals are that of the classical test. Rather than a simple
replaced by multiple integrals and instead test of whether = 0 which makes little
of simple ranges for credibility intervals, sense given the posterior curve above, we
regions in multidimensional parameter space test whether | | < G where G is defined
may be considered. as some quantity below which a difference
in means would be of little consequence.
This is very different from the standard
classical approach. In that framework, if a
11.2.5. Bayesian inference
test were sufficiently powerful, differences
in action
in mean house prices of pennies could
In this section, we revisit the house price be detected. However, in terms of housing
example, this time applying a Bayesian markets such a difference is of no practical
inferential framework to the problem. importance. For this example we choose G
As before, we assume that house prices are to be 1,000 (UK). If this is the case, the
independently normally distributed in each probability that | | < G corresponds to the
of the two districts. If we regard our list shaded area in Figure 11.2. This is equal
of house prices as x, then = (1 , 2 ) to 0.014 alternatively one could state that
the respective means of the house price the probability that | | exceeds 1,000 is
distribution for districts 1 and 2, and f (x | ) 1 0.014 = 0.986. Thus, from a Bayesian
is just the product of the house price perspective, it seems very likely that there
probability densities for each observed is a non-trivial difference between the mean
price. Here we are interested in the quantity house prices for the two districts. Another
1 2 . In this case we have a non- possibility is to compute the probability that
informative prior in 1 and 2 and also in district 2 has a higher mean than district 1.
log where is the standard deviation of This is just the posterior probability that
house prices in both districts. The choice > 0, which, from the curve is equal to
of the prior for may seem strange, but 0.99 again suggesting this is highly likely.
216 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

0.10
0.08
Posterior Density
0.06
0.04
0.02
0.00

20 10 0 10
(1000s Pounds)

Figure 11.2 Posterior distribution for = 1 2 .

11.2.6. Bayesian approaches distribution. In this case, hypotheses about


some closing comments are investigated by generating large numbers
of random values, and investigating their
The Bayesian approach is regarded by some
properties.
as very elegant. Certainly the simplicity
Advances of the kind described above are
of the underpinning equation (11.7) and
not made without a great deal of research
the natural way that hypotheses may be
to answer important operational questions
assessed, and parameters estimated from the
such as:
posterior distribution do have a directness
of appeal. However, there is a sting in
the tail. Equation (7) gives the posterior How accurate are the quadrature results?
distribution up to a constant implying
that the expression for probability distribu- How large should the samples of random
tion can only be obtained by integrating numbers drawn using the Metropolis
its un-normalized form. Herein lies the algorithm be?
problem in many cases the integral is
How can very large-scale simulations be
not analytically tractable. At the time of
computed efciently?
writing, this presents fewer problems than
in the past as numerical quadrature
techniques may be used to estimate the Thus, the computational approach to Bayesian
integrals. Alternatively, techniques based on methods is an issue of great importance.
Monte Carlo simulation and the Metropolis However, in recent years this area has been
algorithm allow random values of to the focus of much research, and this com-
be generated according to the posterior bined with increasing trends in the speed and
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 217

capacity of computers have led to an increase variables may be found in Fotheringham


in the popularity of Bayesian methods. et al. (1998). The study area consists of
the four counties Tyne and Wear, Durham,
Cleveland and North Yorkshire, in the north-
east of England. Of particular interest here is
11.3. WHY GEOGRAPHY MATTERS the population density variable. An ordinary
IN STATISTICAL INFERENCE least squares regression model was fitted to
the data, giving a coefficient of 5.6. A t-
In the above section, two of the most common test based on principles of classical inference
approaches to formal statistical inference showed this to be significantly different
were discussed. However, this was done in a from zero. In general, this suggests that an
general sense nothing stated in the previous increase in population density leads to a
section applied exclusively to geographical decrease in LLTI. This is perhaps counter-
data. As hinted in the introduction, working intuitive. Normally one associates higher
with spatial data introduces a few specific morbidity rates with urban areas, which have
problems. higher population densities. However, the
This raises a number of issues: study went on to consider geographically
weighted regression (GWR) (Brunsdon et al.,
1 What happens if one ignores spatial effects? 1996) a technique using an underlying
model in which regression parameters vary
2 Does one need to modify the above ideas of over space. When this was carried out, it
inference when working with spatial data? was found that the regression parameter for
population density was at its most negative
3 If some spatial effects are present, can they be
in areas in the region around the coalfields
represented as spatial patterns or images?
of east Durham. Here, it is likely that LLTI
is linked to employment in the coalfields,
All of these issues lead to important and that most people in such employment
questions but none have unique answers. lived in settlements near to the coalfields,
First, consider issue 1 if there are no where population density is low. However,
serious problems encountered when ignoring those people living in urbanized areas in
spatial effects then there is little that spatial that part of the region are less likely to
analysis can add to the usual suspects list be employed in occupations associated with
of standard statistical methods. However, high LLTI. Thus, in that locality a negative
it is argued here that there are indeed relationship between population density and
serious consequences arising from ignoring LLTI holds. However this is unusual in
such effects. There are many examples general, and in other parts of the study area
of such consequences see, for example, (west Durham, North Yorkshire), there is a
Fotheringham et al. (1998), who follow positive relationship. Here, low population
the work of Rees (1995) in modelling density corresponds to a more typical rural
the relationship between limiting long-term environment, and in these places a more
illness (LLTI) as defined in the 1991 conventional urban/rural trend occurs. The
UK census of population, and a number key point here is that the global model told
of predictor variables: Unemployment rate, only one story, while the spatially-oriented
Crowding, Proportion in Social Class 1, GWR identified two different processes
Population Density, and Proportion of Single occurring in different parts of the study area.
Parent Families. Full definitions of these The moral here is that ignoring geography
218 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

can lead to mis-interpretation. This example probability distributions, but are in fact
is a cautionary tale about the consequences correlated. In the geographical context, the
of ignoring spatial effects in an inferential correlation is generally related to proximity
framework. nearby x values are more correlated than
So ignoring geography can lead to infer- values located far apart. Typical examples
ential problems. How can this difficulty be are the SAR (spatial autoregression) and
overcome? In particular this raises another CAR (conditional autoregression) models.
key question Does one need to modify the Unlike GWR, these regression models do
above ideas of inference when working with not assume that the regression parame-
spatial data? To answer this, we return to the ters vary over space however they do
four aspects of statistical inference listed in assume that the dependent variables are
section 11.1 once again: both Bayesian and correlated. Typically here, each record of
classical inferential frameworks can handle variables is associated with a spatial unit,
the key inferential tasks of hypothesis eval- such as a census tract, and the spatial
uation and parameter estimation for spatial dependence occurs between adjacent spatial
processes. However, for spatial data the units. As well as the regression coefficients
process model must allow for geographical and the variance of the error term, CAR
effects. Finally, it is also the case that the and SAR models have an extra parameter
computational approach must also be altered controlling the degree to which adjacent
on some occasions. These two key issues will dependent variables are related. In the
be considered in turn. classical inference case, parameter estimation
is typically based on maximum likelihood,
with the parameter vector containing the
extra parameter described above as well
11.3.1. Process models for
as the usual regression parameters. There
spatial data
is much work on the classical inferential
The process models for spatial data can treatment of such models: see, for example,
differ from more commonly used ones in a Cressie (1991). LeSage (1997) offers a
number of ways. The two most common ones Bayesian perspective.
are that they exhibit spatial non-stationarity
and spatial autocorrelation. Spatial non-
stationarity is essentially the characteristic
11.3.2. The computational
of the LLTI example above. The unknown
approach
parameter is not a constant, but in fact
a function of spatial location. In this case, Computational issues for geographical data
a technique like GWR may be used to are generally complex. The whole field
estimate at a set of given localities. of geocomputation has grown to address
Using this approach, one can apply the this. As well as problems of data storage,
classical inferential framework to obtain data retrieval and data mining, there are
estimates of , and test hypotheses such many computational overheads attributable
as is a global fixed value. A classical to inference in spatial data, for a number
inferential framework for GWR is detailed of reasons. In some cases, the issue is
in Fotheringham et al. (2002). related to Monte Carlo or randomization
The phenomenon of spatial autocorrelation methods this is particularly true of the
occurs when each of the observed x values Monte Carlo Markov Chain approach to
are not drawn from statistically independent Bayesian analysis. In others, it is linked
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 219

to developing efficient algorithms to access that proximity to R has no influence on the


large geographical data sets this can be an quantity of interest).
issue in localized methods such as GWR. In This approach fits in well with conven-
each case, it is true than specific algorithms tional theory there is one single hypothesis
may need to be created to handle the to test, and it may be tested as set out
geographical situation. A very good example above. However, on many occasions we
is found in Diggle et al. (1998). Another have no prior knowledge of R, possibly
example of this is shown graphically in even on whether R is a single region or a
figures 11.3 and 11.4. Figure 11.3 shows a number of disjoint areas. On such occasions,
map of crime rates. The intention here is a typical approach would be to carry out
to test whether the spatial autocorrelation a test such as that described above on
(as measured by Morans I) is zero. This every possible region, and map the ones that
is done by randomly permuting the rates have a significant result. This is essentially
to geographical zones. The distribution after the approach of the Geographical Analysis
1000 simulations is shown in figure 11.4 the Machine (GAM; Openshaw, 1987) here
observed value is far greater than any of these the Rs are circular regions of several radii
simulations, suggesting a highly significant centred on grid points covering the study
(p < 0.001) result. area. However, there is a difficulty with
The final question in the earlier list this approach. Suppose we carry out a
also raises some interesting problems. The significance test on each of the Rs. There
formal (Bayesian or classical) approach to could be a large number of tests, possibly
hypothesis testing is essentially founded on hundreds. Even if no clustering were present,
the notion of testing a single hypothesis. the chance of obtaining a false positive is ,
However, many geographers would like the significance level of the test. If = 0.05
answers to more complex hypotheses. In the as is common practice, we would expect to
spatial context, one of the key questions is find N significant results even when no
Is there an unusually high or low value clustering occurs, where N is the number of
of some quantity in region R? Typically regions to be tested. For example, if N = 200
this quantity might be the average price and = 0.05, we would expect to find 10
of a house, or an incidence rate of some significant regions even when in reality no
disease. This phenomenon is often termed clustering occurs. Thus, in an unadjusted
clustering (see the chapter by Jacquez in form, this procedure is very prone to false
this volume for more discussion on this positive findings. Essentially this is a problem
topic). In some situations R is known in of multiple hypothesis testing. Because the
advance for example it may represent test has a positive probability of incorrectly
the catchment area of a particular school in rejecting the null hypothesis, carrying out
the house price example. If it is known in enough tests will give some positive results
advance the approach is relatively simple. even if in reality there are no effects to detect.
One creates a proximity measure to reflect A typical way of tackling the problem is
how close to R each observation is, or to apply the Bonferroni adjustment to the
creates a membership function of R for each significance levels of the test. For example,
observation, and then builds this into a model, this is done by Ord and Getis (1995) for
using a parameter that may vary the influence assessing local autocorrelation statistics.
of this new variable. Then one goes on to The correction is derived by arguing that
test the hypothesis that this parameter is zero to test for clustering, we wish to test that
(or whatever value of the parameter implies none of the regions R have a significant
220 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

cluster centred on them. Thus, the probability 11.4. FURTHER ISSUES


of a false positive overall is the probability
that any one of the regions has a false In the previous sections, the two most
positive result. If it is assumed that each test common approaches to formal inference were
is independent, then it can be shown that discussed, and following this, some of the
this probability (which will be called  ) is particular issues encountered when applying
given by: these principles to spatial data and spatial
models were discussed. In this final section,
other matters arising will be considered.
 = 1 (1  )N . (11.9)

Now, if we wish to develop an overall


11.4.1. Population versus process
test for clustering, with say  = 0.05 then
equation (11.9) may be solved for giving Throughout this chapter, the concept of
a significance level for the individual tests inference has been applied to processes.
needed in order to achieve the overall level of However, another view is that one makes
significance. For example, if N = 200 and we inferences about Populations given samples
require  = 0.05 then = 0.000256. This is taken from these populations. In many
a fairly typical result. To counter the risk of respects, there are similarities between the
false positives, the individual tests must have two situations. In reality, every population
very low values of . is finite, and therefore the population that
However, one thing of note about the items in a sample are drawn from is discrete.
above approach is that the assumption that Therefore, strictly speaking tests such as the
the tests are independent is often incorrect t-test are inappropriate in this situation, as
for geographical studies. Typically, a large they assume observations are drawn from a
number of regions R are used, and many normal distribution which is continuous.
overlap, sharing part of the sample data However, when the population is very large,
used for the local tests and for this this distribution (or in some cases another
reason the results of these tests cannot be continuous distribution) is a very good
independent. It is usually argued that the approximation of reality for example in
Bonferroni procedure provides conservative the UK a population of around 20 million
tests3 and in the situation where the tests are would represent every household but a
correlated the estimate of in equation (11.9) continuous distribution for household income
is an underestimate. In an attempt to avoid may well be quite close to the real situation of
false positives, we insist on very strong a discrete distribution with around 20 million
evidence of clustering around each of the test values!
regions. Thus, we will be insisting on much Thus, it is argued here that the random
stronger evidence than is actually necessary process of drawing from a normal (or other
to detect clustering and there is some chance continuous) distribution is a very good
that genuine clustering is overlooked. In approximation to drawing from a very
a nutshell this is a typical dilemma when large population and that in many cases,
looking for clusters ignoring multiple hypothesis tests related to the population can
hypothesis testing leads to false positives, be reasonably proxied by hypotheses relating
but overcompensation for this could lead to to such an approximating process. Given this
false negatives. There are no inferential free argument, all of the arguments based on
lunches! the concept of process inference here may
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 221

be applied to making inferences about large Note that on some occasions, we may have
populations. the entire population represented in our
For smaller populations, the discrete nature data, but even so it may be of interest to
of the sampling frame may suggest that understand the process(es) that brought about
such continuous approximations are not that data. For example, we look at daily
valid. Here, we are faced with two choices. records of rainfall from a one month period
First, if we assume that this population of the previous century. In this case, the list
really is the item of interest, and it is not of rainfall measurements is our population,
particularly large, then one approach might but the process of generating these can be
be to collect observations for the entire modelled as a random process and we
population. In this case, the conventional may wish to test hypotheses about whether
framework for statistical hypothesis testing average levels are similar to those in the
becomes meaningless to test a hypothesis present day. In this case, we wish to test
relating to the population simply look at the the (process-based) hypothesis that the mean
data and see if it is true or not! daily rainfall is equal to some given level. It is
A second alternative is to assume that the authors opinion that in most cases when
the population itself is of less interest than an entire population may be measured, it is
the process generating it. In this case, we the underlying process and not the values of
return to the process hypothesis framework. the population itself that is of most interest.

Columbus OH:residential burglaries and vehicle


thefts per thousand households, 1980

under 19.02
19.02 29.33
29.33 39.03
39.03 53.16
over 53.16

Figure 11.3 Crime rate distribution: vehicle thefts and residential burglaries per
1000 households (1980).
222 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

11.4.3. Software
11.4.2. Other types of inference
No chapter about inference would be com-
Although classical and Bayesian methods are
plete without some discussion of software.
both covered in this chapter, these are not
Having argued that making inferences about
the only possible approaches. For example
data is central to knowledge discovery in
Burnhan and Anderson (1998) outline ways
spatial analysis, one has every right to expect
in which Akaikes An Information Criterion
that software for inferential procedures will
(AIC; Akaike 1973) may be used to compare
be readily available. However, as mentioned
models. This approach is quite different
in the introduction, most readily available
in terms of its inferential task rather
GIS packages do not currently contain code
than testing whether a statement about a
for many of the procedures outlined here.
particular model is true or assuming a
Unfortunately, although several commercial
specific model holds and then attempting
statistics packages do contain code for
to estimate a parameter of that model, this
carrying out general inferential procedures,
approach takes several models and attempts
such as the t-test example discussed earlier
to identify which one is best in the sense
in the chapter, they offer less support for
that it best approximates reality. The AIC
more specific inferential tasks developed for
is an attempt to measure the nearness of
spatial data. Until recently, for a number
the model to reality obviously the true
of spatial inferential tasks one was forced
model is not known, but the observations
to write ones own code. However this
have arisen from that model, and this is
situation is now improving. A number
where the clues about the true model come
of packages that are either dedicated to
from. This is very different from the other
the analysis of spatial data or sufficiently
approaches because it regards all potential
flexible that they may be extended to
models as compromises none is assumed to
provide spatial data analysis now exist.
be perfect and attempts to identify the best
Although by no means the only option, the
compromise. This area may prove fruitful in
statistical programming language R provides
the future for example Fotheringham et al.
good spatial analysis options all of the
(2002) use a method based on this idea to
examples (most notably the spatial one)
calibrate GWR models. The idea of finding
in this chapter were based on calculations
a best approximation also sits comfortably
done in R. There are a number of spatial
with the idea of approximating a large finite
data analysis libraries written in R, enabling
sample with a continuous distribution put
this kind of geostatistical computation. For
forward in the previous section.
example:
Of course, exploratory data analysis can
be thought of as yet another inferential
framework, albeit a less formal one. Although
this can provide a very powerful framework sp provides basic spatial data handling facilities;
for discovering patterns in data, it could
maptools provides map drawing functionality as
be argued that this is an entire subject in
well as the ability to import geographical data
its own right, and that there will be many
in a number of common formats, such as ArcGis
examples elsewhere in this book, where the
shapeles;
production of maps and associated graphics
by various software packages provide excel- spdep provides a number of hypothesis tests
lent examples exhibiting the power and utility and model calibration facilities relating to models
of graphical data exploration. allowing for spatial dependencies; and
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 223

Monte Carlo Simulation Results

200
150

= 0.05

Observed Value
Frequency
100
50
0

0.6 0.4 0.2 0.0 0.2 0.4 0.6


Morans I

Figure 11.4 Results of Monte Carlo tests on I .

spgwr provides a number of tools for Geo- 3 Conservative here means that the test has a
graphically Weighted Regression (GWR) analysis. signicance level of 5% or lower.

The package is also Open Source so it REFERENCES


provides an easy entry option for anyone
Akaike, H. (1973). Information theory and an extension
wishing to experiment more with inferential
of the maximum likelihood principle. In: Petrov, B.
approaches for geographical data. and Csaki, F. (eds), Proceedings of the Second
International Symposium on Information Theory,
pp. 267278. Budapest: Akademiai Kiado.
Bayes, T. (1763/1958). Studies in the history of
ACKNOWLEDGEMENT probability and statistics: IX, Thomas Bayes essay
towards solving a problem in the doctrine of
I am grateful to the Nationwide Building chances. Biometrika 45: 296315 (Bayes essay in
Society for providing the house price data modernized notation).
first introduced in section 11.2.3. Brunsdon, C. Fotheringham, A.S. and Charlton, M.
(1996). Geographically weighted regression:
A method for exploring spatial non-stationarity.
Geographical Analysis 28: 281289.
NOTES Burnhan, K.P. and Anderson, D.R. (1998). Model
Selection and Inference: A Practical Information-
1 See http://stat.fsu.edu/brouchure/stat/whystat.htm Theoretic Approach. New York: Springer.
for additional details.
2 These principles are the signicance test, and Cressie, N.A.C. (1991). Statistics for Spatial Data.
the condence interval respectively. New York: John Wiley and Sons.
224 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Diggle, P.J., Tawn, J.A. and Moyeed, R.A. (1998). Metroplois, N. and Ulam, S. (1949). The Monte
Model-based geostatistics. Applied Statistics 47: Carlo method. Journal of the American Statistical
299350. Association 44: 335341.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. Openshaw, S. (1984). The Modiable Areal Unit
(1998). Scale issues and geographically weighted Problem. Norwich: Quantitative Methods Research
regression. In Tate, N. (ed.), Scale Issues and GIS. Group, Royal Geographical Society and Institute of
Chichester: John Wiley and Sons. British Geographers, Concepts and Techniques in
Modern Geography Publication No. 38.
Fotheringham, A.S., Brunsdon, C. and Charlton, M.
(2002). Geographically Weighted Regression: Openshaw, S. (1987). A mark i geographical analysis
The Analysis of Spatially Varying Relationships. machine for the automated analysis of point
Chichester: John Wiley and Sons. data sets. International Journal of Geographical
Fotheringham, A.S. and Brunsdon, C. (2004). Some Information Systems 1: 335358.
thought on inference in the analysis of spatial data. Ord, J.K. and Getis, A. (1995). Local spatial
International Journal of Geographical Information autocorrelation statistics: Distributional issues
Science 18: 44757. and an application. Geographical Analysis
Lee, P.M. (1997). Bayesian Statistics: An Introduction. 27: 286306.
London: Arnold. Rees, P. (1995). Putting the census on the researchers
LeSage, J. (1997). Bayesian estimation of spatial desk. In Openshaw, S. (ed.), Census Users
autoregressive models. International Regional Handbook, pp. 2782. Cambridge: GeoInformation
Science Review 20: 113129. International.

Manly, B. (1991). Randomization and Monte Carlo Siegel, S. (1957). Nonparametric Methods for the
Methods in Biology. London: Chapman and Hall. Behavioral Sciences. New York: McGraw-Hill.
12
Fuzzy Sets in Spatial Analysis
Vincent B. Robinson

12.1. INTRODUCTION (McBratney and Odeh, 1997) and the then


developing field of geographic information
Shortly after the theory of fuzzy sets was science (Robinove, 1989; Robinson and
introduced by Zadeh (1965) researchers Strahler 1984; Robinson, 1988).
began to argue that fuzzy sets theory For many spatial phenomena there are no
could serve as an appropriate foundation crisp boundaries that can be identified to
for spatial analysis (Gale, 1972). It was differentiate regions or zones. For examples,
argued early on that fuzziness is a major the boundary between beach and fore-
factor contributing to the uncertainty of shore, between woodland and grassland, and
spatial behavior. Thus, achieving exactitude between urban and rural areas may be gradual
in representing, analyzing, and predicting rather than defined by a crisp boundary.
spatial behaviors, over space and through It is well known that when we use remotely
time is difficult, or impossible, to accom- sensed imagery to extract spatial objects of
plish in a fuzzy environment characterized interest, there are pixels that may contain sub-
by ambiguous or incomplete information pixel objects, trans-pixel objects, boundary
and inexact cognitive and decision-making pixels, and/or natural intergrades (Foody,
processes. Although many of the earliest 1999). The mixture of spectral information
works focused on spatial behavior, policy, at the sub-pixel scale can lead to uncertain
and planning (Leung, 1983; Lundberg, 1982; classification and indeterminate boundaries.
Pipkin, 1978), it was not long before its This is not unrelated to the general region
relevance was recognized in other areas classification problem highlighted in Leungs
of spatial analysis such as soil science (1983) evaluation of fuzzy sets in spatial
226 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

analysis and planning. In addition, it is often fuzzy sets and their relevant use in spatial
the case that concepts, or parameters, in spa- analysis.
tially explicit models are inherently inexact. This chapter will first briefly review some
These, and other problems of uncertainty, of the more noteworthy accomplishments
have led many to use techniques based on using fuzzy sets in spatial analysis. Then
fuzzy set theory. Ironically, fuzzy sets can it will discuss the issue of assigning fuzzy
be used to help make analyses less fuzzy membership and how it has been approached
because the inexactness is managed explicitly for use in spatial analysis. Finally, it will
rather than implicitly. However, like other briefly discuss some issues and challenges of
efforts at formalization, it can help lay bare using fuzzy sets in spatial analysis.
assumptions and force us to be explicit about
their meaning.
The basic idea underlying fuzzy set theory
is that an element can be classified as 12.2. FUZZY SETS AND SPATIAL
being a member of more than one set ANALYSIS: SOME
and to varying degrees hold membership in ACCOMPLISHMENTS
each class. In the usual Boolean, or crisp,
set theory, membership of an element x Spatial analysis is a broad field not rele-
in a set A, is defined by a characteristic gated to just social science, ecology, soil
function that indexes the degree to which science, geography, or engineering. Fuzzy
the object in question is in the set. It set theory has been specifically noted as
should be noted that it is customary, but being a more natural way of representing and
not strictly necessary, for the index to range analyzing phenomena in such diverse areas as
from 0, for full non-membership, to 1.0 social science (Ragin and Pennings, 2005),
for full membership. Hence, the member- soil science (McBratney and Odeh, 1997),
ship function is the fundamental element ecology (Schaefer and Willson, 2002) as well
necessary to use fuzzy sets. A membership as geographical analysis and engineering.
function measures the fractional truth value Thus, there are many areas in which it
a statement such as Object Y is a member has been shown to be of value in spatial
of set S. analysis. Some of the more recent and
A variety of other works contain in-depth important accomplishments are noted in this
explanations of the relevant fundamental section.
concepts of fuzzy set theory. Studies such as Fundamental to some types of spatial anal-
that by Klir et al. (1997), Buckley and Eslami ysis is the generation of a surface from data
(2002), and Zimmerman (2001) cover many that generally contains some level of uncer-
aspects of fuzzy sets in considerable depth tainty. These data are generally represented
with applications as examples. More relevant as a set of points. Not only may there be
to those interested in spatial analysis is the uncertainty regarding the data measurement,
geographic information systems (GIS) text- but there may be duplicate data records
book by Burrough and McDonnell (1998). in the spatial database that may confound
Other informative perspective pieces have an analysis. Torres et al. (2004) present
appeared in the social sciences (Verkuilen, an asymptotically optimal algorithm for
2005), soil science (McBratney and Odeh, eliminating duplicates that incorporates the
1997) and GIS (Robinson, 2003). Hence handling of fuzzy uncertainty. The generation
there are many sources that one can turn of surfaces from point data entails some form
to for background on the fundamentals of of interpolation. In this regard there have
FUZZY SETS IN SPATIAL ANALYSIS 227

been several promising approaches presented Often spatial decision making is repre-
for the interpolation of spatial surfaces from sented using decision tables (DT). However,
point data (Anile et al., 2003; Gedeon et al., the problem of strict, crisp boundaries is
2003; Lodwick and Santos, 2003). viewed as a significant problem in the use
Sampling of spatial data is fundamental of DTs for locational decision making and
to spatial analysis. Fuzzy set theory has spatial analysis. Witlox and Derudder (2005)
been used relatively rarely in this regard. have demonstrated how fuzzy decision tables
It has been shown how a combination of can be formulated and used effectively. They
fuzzy clustering and regionalized variables show that it is possible to explicate the
can be used to estimate the optimal spacing imprecision involved in the decision-making
of sample collection sites for soils map- process using FDTs. However, like DTs,
ping (Odeh et al., 1990). Developments when the number of conditions becomes large
in mapping systems that integrate mobile then knowledge-based techniques may be
computing, GIS, and one or more sensors to more effective and manageable.
take physical measurements (Arvanitis et al., The use of fuzzy sets in spatial analysis
2000) suggest that it is becoming realistic has been shown to improve the accuracy of
to think in terms of spatial data collection representing spatial phenomena in a variety
agents that use fuzzy logic in an adap- of domains. Often times this improvement
tive spatial sampling strategy. Simulation is also coincident with a reduction in cost.
results of a prototypical system for adaptive Using a fuzzy similarity approach, Hwang
sampling along a transect suggest that the and Thill (2005) found that the rate of
fuzzy adaptive sampler usually produced success of a typically used georeferencing
better results and on average required fewer procedure went from 86% up to 94% of all
sampling locations (Graniero and Robinson, fatal accidents. This may not sound like a
2003). great many instances, but in a mission-critical
When using spatial analysis in support of application such as locating fatal accidents,
spatial decision making, it is sometimes noted this represents a significant gain in accuracy.
that the results of using a crisp, or nonfuzzy, In a different domain, the soilland inference
approach to provide an information space model (SoLIM) based on fuzzy set theory
for making decisions virtually guarantees (Zhu et al., 2001; Zhu, 1997) has been
that an analysis will ignore potentially estimated to have increased the accuracy
useful information (Morris and Jankowski, of spatially explicit soils data by as much
2005; Oberthur et al., 2000; Yanar and as 20% at a third of the cost of tradition
Akyurek, 2006). Thus, a nonfuzzy, or crisp, techniques (Zhu, 2004). A similar result of
approach may have the effect of hiding lower cost and higher accuracy when using
important spatially explicit information from a fuzzy logic-based methodology has been
decision makers, hence increasing the risk suggested by work on ecological landscape
of incurring additional costs by forgoing mapping (MacMillan et al., 2003; MacMillan
an opportunity because it was not known et al., 2000).
to the decision maker. This feature of For spatial analysis, map comparison is
fuzzy versus crisp approaches has been useful for purposes of studying dynamic
noted in studies as varied as landfill site processes such as land cover change, compar-
selection (Charnpratheep et al., 1997), real ing simulation model results with empirical
estate evaluation (Zeng and Zhou, 2001), data, map creation/revision and translating
and soil erosion potential (Ahamed et al., between maps using different semantics.
2000). Translating between map products from
228 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

differing sources with differing semantics of linguistic assessments that are often part
is a problem when assembling spatial of the qualitative criteria.
data for analysis. To address this prob- It is not uncommon for fuzzy set theory
lem, Ahlqvist (2005) used rough fuzzy applications to be incorporated in a compo-
sets to analyze the semantic similarity of nent of a larger decision support system. For
map products having differing classification example, in the DISCUSS system of spatially
semantics. Fritz and Lee (2005) used a disaggregating costbenefit analyses fuzzy
fuzzy logic based methodology to compare logic is used in only one component. The
two land cover datasets and found the fuzzy spatial disaggregation method uses
fuzzy agreement approach superior to the standard membership curves operating on
nonfuzzy approach in identifying areas of spatial variables. If the initial method of
severe disagreement. Using a hierarchical spatial disaggregation is not accepted, then
fuzzy pattern matching technique, Power a fuzzy disaggregation method is used that is
et al. (2001) were able to convincingly based on membership functions on distance
demonstrate the superiority of the use of variables and fuzzy addition. Using fuzzy
fuzzy logic to address the problems of map sets this work has shown how cost benefit
comparison. They noted the deficiencies of analyses can be spatially disaggregated,
using a map comparison statistic such as something that has rarely been accomplished
the Kappa measure that relies on crisp, in the past (Paez et al., 2006).
nonfuzzy categories. This has subsequently In some cases, when compared with more
been addressed by Hagen and others in traditional methods that are statistics-based,
their development of the K-fuzzy (or fuzzy fuzzy techniques have provided superior
kappa) (Hagen-Zanker et al., 2005) that results. For example, when they replaced
takes into consideration the fuzziness of their principal components model with a
both location and attribute quality (Hagen, fuzzy set analysis. Taylor and Derudder
2003). This is one of the more promis- (2004) noted that the fuzzy-based analysis
ing approaches to comparing spatial fields provided an exceptionally clear picture of
(Wealands et al., 2005). regional and hierarchical tendencies among
A variety of multi-criteria decision making world cities. In a similar vein, Katz et al.
efforts have used fuzzy techniques to address (2005) concluded that regression analysis
spatially explicit problems. A methodology did not provide meaningful results while
for assessing land for allocation to restoration fuzzy set analysis did provide meaningful
projects demonstrated that the additional results. In quite a different domain, Kuo
information afforded by fuzzy classification et al. (2003) incorporated a fuzzy analytical
can be of significance in avoiding misallo- hierarchical process (AHP) to support the
cations that would result in unnecessary cost locational decision for convenience stores.
(Guneralp et al., 2003). Similar conclusions Since the mean standard error (MSE) for
could be drawn from an earlier study on the fuzzy AHP was 0.0173 as opposed
allocation of land for industrial use. Jiang and to 0.091 for the regression model, Kuo
Eastman (2000) showed that results can vary et al. (2003) concluded that fuzzy AHP
significantly as a function of the method of plus artificial neural network (ANN) decision
aggregation. In their review of fuzzy-based support system provided more accurate
approaches Kahraman et al. (2003) suggest results than did a regression model. In
that nonfuzzy, conventional approaches to geographical soil science, Oberthur et al.
the facility location problem tend to be less (2000) showed that nonfuzzy approaches
effective in dealing with the imprecise nature severely misclassified land while fuzzy
FUZZY SETS IN SPATIAL ANALYSIS 229

approaches were much more successful. In self-evident. Therefore, an important step


particular, Boolean classification allocated towards automating the analysis of such
nearly 2,000 hectares of land to the group spatial data is represented by Skubic et al.
with low potential for plant recovery than (2004) who use force histograms and fuzzy
did the fuzzy approach. Hence, leading to rules to generate a linguistic description
a result where less land was erroneusly from hand-drawn sketch maps. The use of
shown to have a high potential for plant force histograms combined with fuzzy set
recovery. In another physical science domain, theory has been shown to be able to extract
Fisher et al.s (2005) multi-scale fuzzy-based directional as well as topological information
analysis provided significant insights over about spatial objects with relative ease
more conventional, nonfuzzy, analysis of (Matsakis and Nikitenko, 2005).
the accumulation and erosion of material as In addition to the forgoing applications
reflected in elevation changes. of fuzzy sets to spatial analysis, there have
In addition to Taylor and Derudders been reformulations of nonfuzzy techniques
(2004) application of fuzzy clustering to for spatial analysis. One of the early refor-
world cities, Heikkila et al. (2003) used mulations was that of fuzzy kriging which
a two-pass Bayes classification method to can be used for analysis or interpolation
assign membership values to urban objects of spatial data (Bardossy et al., 1989). The
that are ultimately used in the context of formulation of fuzzy kappa (Hagen-Zanker
Koskos (1992) fuzzy hypercube. They claim et al., 2005) has already been mentioned and
that the main contribution of their fuzzy is an important step towards using fuzzy tech-
urban set formulation is the introduction niques for analyzing fuzzy spatial data. One
of a unifying conceptual framework for of the most widely used reformulations, or
measures of urbanization. This is made extensions, is the fuzzy c-means (also known
possible by the representation of an entire as k-means) algorithm that is a fuzzification
study area as a single point within a fuzzy of the nonfuzzy c-means clustering algorithm
hypercube. In a fuzzy hypercube each axis (Bezdek et al., 1984;Wilson and Burrough,
corresponds to the membership of a particular 1999).
fuzzy set. Hence, each axis is defined on As simulation models have become more
the interval [0, 1]. A fuzzy system with commonly used to address spatially explicit
n sets would generate a fuzzy hypercube of problems, fuzzy sets has been used in a
dimension [0, 1]n .Using three dichotomies number of ways to address the uncertainties
with fuzzy set interpretations, they exploited inherent at various levels of such models. Not
the geometric interpretation of fuzzy sets only is there uncertainty in the spatial data
afforded by the fuzzy hypercube to show that may affect the model, but uncertainty
how study area could be located within surrounding the precise value of a parameter
a three-dimensional hypercube. Aggregate can affect the outcome of the modeling
measures were used to calculate the degree exercise. Wu (1998) and Bone et al. (2006)
of membership a study area has in each of are examples of using fuzzy sets in cellular
the three aggregate measures. These are used automata for urban and ecological modeling
to locate the study area as fuzzy set in the respectively. Bossomaier et al. (2005) and
hypercube. Robinson and Graniero (2005) use fuzzy
In some spatially explicit applications sets in individual-based modeling of housing
hand-drawn sketch maps are used as a transactions and animal dispersal movements
means of collecting spatial data. However, respectively. Not only have fuzzy sets
the inherent uncertainty of such maps is improved the ability of simulation models
230 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

to formally accommodate uncertainty, it has to fall between 0.0 and 1.0, they are
also been shown to enable model self- not probabilities. Probabilities and fuzzy
evaluation thus avoiding semantic errors in membership values measure very different
complex process models that unknowingly things. For example, one of the most impor-
compromise the integrity of an analysis tant properties of probability is additivity.
(Mackay and Robinson, 2000). There is no such inherent restriction on
With the advent of GIS the representation fuzzy memberships. In fact, the sum of
and query of uncertain spatial to support membership values is interpreted as fuzzy
spatial analysis has developed substantially. cardinality (i.e., the size of a fuzzy set).
Cross and Firat (2000) discuss the issues Second, membership values are not a simple
involved in construction of fuzzy spatial quantitative variable of the interval level. It
objects with specific reference to GIS. Morris is because the end points (i.e., 0 and 1) have
(2003) describes a fuzzy object-oriented more meaning than just being artifacts of
framework to model spatial objects with the membership function. Verkuilen (2005)
uncertain boundaries. Another object-based suggests it is really a generalization of the
effort is that of Bordogna et al. (2006) who case of dichotomous dummy variables that
developed a fuzzy object-based data model are often used to represent ordinary crisp sets.
as a tool for supporting spatial analysis. It In the spatial analytic and GIS literature
is based on the management of a linguistic it is common to refer to either the Semantic
granule. Import (SI) or Similarity Relation (SR) model
Verstraete et al. (2005) presented detailed (Burrough and McDonnell, 1998; Robinson,
techniques for modeling fuzzy spatial infor- 1988). However, it may be more useful
mation represented as triangular irregular to consider that fuzzy memberships are
networks (TINs) and raster (grid) layers. usually a function of a direct assignment
They show how processing, as well repre- (DA), indirect assignment (ID), or an assign-
sentation, can be carried out using fuzzy ment by transformation (AT) methodology
set theory to represent the uncertainty in (Verkuilen, 2005).
spatial data. One of the significant aspects
of this work is its presentation of the
use of type-2 fuzzy sets. In other words,
12.3.1. Direct assignment
it detailed how to formally represent and
process uncertainty not just about the spatial Studies where membership functions are pro-
data, but also uncertainty about the fuzzy vided directly by an expert is characteristic
membership functions themselves. of the direct assignment (DA) method. It is
also common in the DA method of assign-
ment to make use of standard membership
functions such as the triangular, trapezoidal,
12.3. ASSIGNING FUZZY bell, and others (Robinson, 2003). For exam-
MEMBERSHIPS ple, in consultation with experts, DeGenst
et al. (2001) made use of a standard curve
Crucial to any spatial analysis using fuzzy to describe a basic spatial relation in their
sets is the assignment of membership. With study of squirrel dispersal. Often, as in
regard to fuzzy membership, it is important Braimoh et al. (2004), and Zeng and Zhou
to realize that fuzzy memberships have (2001), the choice of membership function
special characteristics. First, although fuzzy is based on the literature, common-sense,
membership values are typically normalized and/or expert opinion. Sometimes these
FUZZY SETS IN SPATIAL ANALYSIS 231

membership functions are available as part crisp answers the system generates a fuzzy
of a geographic information system so that representation of a spatial concept (Robinson,
experts can specify them directly in an 2000). This approach may be useful to gen-
automated geospatial environment (Yanar eralize for obtaining fuzzy representations
and Akyurek, 2006). Neverthless, they are of individual concepts; it is not suitable for
still directly assigned by an expert. It should use in studies where more complex expert
be noted that this approach is sometimes knowledge representations are required.
criticized because of these deficiencies: One of the reasons indirect assignment
is less often used is the difficulty of the
knowledge elicitation process. Zhu (1999)
1 Interpretation is difcult because rarely is there
used personal construct theory to formu-
anything tangible underlying the number.
late a rigorous methodology for eliciting
expert knowledge about soils. Part of the
2 It may be too difcult for the expert(s) to do
process included the expert interacting with
reliably, especially if they are not well-versed in
a graphical user interface (GUI) to assist
fuzzy set theory.
in formalizing the relations. The result of
this intensive knowledge elicitation process
3 Can be biased. In particular, subjects may
was used to populate a fuzzy soil similarity
systematically be biased towards the end points
model. This is one of the rare studies in
(Thole et al., 1979).
the geographical literature where knowledge
consistency and validation were explicitly
4 Difculty in combining assignments from multiple
incorporated into the knowledge elicitation
experts. This is especially difcult when the
process. Although the process is rigorous
assignments are at extreme variance from one
and thorough, the interviews with the expert
another (Verkuilen, 2005).
that are essential to the process can be very
tense and often frustrating for the expert
Despite these deficiencies, direct assign- as well as the knowledge engineer. Hence,
ment remains a commonly used strategy for it can be difficult to secure an experts
assigning membership values. cooperation. This is perhaps why there are so
few studies in the spatial analytic literature
where a rigorous indirect assignment process
is followed.
12.3.2. Indirect assignment
Paired comparisons have been used in
Indirect assignment elicits responses of some conjunction with fuzzy sets and spatial
kind from experts and applies a model analysis, but generally not in the construction
to the judgments to generate membership of membership functions/values themselves.
values. There have been a number of For example, Charnpratheep et al. (1997)
approaches used to formalize the process used paired-comparison analytic hierarchy
of generating fuzzy set memberships from process (AHP) methodology to arrive at
expert knowledge. One of the simpler weights that were subsequently used in a
approaches showed how an intelligent, inter- convex combination model of fuzzy aggre-
active question/answer system could be used gation. However, their membership functions
to generate fuzzy representations of a spatial were by direct assignment. In another
relation such as near. In this approach the instance, Kuo et al. (2003) used a fuzzy
expert need only provide a yes/no answer to a AHP methodology that made use of a ques-
question posed by the software. From those tionnaire to acquire data on store location
232 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

decisions from 16 business experts. The wild lands. Although, respondents detailed
results of this questionnaire exercise provided what distances represented concepts like
enough information to estimate the weight near, the use of a default triangular
assigned to each factor (e.g., the competition membership function meant that the actual
dimension received the highest single weight membership function was not obtained
of 0.1922). They show that weights provided directly from respondent data. Neverth-
by fuzzy AHP can be applied as criteria for less, it does represent a more formalized
selecting important factors to subsequently be approach for proceeding from questionnaire
used in an artificial neural network location responses to construction of a fuzzy set or
analysis. These works are suggestive of a rule base.
linkage to discrete choice modeling (e.g.,
Fotheringham, 1988; Train, 2003). Some
work in the transportation field has explored
12.3.3. Assignment by
the use of fuzzy sets in modeling route
transformation
choice (Vythoulkas and Koutsopoulos, 2003).
Since preferences play an important role In this approach a numerical variable is
in discrete choice modeling, Ridwan (2004) taken and mapped into membership val-
introduced a model of route choice based ues by some transformation. There are
on fuzzy preference relations. The elements many different approaches that assign
the fuzzy relations were specified as fuzzy fuzzy membership using some version of
pairwise comparisons between alternative assignment by transformation. In this section
routes. Since the use of logit models are many of the approaches used to address
commonly used to estimate the probability problems in spatial analysis are briefly
of alternatives being chosen, it is interesting discussed.
to note that Henn (2000) presents a fuzzy Among the more typical approaches to
formulation that suggests the logit model assignment is the use of a fuzzy clustering
is a special case of his fuzzy based model algorithm. Perhaps the most commonly
when the similarity measure has a given used method across the spatial sciences for
shape. assigning membership is based on the fuzzy
Questionnaires have been reportedly used c-means algorithm originally developed by
in some studies as an instrument for Dunn (1973) later generalized by Bezdek
constructing fuzzy memberships. Although (1974, 1981). It is also known as the
details are not given, Lin et al. (2006) fuzzy k-means (FKM) or fuzzy ISODATA
describe a process using results of a ques- algorithm. It is derived to minimize an objec-
tionnaire survey to construct a fuzzy rule tive function with respect to the membership
base. They were able then to make some functions and centroids of c clusters. Hence
tentative statements about changes in activity it is useful for clustering multivariate data
centers in relation to a subway line. Simi- into a finite number of fuzzy sets (Brown,
larly, Fritz et al. (2000) used a web-based 1998; Cheng et al., 2002; Irvin et al.,
questionnaire where distances specified by 1997; McBratney and Odeh, 1997; Stefanakis
respondents were used to construct fuzzy et al., 1999). In spatial analytic studies
sets for defining the concepts near, medium each spatial object would be classified as a
and far for visible features and close and member of all classes but to varying degrees.
far away for nonvisible features. They then Although used in numerous studies since
detailed a methodology that combined the the algorithm was published (Bezdek et al.,
resulting fuzzy rules to aid in mapping of 1984), it continues to figure prominently in
FUZZY SETS IN SPATIAL ANALYSIS 233

applications in physical geography (Bragato, characterized by the application of artificial


2004; Burrough et al., 2001; Scull et al., neural network (ANN) methods adjusted
2003). In addition, it has been used to so that the output is a fuzzy membership
address spatially explicit problems in fields value. Note that this differs from the use
as diverse as wildlife ecology (Schaefer of ANN by Kuo et al. (2003) in that they
et al., 2001), marketing (Wanek, 2003) used fuzzy AHP to develop the weights that
and urban geography (Taylor and Derudder, were used by ANN to produce nonfuzzy
2004). results. Here ANN is considered as a method
Since the objective function does not that is used to directly produce a fuzzy
take into consideration spatial dependence classification (Foody and Boyd, 1999) or to
between observations, noisy spatial data extract a fuzzy rule base from data (Zheng
can adversely affect the performance of the and Kainz, 1999). In either case, ANNs are
algorithm. Few attempts to incorporate spa- composed of a set of simple processing units,
tial information in an FCM algorithm have or nodes, that are interconnected by some
been published outside the image analysis predefined architecture which can be trained.
community. Liew et al. (2000) presented The processing nodes are generally arranged
a modification of the FCM whereby the in a layered architecture where the first layer
normed distance computed at each pixel is the input, or fuzzification, layer where there
within an image was replace with the is one node per input channel (i.e., input
weighted sum of distances from within a variable). The second, or implication, layer(s)
neighborhood of the pixel. Pham (2001) is comprised of a number of processing
followed with a more general solution units. These processing nodes do most of
that uses penalty functions to constrain the thinking of the ANN. The third layer
the membership value of a class to be is the output, or defuzzification, layer. In
negatively correlated with the membership general, there is one output node associated
values of the other classes at neighboring with each class to be output. Each node
pixels. Both approaches produced promising in a layer is connected to every node
results. It remains to be seen if, or when, in an adjacent layer by a weighted link.
these adaptations of FCM will develop The weights are typically set randomly and
and be applied outside the image analysis iteratively adjusted during a training phase
community. during which the ANN attempts to generate
Another problem is that the number of a model capable of correctly assigning class
classes needs to be specified a priori. In membership.
their extension of the FCM to the spatio- Related to ANN is the adaptive neuro-
temporal domain, Liu and George (2005) fuzzy inference system (ANFIS) (Jang,
address the number of clusters problem using 1993). Using a given input/output data set
the XieBeni validity index to develop a the objective is to construct a fuzzy inference
stopping condition. Given a starting number system whose membership functions best
of clusters, their technique will successively suit the data set. Using a back-propagation
merge clusters until a stopping condition algorithm or a least-squares method, the
based on the XieBeni validity index is met. membership parameters are tuned in a
They illustrate its use on spatio-temporal training exercise similar to ANN (The
meteorological data where it is able to detect Math Works Inc., 2002). ANFIS has been
interesting climatic phenomena. used for map revision (Teng and Fairbairn,
Another approach that has been used to 2002) and land cover classification (Peschel,
map from data to a fuzzy membership is 2002).
234 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

These neural network approaches have Carlo sampling. Then each simulation result
advantages and disadvantages. The advan- is evaluated by regressing simulated evap-
tages include an ability to learn from training orative fraction from RHESSys and surface
data and they can handle noisy, incomplete temperature from thermal remote sensing
data. Once trained, an ANN can respond to data. For each regression, the coefficient
a new set of data instantly. However, they of determination (R2 ) is calculated and
can take a long time to train, especially used as a fuzzy measure of the goodness-
since training is still largely by trial and of-fit for its respective simulation result.
error complicated by the fact that incomplete Hence the fuzzy set is composed of the
training data can cause the network to provide set of R2 measures for all simulations, to
incorrect results. Perhaps the most important which an information-theoretic tool based on
disadvantage is that it is difficult to explain ordered possibility distributions is applied
the specific reasoning leading to the output to form a restricted set in which only
product. Hence it can be a kind of black-box good simulations retained. A restricted
approach. set is used as an ensemble solution in
The presence, or absence, of an asso- the second stage of parameter estimation.
ciation, interaction or interconnectedness Note that a separate ensemble solution is
between elements of two or more sets is produced for each hillslope (Mackay et al.,
represented by a crisp relation. Rather than 2003).
presence/absence of association, degrees of
association can be represented by member-
ship grades in a fuzzy relation in much the
same way as degrees of set membership are 12.4. COMBINING MEMBERSHIPS
represented in a fuzzy set. Thus, the classical
notion of relation can be generalized into A common requirement of fuzzy spatial
matter of degree as a fuzzy relation. Fuzzy analysis is the combination of several fuzzy
relations have been used to formally rep- sets in a desirable manner to produce
resent fuzzy regions and their relationships a single fuzzy set (Klir et al., 1997).
(Zhan and Lin, 2003). In addition, Kahraman This combination is often accomplished
et al. (2003) present an example of using using aggregation operators. In fuzzy set
fuzzy relations in a model of group decision theory there are many aggregation opera-
making for the facility location selection tors from which to choose with the most
problem. common being the min (intersection) and
Statistical data analysis has been suggested max (union) operators (Robinson, 2003).
as another way to choose fuzzy membership The choice of operator depends on the
functions and form fuzzy rules (Hanna nature of the underlying decision model.
et al., 2002). However, it has not been For example, in their fuzzy-base cellular
used widely in spatial analysis. An example automata model of insect infestation, Bone
of its application to a spatially explicit et al. (2006) used a compensatory operator
problem is illustrated by the problem of rather than the noncompensatory operators
estimating parameters to use in a regional (max or min) because the compensatory
ecohydrological simulation model. Mackay aggregation operator allows for the influence
et al. (2003) use a two stage methodology of each set to contribute to the final
where in the first stage many simulations are result.
run in which parameters affecting stomatal The basic aggregation operators have
conductance are assigned values using Monte been further developed using various
FUZZY SETS IN SPATIAL ANALYSIS 235

weighting schemes. Both the convex 12.5. CHALLENGES AND RESEARCH


combination and a modified ordered ISSUES
weighted averaging operator (OWA) have
been used in various studies (Charnpratheep Although this chapter has detailed a varied set
et al., 1997; Oberthur et al., 2000; Zeng and of accomplishments using fuzzy sets in fields
Zhou, 2001). However, use of the weighted allied with spatial analysis, there remain
aggregation models has highlighted the many significant challenges and research
subjectivity inherent in the weighting issues. All the areas of accomplishment noted
scheme, hence care should be taken when above remain open to further studies to refine
formulating the weighting scheme as small research issues in their work.
differences in subjective weights can lead Even though there has been substantial
to large variations in the results (Jiang and research on the representation and processing
Eastman, 2000). of spatial data (Bordogna et al., 2006;
Another common method is to use fuzzy Verstraete et al., 2005), especially in a GIS
rules that have the general structure of context, the specification of fuzzy member-
the form: ship remains a challenge. Although a variety
of methods for assigning fuzzy memberships
have been presented, there remains much
to be done in formalizing the process. In
IF(antecedent)THEN(consequent).
the field of spatial analysis, there is still a
need for the development of methodologies
for the acquisition of fuzzy memberships
To evaluate the rule base and arrive at from experts. Perhaps an even more pressing
an answer requires the application of an challenge is that posed by formalizing the
inference, or implication, method. One of acquisition of fuzzy memberships directly
the most common inference methods is from spatial data in such a way as to
known as Mamdani-type inference. Whether have meaning in the context of the problem
named, or not, it is often the one used for domain.
spatial analytic studies because it supports Another chapter of this book deals with
outputs as fuzzy sets. The use of fuzzy rule the topic of spatial autocorrelation. In spatial
bases in spatial analysis include applications analysis spatial autocorrelation is usually
for the conflation of vector maps (Cobb presented as a special topic in the statistical
et al., 1998), real estate evaluation (Zeng analysis of spatial data. With a few excep-
and Zhou, 2001), land fill location (Charn- tions, it is a topic that has for the most
pratheep et al., 1997), and prediction of part been neglected in many applications of
weed infestation (Chiou and Yu, 2001). The fuzzy sets to spatial analysis. For example, in
alternative TakagiSugeno type inference image analysis it is known to affect the fuzzy
model tends to produce a single, crisp value clustering results, hence its incorporation in
as output rather than a fuzzy set. This is some fuzzy clustering algorithms. However,
why many applications have avoided its there are few, if any, investigations of
use (e.g., Power et al., 2001). However, the degree to which spatial autocorrelation
such a characteristic may be useful for affects fuzzy-based results in areas such
spatial interpolation purposes. For example, a as soil science, geodemographics, landscape
TakagiSugeno rule base has been used in the ecology, etc.
spatial interpolation of solar radiation (Botia An issue that has not been fully developed
et al., 2001). in spatial analysis is the linkage between
236 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

fuzzy sets and mainstream spatial analysis. own. This implies that they are competing
There are examples of the fuzzification of paradigms when they may be more properly
mainstream methods such as in the case of viewed as complementary paradigms of
kriging and the kappa statistic. However, analysis.
other aspects of fuzzy statistics have yet to
be explored in depth for the analysis of spa-
tial data. For example, although regression
analysis is often used in spatial analyses REFERENCES
the use of fuzzy regression techniques is
Ahamed, T.R.N., Rao, G.K. and Murthy, J.S.R. (2000).
virtually unheard of in spatial analysis
Fuzzy class membership approach to soil erosion
even though fuzzy regression techniques modelling. Agricultural Systems, 63: 97110.
address the case where the relations of the
Ahlqvist, O. (2005). Using uncertain conceptual
variables are subject to fuzziness or where
spaces to translate between land cover categories.
the variables themselves are fuzzy (Taheri, International Journal of Geographical Information
2003). Science, 19: 831857.
Many efforts in spatial analysis are
Ahn, C.-W., Baumgardner M.F. and Biehl L.L. (1999).
concerned with the testing of hypotheses. Delineation of soil variability using geostatistics and
Mainstream methods rely upon classical fuzzy clustering analyses of hyperspectral data. Soil
statistics to determine whether a hypothesis Sci. Soc. Am. J., 63: 142150.
should be rejected. Little investigation of Anile, M.A., Furno, P., Gallo, G. and Massolo, A.
fuzzy hypothesis testing has been done in (2003). A fuzzy approach to visibility maps creation
the context of spatial analysis. However, over digital terrains. Fuzzy Sets and Systems, 135:
as Smithson (2005) points out, fuzzy sets 6380.
and statistics work better together. There Arvanitis, L.G., Ramachandran, B., Brackett, D.P.,
are a few cases where this is demonstrated Abd-El Rasol, H. and Xu, X.S. (2000). Multire-
mostly in relation to the process of assigning source inventories incorporating GIS, GPS and
database management systems: a conceptual model.
membership values (Ahn et al., 1999; Brown,
Computers and Electronics in Agriculture, 28:
1998; Mackay et al., 2003), not in the explicit 89100.
testing of hypotheses.
Bardossy, A., Bogardi I. and Kelly W.E. (1989).
There are several broad issues that will Geostatistics utilizing imprecise (fuzzy) information.
face researchers attempting to use fuzzy Fuzzy Sets and Systems, 31: 311328.
sets in spatial analysis. Perhaps, the most
Bezdek, J.C. (1974). Cluster validity with fuzzy sets.
fundamental issue is when, or when not, Journal of Cybernetics, 3: 5873.
to use fuzzy-based analysis. This is not
Bezdek, J.C. (1981). Pattern Recognition with Fuzzy
easily answered and demands considerable
Objective Function Algorithms. New York: Plenum
knowledge of both the problem at hand as Press.
well as both mainstream methods as well as
Bezdek, J.C., Ehrlich, R. and Full, W. (1984). FCM: the
fuzzy-based methods. However, fuzzy-based
fuzzy c -means clustering algorithm. Computers and
approaches are showing great promise, yet Geosciences, 10: 191203.
are still not as widely known, or understood,
Bone, C., Dragicevic, S. and Roberts, A. (2006). A fuzzy-
as many of the mainstream approaches constrained cellular automata model of forest insect
detailed in other chapters of this book. infestations. Ecological Modelling, 192: 107125.
Another issue is whether or not a fuzzy-
Bordogna, G., Chiesa, S. and Geneletti, D. (2006).
based spatial analysis should be evaluated Linguistic modelling of imperfect spatial information
against nonfuzzy-based techniques or are as a basis for simplifying spatial analysis. Information
they now developed enough to stand on their Sciences, 176: 366389.
FUZZY SETS IN SPATIAL ANALYSIS 237

Bossomaier, T., Amri, S. and Thompson, J. (2005). Cobb, M.A., Chung, M.J., Foley III, H., Petry, F.E. and
Agent-based modelling of house price evolution. Shaw, K.B. (1998). A rule-based approach for the
In: Proceedings of CABM-HEMA-SMAGET 2005 conation of attributed vector data. Geoinformatica,
Joint Conference on Multi-Agent Modelling for 2: 735.
Environmental Management, Bourg StMaurice-Les
Cross, V., Firat, A. (2000). Fuzzy objects for
Arc, France.
geographical information systems. Fuzzy Sets and
Botia, J.A., Gomez-Skarmeta, A.F., Valdes, M., and Systems, 113: 1936.
Padilla, A. (2001). Fuzzy and hybrid methods applied
DeGenst, A., Canters, F. and Gulink, H. (2001).
to GIS interpolation. In: The 10th IEEE International
Uncertainty modeling in buffer operations applied
Conference on Fuzzy Systems, 453456, Melbourne,
to connectivity analysis. Transactions in GIS, 5:
Australia.
305326.
Bragato, G. (2004). Fuzzy continuous classication and
Dunn, J.C. (1973). A fuzzy relative of the ISODATA
spatial interpolation in conventional soil survey for
process and its use in detecting compact well-
soil mapping of the lower Piave plain. Geoderma,
separated clusters. Journal of Cybernetics, 3: 3257.
118: 116.
Braimoh, A.K., Vlek, P.L., Stein, A. (2004). Land evalu- Fisher, P., Wood, J. and Cheng, T. (2005). Fuzziness
ation for maize based on fuzzy set and interpolation. and ambiguity in multi-scale analysis of landscape
Environmental Management, 33: 226238. morphometry. In: Petry, F.E., Robinson, V.B. and
Cobb, M.A. (eds.), Fuzzy Modeling with Spatial
Brown, D.G. (1998). Classication and boundary Information for Geographic Problems. pp. 207232.
vagueness in mapping presettlement forest types. Heidelberg: Springer.
International Journal of Geographical Information
Science, 12: 105129. Foody, G.M. (1999). The continuum of classication
fuzziness in thematic mapping. Photogrammetric
Brown, D.G. (1998). Mapping historical forest types in Engineering and Remote Sensing, 65: 443451.
Baraga County Michigan, USA as fuzzy sets. Plant
Ecology, 134: 97118. Foody, G.M. and Boyd, D.S. (1999). Fuzzy mapping of
tropical land cover along an environmental gradient
Buckley, J.J. and Eslami, E. (2002). An Introduction from remotely sensed data with an articial neural
to Fuzzy Logic and Fuzzy Sets. New York: Physica- network. Journal of Geographical Systems, 1: 2335.
Verlag.
Fotheringham, A.S. (1988). Consumer store choice
Burrough, P.A., McDonnell, R.A. (1998). Principles and choice set denition. Marketing Science,
of Geographical Information Systems. New York: 7: 299310.
Oxford University Press.
Fritz, S. and See, L. (2005). Comparison of land
Burrough, P.A., Wilson, J.P., van Gaans Pauline, cover maps using fuzzy agreement. International
F.M. and Hansen, A.J. (2001). Fuzzy k -means Journal of Geographical Information Science,
classication of topo-climatic data as an aid to forest 19: 787807.
mapping in the Greater Yellowstone area, USA.
Landscape Ecology, 16: 523546. Fritz, S., Carver, S. and See, L. (2000). New GIS
approaches to wild land mapping in Europe.
Charnpratheep, K., Zhou, Q. and Garner, B. (1997).
In: Wilderness Science in a Time of Change
Preliminary landll site screening using fuzzy
Conference Volume 2: Wilderness Within
geographical information systems. Waste Manage-
the Context of Larger Systems, Missoula, MT,
ment & Research, 15: 197215.
pp. 120127.
Cheng, T., Molenaar, M. and Lin, H. (2002). Formalizing
Gale, S. (1972). Inexactness, fuzzy sets, and the
fuzzy objects from uncertain classication results.
foundations of behavioral geography. Geographical
International Journal of Geographical Information
Analysis, 4: 337349.
Science, 15: 2742.
Gedeon, T.D., Wong, K.W., Wong, P. and Huang, Y.
Chiou, A. and Yu, X. (2001). Prediction of Parthenium
(2003). Spatial interpolation using fuzzy reasoning.
weed infestation using fuzzy logic applied to
Transactions in GIS, 7: 5566.
geographic information system (GIS) spatial image.
In: The 10th IEEE International Conference on Fuzzy Graniero, P.A. and Robinson, V.B. (2003). A real-
Systems, pp. 13631366, Melbourne, Australia. time adaptive sampling method for eld mapping in
238 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

patchy, heterogeneous environments. Transactions Klir, G.J., Ute, S.C., Yuan, B. (1997). Fuzzy Set Theory:
in GIS, 7: 3154. Foundations and Applications. Upper Saddle River,
NJ: Prentice Hall.
Guneralp, B., Mendoza, G., Gertner, G. and
Anderson, A. (2003). Spatial simulation and fuzzy Kosko, B. (1992). Neural Networks and Fuzzy Systems.
threshold analyses for allocating restoration areas. Englewood Cliffs, NJ: Prentice-Hall.
Transactions in GIS, 7: 325343.
Kuo, R.J., Chi, S.C. and Kao, S.S. (2003). A decision
Hagen, A. (2003). Fuzzy set approach to assess- support system for selecting convenience store
ing similarity of categorical maps. International location through integration of fuzzy AHP and
Journal of Geographical Information Science, 17: articial neural network. Computers in Industry, 47:
235249. 199214.
Hagen-Zanker, A., Straatman, B., Uljee, I. (2005). Leung, Y. (1983). Fuzzy sets approach to spatial
Further developments of a fuzzy set map comparison analysis and planning, a nontechnical evaluation.
approach. International Journal of Geographical Geograska Annaler, Series B, Human Geography,
Information Science, 19: 769785. 65: 6575.
Hanna, A.S., Lotfallah, W.B. and Lee, M.J. (2002). Liew, A.W.C., Leung, S.H. and Lau, W.H. (2000). Fuzzy
Statistical-fuzzy approach to quantify cumulative image clustering incorporating spatial continuity. IEE
impact of change orders. Journal of Computing in Proc Vision, Image, and Signal Processing, 147:
Civil Engineering, 16: 252258. 185192.
Heikkila, E.J., Shen, T.-Y., Yang, K.-Z. (2003). Fuzzy Lin, J.-J., Feng, C.-M., Hu, Y.-Y. (2006). Shifts in activity
urban sets: theory and application to desakota centers along the corridor of the Blue subway line in
regions in China. Environment and Planning B: Taipei. Journal of Urban Planning and Development,
Planning and Design, 30: 239254. 132: 2228.
Henn, V. (2000). Fuzzy route choice model for Liu, Z. and George, R. (2005). Mining weather data
trafc assignment. Fuzzy Sets and Systems, 116: using fuzzy cluster analysis. In: Petry, F.E., Robinson,
77101. V.B. and Cobb, M.A. (eds.), Fuzzy Modeling
Hwang. S., Thill. J.-C. (2005). Modeling localities with with Spatial Information for Geographic Problems,
fuzzy sets and GIS. In: Petry, F.E., Robinson, V.B. pp. 105119. Heidelberg: Springer.
and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial Lodwick, W.A. and Santos, J. (2003). Constructing
Information for Geographic Problems. pp. 71104. consistent fuzzy surfaces from fuzzy data. Fuzzy Sets
Heidelberg: Springer. and Systems, 135: 259277.
Irvin, B.J., Ventura, S.J. and Slater, B.K. (1997). Fuzzy Lundberg, C.G. (1982). Modeling constraints and
and isodata classication of landform elements from anticipation: linguistic variables, foresight-hindsight
digital terrain data in Pleasant Valley, Winsconsin. and relative alternative attractiveness. Geographical
Geoderma, 77: 137154. Analysis, 14: 347355.
Jang, J.-S.R. (1993). ANFIS: Adaptive-network-based
Mackay, D.S. and Robinson, V.B. (2000). A multiple
fuzzy inference systems. IEEE Trans. Systems, Man
criteria decision support system for testing integrated
& Cybernetics, 23: 665685.
environmental models. Fuzzy Sets and Systems, 113:
Jiang, H. and Eastman, J.R. (2000). Application of 5367.
fuzzy measures in multi-criteria evaluation in GIS.
Mackay, D.S., Samanta S., Ahl, D.E., Ewers,
International Journal of Geographical Information
B.E., Gower, S.T., Burrows, S.N. (2003). Auto-
Science, 14: 173184.
mated parameterization of land surface process
Kahraman, C., Ruan, D. and Dogan, I. (2003). Fuzzy models using fuzzy logic. Transactions in GIS,
group decision-making for facility location selection. 7: 139153.
Information Sciences, 157: 135153.
MacMillan, R.A., Martin, T.C., Earle, T.J. and McNabb,
Katz, A., Vom, H.M. and Mahoney, J. (2005). Explaining D.H. (2003). Automated analysis and classication
the great reversal in Spanish America: fuzzy set of landforms using high-resolution digital elevation
analysis versus regression analysis. Sociological data: applications and issues. Canadian Journal of
Methods and Research, 33: 539573. Remote Sensing, 29: 592606.
FUZZY SETS IN SPATIAL ANALYSIS 239

MacMillan, R.A., Pettapiece, W.W., Nolan, S.C. Power, C., Simms, A. and White, R. (2001).
and Goddard, T.W. (2000). A generic proce- Hierarchical fuzzy pattern matching for the regional
dure for automatically segmenting landforms into comparison of land use maps. International Journal
landform elements using DEMs, heuristic rules, of Geographical Information Science, 15: 77100.
and fuzzy logic. Fuzzy Sets and Systems, 113:
Ragin, C.C. and Pennings, P. (2005). Fuzzy sets and
81109.
social research. Sociological Methods and Research,
Matsakis, P. and Nikitenko, D. (2005). Combined 33: 423430.
extraction of directional and topological relationship
Ridwan, M. (2004). Fuzzy preference based trafc
information from 2D concave objects. In: Petry,
assignment problem. Transportation Research Part
F.E., Robinson, V.B. and Cobb, M.A. (eds.), Fuzzy
C, 12: 209233.
Modeling with Spatial Information for Geographic
Problems. pp. 143158. Berlin: Springer. Robinove C.J. (1989). Principles of logic and the
use of digital geographic information systems.
McBratney, A.B. and Odeh, I.O.A. (1997). Application In: Ripple, W.J. (ed.), Fundamentals of GIS:
of fuzzy sets in soil science: fuzzy logic, fuzzy A Compendium, Washington, D.C.: American
measurements and fuzzy decisions. Geoderma, 77: Society for Photogrammetry and Remote Sensing.
85113. pp. 6179.
Morris, A. (2003). A framework for modeling Robinson, V.B. and Strahler, A.H. (1984). Issues in
uncertainty in spatial databases. Transactions in GIS, designing geographic information systems under
7: 83103. conditions of inexactness. In: Proceedings of
Morris, A. and Jankowski, P. (2005). Spatial decision 10th International Symposium on Machine Pro-
making using fuzzy GIS. In: Cobb, M.A., Petry, F. and cessing of Remotely Sensed Data, pp. 179188,
Robinson, V.B. (eds.), Fuzzy Modeling with Spatial Terre Haute, IN.
Information for Geographic Problems. pp. 275298. Robinson, V.B. (1988). Some implications of fuzzy set
Heidelberg: Springer. theory applied to geographic databases. Computers,
Oberthur, T., Dobermann, A. and Aylward, M. Environment, and Urban Systems, 12: 8997.
(2000). Using auxiliary information to adjust fuzzy Robinson, V.B. (2000). Individual and multipersonal
membership functions for improved mapping of fuzzy spatial relations acquired using human-
soil qualities. International Journal of Geographical machine interaction. Fuzzy Sets and Systems, 113:
Information Science, 14: 431454. 133145.
Odeh, I.O.A., McBratney, A.B. and Chittleborough, Robinson, V.B. (2003). A perspective on the funda-
D.J. (1990). Design of optimal sample spacings for mentals of fuzzy sets and their use in geographic
mapping soil using fuzzy k -means and regionalized information systems. Transactions in GIS, 7: 330.
variable theory. Geoderma, 47: 93112.
Robinson, V.B. and Graniero, P.A. (2005). Spatially
Paez, D., Bishop, I.D. and Williamson, I.P. (2006). explicit individual-based ecological modeling with
DISCUSS: a soft computing approach to mobile fuzzy agents. In: Petry, F.E., Robinson, V.B.
spatial disaggregation in economic evaluation and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial
of public policies. Transactions in GIS, 10: Information for Geographic Problems. pp. 299334.
265278. Heidelberg: Springer.
Peschel, J.M. (2002). Creating land cover input Schaefer, J.A., Veitch, A.M., Harrington, F.H., Brown,
datasets from the SWAT 2000 model using remotely W.K., Theberge, J.B. and Luttich, S.N. (2001).
sensed data. Texas A&M University, http://ceprofs. Fuzzy structure and spatial dynamics of a declining
tamu.edu/folivera/TxAgGIS/Spring2002/Peschel/ woodland caribou population. Oecologia, 126:
peschel.htm, visited on April 14, 2006. 507514.
Pham, D.L. (2001). Spatial models for fuzzy clustering. Schaefer, J.A. and Willson, C.C. (2002). A fuzzy
Computer Vision and Image Understanding, 84: structure of populations. Canadian Journal of
285297. Zoology, 80: 22352241.
Pipkin, J.S. (1978). Fuzzy sets and spatial choice. Scull, P., Franklin, J., Chadwick, O.A. and McArthur, D.
Annals of Association of American Geographers, 68: (2003). Predictive soil mapping: a review. Progress
196204. in Physical Geography, 27: 171197.
240 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Skubic, M., Blisard, S., Bailey, C., Adams, J.A., from fuzzy set theory, approximate reasoning and
Matsakis, P. (2004). Qualitative analysis of sketched neural networks. Transportation Research Part C, 11:
route maps: translating a sketch into linguistic 5173.
descriptions. IEEE Trans. on Systems, Man, and
Wanek, D. (2003). Fuzzy spatial analysis techniques in a
Cybernetics, 34: 12751282.
business GIS environment. In: European Regional Sci-
Smithson, M. (2005). Fuzzy set inclusiong: linking ence Association 2003 Congress, Jyvaskla, Finland
fuzzy set methods with mainstream techniques. [CD-ROM (paper no. 177)].
Sociological Methods and Research, 33: 431461.
Wealands, S.R., Grayson, R.B. and Walker, J.P.
Stefanakis, E., Vazirgiannis, M. and Sellis, T. (1999). (2005). Quantitative comparison of spatial elds for
Incorporating fuzzy set methodologies in a DBMS hydrological model assessment some promising
repository for the application domain of GIS. approaches. Advances in Water Resources, 28:
International Journal of Geographical Information 1532.
Science, 13: 657675.
Wilson, J.P., Burrough, P.A. (1999). Dynamic modeling,
Taheri, S.M. (2003). Trends in fuzzy statistics. Austrian geostatistics, and fuzzy classication: new sneakers
Journal of Statistics, 32: 239257. for a new geography? Annals of the Association of
Taylor, P.J. and Derudder, B. (2004). Porous Europe: American Geographers, 89: 736746.
european cities in global urban arenas. Tijdschrift Witlox, F. and Derudder, B. (2005). Spatial decision-
voor Economishe en Sociale Geogra phie, 95: making using fuzzy decision tables: theory, applica-
527538. tion and limitations. In: Petry, F.E., Robinson, V.B.
Teng, C.H. and Fairbairn, D. (2002). Comparing and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial
expert systems and neural fuzzy systems for object Information for Geographic Problems. pp. 253275.
recognition in map dataset revision. International Berlin: Springer.
Journal of Remote Sensing, 23: 555567. Wu, F. (1998). Simulating urban encroachment on rural
The Math Works Inc. (2002). Fuzzy Logic Toolbox Users land with fuzzy-logic-controlled cellular automata
Guide. Natick, MA. USA: The Math Works Inc. in a geographical information system. Journal of
Environmental Management, 53: 293308.
Thole, U., Zimmermann, H.-J. and Zysno, P. (1979). On
the suitability of minimum and product operators for Yanar Tahsin, A. and Akyurek, Z. (2006). The
the intersection of fuzzy sets. Fuzzy Sets and Systems, enhancement of the cell-based GIS analyses with
2: 167180. fuzzy processing capabilities. Information Sciences,
176: 10671085.
Torres, R., Keller, G.R., Kreinovich, V., Longpre, L. and
Starks, S.A. (2004). Eliminating duplicates under Zadeh, L.A. (1965). Fuzzy sets. Information and Control,
interval and fuzzy uncertainty: an asymptotically 8: 338353.
optimal algorithm and its geospatial applications. Zeng, T.Q. and Zhou, Q. (2001). Optimal spatial
Reliable Computing, 10: 401422. decision making using GIS: a prototype of a real
Train, K.E. (2003). Discrete Choice Methods with estate geographical information system (REGIS).
Simulation. Cambridge, UK: Cambridge University International Journal of Geographical Information
Press. Science, 15: 307321.

Verkuilen, J. (2005). Assigning membership in a fuzzy Zhan, F.B., Lin, H. (2003). Overlay of two simple poly-
set analysis. Sociological Methods and Research, 33: gons with indeterminate boundaries. Transactions in
462496. GIS, 7: 6781.

Verstraete, J., De Tre, G., De Caluwe, R. and Hallez, A. Zheng, D. and Kainz, W. (1999). Fuzzy rule extraction
(2005). Field based methods for the modeling of from GIS data with a neural fuzzy system for
fuzzy spatial data. In: Petry, F.E., Robinson, V.B. decision making. In: Proceedings of the Seventh
and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial ACM International Symposium on Advances in
Information for Geographic Problems. pp. 4170. Geographic Information Systems, Kansas City, MO,
Heidelberg: Springer. USA, pp. 7984.
Vythoulkas, P.C. and Koutsopoulos, H.N. (2003). Zhu, A.X., Hudson, B., Burt, J., Lubich, K. and
Modeling discrete choice behavior using concepts Simonson, D. (2001). Soil mapping using GIS, expert
FUZZY SETS IN SPATIAL ANALYSIS 241

knowledge, and fuzzy logic. Soil Science Society of resource mapping. International Journal of
America Journal, 65: 14631472. Geographical Information Science, 13: 119141.
Zhu, A.-X. (1997). A similarity model for representing Zhu, A.-X. (2004). Personal Communication. Depart-
soil spatial information. Geoderma, 77: 217242. ment of Geography, University of Wisconsin.
Zhu, A.-X. (1999). A personal construct-based Zimmermann, H.-J. (2001). Fuzzy Set Theory and Its
knowledge acquisition process for natural Applications. Boston, MA: Kluwer Academic.
13
Geographically Weighted
Regression
A. Stewart Fotheringham

13.1. INTRODUCTION fundamental property of classical aspatial


statistical inference. Another property of
Spatial data contain locational informa- many spatial data sets, perhaps slightly less
tion as well as attribute information. It recognized but becoming increasingly well-
is increasingly recognized that most data known, is that the processes generating
sets are spatial in that the attribute being the data might exhibit spatial heterogeneity
measured is typically recorded either at or nonstationarity. That is, the processes
some specific location or as a representation generating observed attributes might vary
of a general area. It is also increas- over space rather than being constant as is
ingly recognized that spatial data exhibit assumed in the use of most traditional types
special properties which distinguish them of statistical analysis.
from aspatial data and which necessitate Nowhere is this more evident than in the
the development of specialized statisti- use of what is undoubtedly the most fre-
cal techniques. For instance, spatial data quently used statistical modelling approach
almost invariably exhibit some form of in the analysis of spatial data that of
spatial dependence whereby locations in regression. In a typical linear regression
close proximity tend to have more similar model applied to spatial data we assume
attributes than do locations further apart. a stationary process (often without giving
This tends to invalidate the assumption this any thought!). That is, we assume that
of the independence of error terms, a the same relationships hold throughout the
244 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

entire study area we are investigating and the model to vary over space rather than to
that the same stimulus provokes the same calibrate a stationary model and then trying
response in all parts of the study region. to examine a possible error in the model
In a linear framework, we can represent through the spatial patterning of the residuals.
these relationships with the following general The specification of a model that allows
model:1 the parameter estimates to vary over space
is the essence of geographically weighted
regression (GWR).
yi = 0 + 1 x1i + 2 x2i + n xni + i
(13.1)

where yi is the value of the dependent vari- 13.2. GWR MECHANICS


able observed at location i, x1i , x2i , . . ., xni
The geographically weighted version of
are the values of the independent variables
the regression model described in equa-
observed at i, 0 , 1 , . . ., n are parameters
tion (13.1) is:
to be estimated, and i is an error term which
is assumed to be normally distributed.
The parameter estimates obtained in the
calibration of such a model are constant over yi = 0i + 1i x1i + 2i x2i + ni xni + i
space and are obtained from the following (13.3)
estimator:

where i refers to a location at which data


 = (XT X)1 XT Y. (13.2) on y and x are measured and at which local
estimates of the parameters are obtained. In
this model, the parameter estimates are now
That is, for each relationship between y and
local to location i instead of being global
an x variable, a single parameter is estimated
constants. The estimator for the parameters
which is assumed to be constant across
is then:
the study region. Consequently, if there is
spatial nonstationarity, the resulting single
parameter estimate would then represent an
 (i) = (XT W(i) X)1 XT W(i) Y (13.4)
average of the different processes operating
over space and we would only get an
inkling of this through the residuals of the
model. We might map these to determine where W(i) is a matrix of weights specific
whether there are any spatial patterns. Or to location i such that observations nearer to
we might compute an autocorrelation statistic i are given greater weight than observations
for the residuals or we might even try to further away. The matrix W(i) has the form:
model the error dependency with various
types of spatial regression models. However,
spatial dependency in the residuals can result wi1 0 . . . ... 0
0 w ... ... 0
from other processes apart from spatial i2

W(i) = 0 0 wi3 ... 0 (13.5)
heterogeneity so examining the residuals is
not an ideal solution. It seems much more
obvious to allow the parameter estimates in 0 0 0 . . . win
GEOGRAPHICALLY WEIGHTED REGRESSION 245

where win is the weight given to data point for many different locations as we will
n for the estimate of the local parameters at see below.
location i. Although the exact specification of the
There are many possible weighting func- weighting function can take many forms,
tions that could be specified which relate there are two broad categories of weighting
the weighting of an observed value at functions: fixed or adaptive. An example of
location j to the distance location j is a fixed spatial weighting function is shown
from the regression point i but they tend in Figure 13.2. In this case, the specified
to be Gaussian or Gaussian-like, reflecting weighting function or kernel is constant
the nature of many spatial processes. The across the study area and therefore has the
operation of a typical weighting function is undesirable property that in areas where data
shown in Figure 13.1. points are relatively sparse, the resulting local
Data points that are located close to parameter estimates will have high standard
the regression point are weighted highly errors attached to them reflecting the added
whereas data points that are far from the uncertainty in the estimates caused by the
regression point get a very low weight. relative lack of data.
Hence, the weighting matrix will change There are many functions that could be
every time the regression point changes. used to represent a fixed spatial weighting
GWR thus produces a model that effectively function. One is a Gaussian expression:
answers the question what do the relation-
ships in my model look like around this
location? The question can be answered wij = exp [ (dij /h)2 ] (13.6)

1 wij

Bandwidth

0
x dij

x Regression point wij Weight of data point j at regression point i


Data point dij Distance between regression point i and data point j

Figure 13.1 A typical spatial weighting function.


246 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

wij

wij

x Regression point
Data point

Figure 13.2 A xed spatial weighting function.

wij

wij

x Regression point
Data point

Figure 13.3 A spatially adaptive weighting function.

where dij is the distance between locations i outwards in order to capture more data. The
and j, and h is a parameter often referred to as operation of an adaptive kernel is shown in
the bandwidth as h increases, the gradient of Figure 13.3.
the kernel becomes less steep and more data Again, there are several functions that one
points are included in the local calibration. could use to produce a spatially adaptive
An alternative, and generally preferred, weighting function. One, for example, is the
alternative is an adaptive kernel where the following:
spatial extent of the kernel is dictated by
the underlying density of data points. In
areas where data are plentiful, the kernel wij = [1 (dij2 /h2 )]2 if j is one of the Nth
is relatively tightly defined around the nearest neighbours of i
regression point; in areas where the data are
relatively sparse, the kernel has to extend =0 otherwise (13.7)
GEOGRAPHICALLY WEIGHTED REGRESSION 247

where h is the bandwidth and N is a As the bandwidth 0, the local model


parameter to be estimated. wraps itself around the data so the number
The results of GWR appear to be relatively of parameters = n.
insensitive to the choice of weighting func- The number of parameters in local models
tion as long as it is a continuous distance- therefore ranges between k and n and
based function but whichever weighting depends on the bandwidth. This number
function is used, the results will, however, need not be an integer and is referred to
be sensitive to the degree of distance- as the effective number of parameters in the
decay. Therefore an optimal value of either model.
h or N has to be obtained. This can be
found by minimizing a cross-validation score
(CV) or the Akaike information criterion
(AICc) where: 13.3. GWR OUTPUT

The main output from GWR is a set of


CV = [yi y  = i (h)]2 (13.8) location-specific parameter estimates that
i
can be mapped and analysed to provide
information on spatial nonstationarity in
where y = i (h) is the fitted value of yi relationships. However, any diagnostic from
with data from point i omitted from the regression can be replicated in geographically
calibration and: weighted format so within the GWR frame-
work we can also:
AICc = Deviance + 2k[n/(n k 1)]
(13.9) estimate local standard errors;

derive local t statistics;


where n is the number of data points and k
is the number of parameters in the model. calculate local goodness-of-t measures;
Lower values of both statistics indicate better
model fits. calculate local leverage measures;
Optimal bandwidth selection is a trade-off
perform tests to assess the signicance of
between bias and variance:
the spatial variation in the local parameter
estimates; and
too small a bandwidth leads to a large variance
in the local estimates because of the relatively perform tests to determine if the local model
small number of data points used in the local performs better than the global one, accounting
calibration; for differences in degrees of freedom.

too large a bandwidth leads to large bias in


the local estimates because data are drawn
from locations further away from the regression 13.4. A SIMULATION EXPERIMENT
point.
Consider the following model:
As the bandwidth , the local model
will tend to the global model with number of
parameters = k. yi = i + 1i x1i + 2i x2i (13.10)
248 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

and data on x1 and x2 drawn randomly for In this case, where there is no spatial
2500 locations on a 50 50 matrix subject to nonstationarity (the parameters are the same
the correlation between x1 and x2 , r(x1 , x2 ), everywhere), the global model is clearly
being controlled. In fact, the results of this appropriate and replicates the y variable
experiment can be shown to be independent perfectly and the estimated parameters are
of r(x1 , x2 ) so we will ignore this feature of equal to their known values. K represents
the experiment here. the number of parameters estimated in the
model. The results are not surprising the
processes being modelled are stationary so
the global model works well. The question
13.4.1. Experiment 1 (parameters is, how well does the GWR model perform
spatially invariant) in this situation? The results of the GWR
In this experiment, we set the three para- calibration are given below.
meters in the model to known, constant
values:
Local model calibrated by GWR

i = 10 for all i
Adj. R2 = 1.0
1i = 3 for all i
AIC = 59,386
2i = 5 for all i.
K = 6.5

N = 2,434
With everything on the right-hand side of
equation (13.10) now known, we can derive a
i (est.) = 10 for all i
value of yi at each location and then use these
data to calibrate the model both by ordinary 1i (est.) = 3 for all i
least squares regression and by GWR. The
results are as follows: 2 i(est.) = 5 for all i.

Global model calibrated by OLS Reassuringly, the GWR model attempts


to make itself as similar to the global
model as possible. N, the number of nearest
Adj. R2 = 1.0 neighbours used to calibrate each local
model, is optimized at 2434 data points out
AIC = 59,390 of the 2500. That is, the kernel is trying to
become as broad as possible to use all the
K =3 data on each local calibration. Consequently,
the local parameter estimates are the same
(est.) = 10 everywhere and the model replicates the
y variable almost perfectly. The AIC values,
1 (est.) = 3 another goodness of fit measure, are almost
identical. Notice that k, here the effective
2 (est.) = 5 number of parameters, is not an integer and
GEOGRAPHICALLY WEIGHTED REGRESSION 249

is 6.5. This is the equivalent number of Global model calibrated by OLS


independent parameters used in the model.
These results are useful because they
demonstrate that the GWR model is not Adj. R2 = 0.04
picking up spurious nonstationarity when the
AIC = 17,046
processes are stationary and the global model
is appropriate. However, what happens if the
K =3
processes being examined are nonstationary?
(est.) = 10.26

13.4.2. Experiment 2 (parameters 1 (est.) = 0.1


spatially varying)
2 (est.) = 5.28.
Given that the locations of the data points lie
on a 50 50 grid, we can use this to assign
spatially varying values to each of the three These are close to the averages of the local
parameters in our model. If the coordinates estimates (10; 0; 5).
of a representative grid point are defined In this case, the OLS calibration performs
as (i, j), we know: very poorly because it is trying to fit a global
model to a situation in which the processes
are nonstationary. The resulting parameter
0 i 50, 0 j 50
estimates are very close to the averages of
the spatially varying local values but this
so that we can make the parameters functions average model is not representative of any
of i and j. In this case, we chose the following situation across the study region and hence
relationships: the model cannot replicate the y data at
all well. In addition, of course, the model
provides no indication on how the processes
i = 0 + 0.2i + 0.2j (13.11) being examined vary spatially.

so that i ranges between 0 and 20: Local model calibrated by GWR

1i = 5 + 0.1i + 0.1 j (13.12) Adj. R2 = 0.997

AIC = 2,218
so that 1i ranges between 5 and 5; and:
K = 167
2i = 5 + 0.2i + 0.2 j (13.13)
N = 129

so that 2i ranges between 5 and 15. i (est.) range = 2 to 18.6


Values of yi are then obtained as before
and the data used to calibrate the model by 1i (est.) range = 4.3 to 4.7
global regression and by GWR. The results
are as follows. 2i (est.) range = 3.9 to 13.6.
250 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

The local model clearly captures the spatial nonstationarity in relationships extremely well. The y variable is replicated accurately with the adjusted r-squared statistic being close to 1.0 and the AIC value being much lower than the comparable value from the global model (2,218 versus 17,046). In this case, the local model is trying to make itself as local as possible and the number of nearest neighbours in each local regression is only 129. Recall also that the data on these 129 observations are not weighted as 1 but will have a weight somewhere between 0 and 1 depending on their distance from the regression point. The effective number of parameter estimates is 167, reflecting the spatially varying nature of the processes underlying the model, and the ranges of local parameter estimates are close to their known values. The local parameter estimates are geocoded and can easily be mapped to display the nature of their spatial variation.

The conclusion from these two experiments is that calibration of local models by GWR allows the identification of spatial nonstationarity where it exists. Further, the GWR calibration procedure does not appear to introduce any spurious nonstationarity in situations where a global model is appropriate.

13.5. SOFTWARE FOR GWR

Software for running GWR (GWR 3.1) is available from the author and runs on any Windows platform. It has a very simple point-and-click interface which makes it very easy to calibrate models by GWR. The user can select from Gaussian, Poisson, or binary logit GWR models. The current restrictions on data size are a maximum of 80,000 observations and 50 variables. The software also calibrates a global model for comparison and the output consists of various model diagnostics plus geocoded local parameter estimates, their local standard errors, local t-values and local goodness-of-fit measures.

An example of the interface is shown in Figure 13.4. The user is asked to input a data file from which the variable names are stripped off and loaded into the GWR model editor for placement in the appropriate model form. The user defines the dependent variable and a set of independent variables for the model from the variable list. The x and y coordinates of the data locations must also be designated. A kernel type (either fixed or adaptive), a calibration criterion (either CV or AICc), and an output format for the geocoded information must then be selected before the model is saved and run. The output is presented in both a listing file on the screen and an output file which is saved for subsequent processing, generally mapping of the output to see the spatial variation in local parameter estimates and goodness-of-fit statistics.

Figure 13.4 The model editor in GWR 3.1.

The model editor also allows extra computations. The user can select a Monte Carlo simulation exercise to examine the significance of any spatial variability in local parameter estimates and various other diagnostics can be chosen. The user also has the facility to by-pass the optimization routine for the bandwidth and input his/her own bandwidth. This can be useful to examine the effects of scale on the output: large bandwidths essentially perform regional calibrations on the data; small bandwidths perform very local calibrations.

The software is distributed on a self-loading CD which also contains sample data.

13.6. RESEARCH TOPICS

Although the initial development of GWR took place over a decade ago and it is
now becoming a relatively well-established technique, being used in many disciplines, much research remains to be done. For instance, the investigation of how the GWR format can be linked with that of spatial regression models would seem quite fruitful. One of the advantages of using GWR is that it generally accounts for much of the spatial autocorrelation in the residuals often found in global modelling. In the past, such autocorrelation has necessitated the use of various spatial regression techniques, some of which are quite complex, which has possibly hindered their adoption. This raises the question: to what extent is the spatial autocorrelation of residuals often seen in the application of global models a result of assuming a stationary process when the relationships being examined vary over space? There are other reasons why
the residuals from global models applied to spatial data might be autocorrelated but we now have the means of examining the relative contributions of different processes to such autocorrelation. That said, there is still a potentially useful merger of spatial regression models and GWR: one could have, for example, a GWR version of a spatial regression model. If the spatial regression model were an autoregressive model, for example, this would provide an easy way of calibrating local spatial autocorrelation statistics which are free from covariate effects.

A second research area is that of the development of what are termed mixed or semi-parametric GWR models where some of the parameters are allowed to vary spatially whilst others are fixed globally. In some instances, for example, there is no reason to suspect that a particular relationship would be spatially varying and it makes sense to set such a parameter in the model as fixed. The calibration of such models, however, is somewhat more complex than the full GWR model.

This topic leads into a related one which concerns variable selection in GWR. It should be realized that simply because a variable is insignificant at the global level does not mean it might not be important locally. Consequently, variable selection should ideally be at the level of the GWR model and not at that of the global model. Following from the above, however, variables could either be: unimportant at the local level, important but with a stationary effect, or important with a spatially varying effect. Consequently, variable selection, along the lines of stepwise regression, is considerably more complex in GWR.

Another topic that needs further research is that of statistical inference in GWR. It is necessary to distinguish the degree of spatial variation in local parameter estimates that could reasonably be attributed to sampling variation from that which is likely to be attributable to something more interesting. Currently, this is done via Monte Carlo simulation but more formal methods might be developed. One aspect of inference that is well known in these situations is that of the multiple hypothesis testing problem, which suggests that the traditional cut-off points on a statistical distribution for rejecting a null hypothesis are too liberal. Bonferroni-type adjustments should be made, although recognizing that the hypothesis tests in GWR are not independent. Probably the ratio of the effective number of parameters in the GWR model to the number of parameters in the global model should be used as the adjustment factor rather than the number of tests.

Although the primary rationale for calibrating a GWR model is to uncover facets of possible nonstationarity in the processes being examined, a common question is to what extent can the methodology be used for prediction? To answer this, research is currently being undertaken to compare GWR as a prediction method with various forms of kriging. The results so far suggest GWR provides much better estimates of unknown values than do many types of kriging and about the same level of predictive ability as universal kriging with external covariates. Of course, the advantage of GWR is that much more information is yielded on the processes at work.

Finally, the most powerful aspect of GWR is the concept of geographically weighting models. Anything that can be weighted can be geographically weighted. The models need not be linear nor even in a regression format. One can generate, for example, GW versions of any descriptive spatial statistic or GW versions of any multivariate statistical method such as GW PCA or GW discriminant analysis. The task in these latter cases is probably to handle the large volumes of output that will be generated.
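As an illustration of this idea, the sketch below computes one of the simplest possible GW statistics, a geographically weighted mean, for an arbitrary set of points. It is only a minimal example of the geographical weighting principle; the Gaussian kernel, the bandwidth value and the simulated data are assumptions made here for illustration, not prescriptions from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(200, 2))       # hypothetical point locations
z = coords[:, 0] * 0.5 + rng.normal(0, 5, 200)    # attribute with a west-east trend

def gw_mean(points, values, at, bandwidth=20.0):
    """Geographically weighted mean of `values` evaluated at location `at`."""
    d2 = ((points - at) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))         # Gaussian distance-decay weights
    return (w * values).sum() / w.sum()

# a map of the GW mean evaluated at each observation point
local_means = np.array([gw_mean(coords, z, p) for p in coords])
print(local_means.min(), local_means.max())       # the statistic varies over the study area
```

Exactly the same weighting scheme can be wrapped around any other statistic, which is the sense in which anything that can be weighted can be geographically weighted.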
13.7. SUMMARY

GWR appears to be a useful method to investigate spatial nonstationarity: simply assuming relationships are stationary over space is no longer tenable and is easily testable. GWR can be likened to a spatial microscope in that it allows us to see variations in relationships that were previously unobservable. It provides a whole new geography of relationships that needs explanation.

GWR can be viewed as both a model diagnostic tool and as a method to identify interesting locations for further investigation. In doing so, it conforms to two previously disparate philosophical views. From a post-modernist view, relationships can be intrinsically different across space, caused by differences in attitudes, preferences or different administrative, political or other contextual effects, and GWR helps identify such differences. From a positivist view, global statements about relationships can be made but our models might not be properly specified to allow us to make them. GWR is then a good indicator of when and in what way a global model is mis-specified and how it can be improved. If the assumption that global statements can be made is correct and a global model fails to make them, then clearly the model is mis-specified. GWR can thus be a useful model-building tool.

Finally, GWR is a good example of a spatial statistical method. It uses locational information as well as attribute information as input, it employs a spatial weighting function with the assumption that near places are more similar than distant ones, and it produces outputs that are location-specific and geocoded so they can easily be mapped and subject to further spatial analysis. The concept of GW can be extended to many statistical techniques and there is still a great deal of work to be done.

ACKNOWLEDGEMENT

Research presented in this paper was supported by a grant to the National Centre for Geocomputation by Science Foundation Ireland (03/RP1/1382) and by a Strategic Research Cluster grant (07/SRC1/1168) from Science Foundation Ireland under the National Development Plan. The author gratefully acknowledges this support.

NOTE

1 Note that the model need not be a linear one but this is used here for convenience and because it is probably the most frequently encountered type of regression. The software described subsequently allows geographically weighted Poisson regression models and geographically weighted binary logit models to be calibrated and in theory there is no limit to what model forms can be geographically weighted.
14
Spatial Regression
Luc Anselin

14.1. INTRODUCTION

Spatial regression deals with the specification, estimation, and diagnostic checking of regression models that incorporate spatial effects. Two broad classes of spatial effects may be distinguished, referred to as spatial dependence and spatial heterogeneity (Anselin, 1988b). In this chapter, attention will be limited to the former, since spatial heterogeneity is addressed in Chapter 13, on Geographically Weighted Regression. The focus will be on ways to incorporate spatial correlation structures into a linear regression model, and the implications of this for estimation and specification testing.

Early interest in the statistical implications of estimating spatial regression models dates back to the pioneering results of the statistician Whittle (1954), followed by other by now classic papers in statistics, such as Besag (1974) and Ord (1975), and the book by Ripley (1981). It was introduced in quantitative geography through the works of Cliff and Ord (1973, 1981) and Upton and Fingleton (1985). Paralleling this was the development of the field of spatial econometrics, started by regional scientists who were concerned with spatial correlation in multiregional econometric models (Paelinck and Klaassen, 1979; Anselin, 1980). By the late 1980s and early 1990s, several compilations had appeared that included technical reviews of a range of models, estimation methods and diagnostic tests, including Anselin (1988b), Griffith (1988) and Haining (1990). In addition, the publication of the text by Cressie (1993) provided a near-comprehensive technical treatment of the statistical foundations for the analysis of spatial data.

In recent years, the interest in spatial analysis in general and spatial data analysis in particular has seen an almost exponential
growth, especially in the social sciences (Goodchild et al., 2000). Spatial regression analysis is a core aspect of the spatial methodological toolbox and several recent texts covering the state of the art have appeared, such as Haining (2003), Waller and Gotway (2004), Banerjee et al. (2004), Fortin and Dale (2005), Schabenberger and Gotway (2005), and Arbia (2006). There have also been a number of edited volumes, dealing with more advanced topics, such as Bartels and Ketellapper (1979), Anselin and Florax (1995a), Anselin et al. (2004), Getis et al. (2004), and LeSage and Pace (2004). In addition, several journal special issues have recently been devoted to the topic, and they provide an excellent overview of important research directions. Such special issues include Anselin (1992, 2003), Anselin and Rey (1997), Pace et al. (1998), Nelson (2002), Florax and van der Vlist (2003), Pace and LeSage (2004b), and LeSage et al. (2004).

This chapter provides a concise overview of some of the central methodological issues related to spatial regression analysis. It consists of four sections, starting with a treatment of the specification of spatial dependence in a regression model. Next, specification tests are considered to detect the presence of spatial autocorrelation. This is followed by a review of the estimation methods, including maximum likelihood, instrumental variables/method of moments and semi-parametric methods. The chapter closes with some concluding remarks.

The treatment in this brief chapter is not intended to be comprehensive, but instead aims to provide a guide to both the current state of the art as well as to ongoing research and remaining gaps. A number of topics are not included, since they are (partially) addressed in other chapters in this volume, such as Bayesian techniques (Chapter 17). The focus is therefore entirely on regression models in a simple cross-sectional setting, leaving out other promising applications, such as the spatial econometrics of panel data (Elhorst, 2001, 2003; Anselin et al., 2008), the spatial econometrics of origin-destination flow models (LeSage and Pace, 2005; Fischer et al., 2006), the analysis of spatial latent variables (Pinkse and Slade, 1998; LeSage, 2000; Beron et al., 2003; Fleming, 2004), and spatial generalized linear mixed models (Gotway and Stroup, 1997; Zhang, 2002; Gotway and Wolfinger, 2003). Finally, it should be pointed out that this chapter derives from several earlier and more technical reviews dealing with various methodological aspects of spatial regression analysis, specifically, Anselin and Bera (1998), and Anselin (2001a, b, 2002, 2006). A more in-depth technical discussion can be found in those reviews.

14.2. SPECIFYING THE SPATIAL REGRESSION MODEL

The point of departure is the familiar specification of a linear regression model, where for each observation (location) i, with i = 1, . . . , N, the following relationship holds:

yi = Σk xik βk + εi,  (14.1)

where yi is an observation on the dependent variable, xik an observation on an explanatory variable, with k = 1, . . . , K (including a constant term, or 1), βk as the matching regression coefficient, and εi is a random error term. For ease of notation, the K explanatory variables and matching coefficients are expressed as a K × 1 vector,
respectively xi and β, such that the regression becomes:

yi = xi′β + εi.  (14.2)

In the classic regression specification, the error terms have mean zero (E[εi] = 0, ∀i), and they are identically and independently distributed (i.i.d.). Consequently, their variance is constant, Var[εi] = σ², and they are uncorrelated, E[εiεj] = 0, for all i ≠ j.

In matrix notation, the N observations on the dependent variable are stacked in an N × 1 vector y, the observations on the explanatory variables in an N × K matrix X, and the random error terms in an N × 1 vector ε, such that:

y = Xβ + ε  (14.3)

with E[ε] = 0 (an N × 1 vector of zeros), and E[εε′] = σ²I (with I as the identity matrix).

Spatial dependence is introduced into this specification in two major ways, one referred to as spatial lag dependence, the other as spatial error dependence (Anselin, 1988b). While the former pertains to spatial correlation in the dependent variable, the latter refers to the error term. Spatial autocorrelation can also be introduced in the explanatory variables, in so-called spatial cross-regressive models (Florax and Folmer, 1992). However, in contrast to the lag and error models, cross-regressive models do not require the application of special estimation methods. They will therefore not be further considered here.

14.2.1. Spatial lag models

A spatial lag model is a formal representation of the equilibrium outcome of processes of social and spatial interaction. Since the observations are for a single point in time, the actual dynamics of the interaction among agents (peer effects, neighborhood effects, spatial externalities) cannot be observed, but the correlation structure that results once the process has reached equilibrium is what can be modeled (Brock and Durlauf, 2001, 2004). This is also referred to as a spatial reaction function (Brueckner, 2003). In the spatial regression equation, this is accomplished by including a function of the dependent variable observed at other locations on the right-hand side:

yi = g(yJi, θ) + xi′β + εi  (14.4)

where Ji includes all the neighboring locations j of i, with j ≠ i. The function g can be very general (and non-linear), but typically is simplified by using a spatial weights matrix (see also Chapter 8 in this volume). The N × N spatial weights matrix W has non-zero elements wij in each row i for those columns j that are neighbors of location i. The notion of neighbors is very general, and not limited to geographical concepts, but can readily be extended to neighbors in social network space (Leenders, 2002).

A so-called mixed regressive, spatial autoregressive model (Anselin, 1988b) then takes on the form:

yi = ρ Σj wij yj + xi′β + εi  (14.5)

where ρ is the spatial autoregressive coefficient, and the error term εi is i.i.d. Alternatively, in matrix notation:

y = ρWy + Xβ + ε.  (14.6)

With a row-standardized spatial weights matrix (i.e., the weights standardized
such that Σj wij = 1, ∀i), this amounts to including the average of the neighbors as an additional variable into the regression specification. This added variable is referred to as a spatially lagged dependent variable, or a spatial lag. For example, in a model for tax rates of local communities, this would add the average of the tax rates in the neighboring locations as an explanatory variable.

The inclusion of the spatial lag is similar to an autoregressive term in a time series context, hence it is called a spatial autoregressive model, although there is a fundamental difference. Unlike time dependence, dependence in space is multidirectional, implying feedback effects and simultaneity. More precisely, if i and j are neighboring locations, then yj enters on the right-hand side in the equation for yi, but yi also enters on the right-hand side in the equation for yj (the neighbor relation is symmetric). This endogeneity must be accounted for in the estimation process.

The proper solution to the equations for all observations is the so-called reduced form, which no longer contains any spatially lagged dependent variables on the right-hand side. After some matrix algebra, this follows as:

y = (I − ρW)⁻¹Xβ + (I − ρW)⁻¹ε  (14.7)

a model that is nonlinear in ρ and β and has a spatially correlated error structure (more precisely, a spatial autoregressive structure, see below). More importantly, this reveals the spatial multiplier, i.e., the notion that the value of y at any location i is not only determined by the values of x at i, but also of x at all other locations in the system. This can be seen after a simple expansion of the inverse matrix term (for |ρ| < 1 and with a row-standardized W), and using the expected value (since the errors all have mean zero):

E[y|X] = Xβ + ρWXβ + ρ²W²Xβ + ⋯  (14.8)

The powers of ρ matching the powers of the weights matrix (higher orders of neighbors) ensure that a distance decay effect is present.

Even when the spatial lag specification is not necessarily the result of a process of interaction among agents, it remains a useful model to deal with spatial autocorrelation, and can be interpreted as a filtering model. More precisely, moving the spatial lag term to the left-hand side reveals:

yi* = yi − ρ Σj wij yj = xi′β + εi  (14.9)

i.e., a standard regression model in a dependent variable yi from which the spatial correlation has been removed (filtered). Unlike detrending time series data, however, the parameter ρ cannot take on the value of 1 and must be estimated jointly with the other parameters of the model. The spatial filtering interpretation is often useful when there is a mismatch between the spatial scale of observations and the spatial scale at which the phenomenon of interest manifests itself. For example, this would be the case when a regional phenomenon (e.g., a labor market or housing market) is measured at a subregional scale, resulting in a high degree of positive spatial autocorrelation (very little change across the sub-regional scale). In that situation, the estimation of the spatial lag model will yield estimates for the β parameters that properly control for the spatial autocorrelation.
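The reduced form in equation (14.7) also gives a convenient way to generate data from, and to reason about, a spatial lag process. The following Python sketch is an illustration only, not a procedure taken from the chapter: it builds a row-standardized rook-contiguity weights matrix for a small grid, simulates y through the reduced form, and traces the spatial multiplier by perturbing x at a single location. The grid size and the values of ρ and β are arbitrary assumptions.

```python
import numpy as np

def rook_weights(nrow, ncol):
    """Row-standardized rook contiguity weights for a regular grid."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[i, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)      # each row sums to one

rng = np.random.default_rng(0)
nrow = ncol = 10
n = nrow * ncol
W = rook_weights(nrow, ncol)
rho, beta = 0.6, np.array([1.0, 2.0])             # arbitrary parameter values
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n)

A_inv = np.linalg.inv(np.eye(n) - rho * W)        # (I - rho W)^(-1)
y = A_inv @ (X @ beta + eps)                      # reduced form, equation (14.7)

# spatial multiplier: a unit change in x at location 0 shifts E[y] everywhere
impact = A_inv @ (np.eye(n)[:, 0] * beta[1])
print(impact[0], impact[W[0] > 0].mean(), impact.sum())  # own, neighbour, total effect
```

The own effect exceeds beta[1] because of feedback, the neighbour effects are smaller, and the more remote effects decay further, which is the distance decay pattern implied by equation (14.8).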
14.2.2. Spatial error models

In spatial error models, the spatial autocorrelation does not enter as an additional variable in the model, but instead affects the covariance structure of the random disturbance terms. The typical motivation for this is that unmodeled effects spill over across units of observation and hence result in spatially correlated errors. For example, in hedonic house price models, it is often assumed that neighborhood effects that are hard (or impossible) to quantify are shared by houses in similar locations and thus appear as spatially correlated errors (Dubin, 1988). More recently, a theoretical framework based on common shocks has been suggested as a mechanism to motivate spatially correlated errors (Andrews, 2005).

Spatial error autocorrelation is a special case of a non-spherical error covariance matrix, in which the off-diagonal elements are non-zero, i.e., E[εiεj] ≠ 0, for i ≠ j, or, in matrix notation, E[εε′] = Σ. The value and pattern of the non-zero covariances are the outcome of a spatial ordering. In a cross-section, it is impossible to extract this ordering from the data directly, since there are potentially N(N − 1)/2 covariance parameters and only N observations to estimate them from. Hence, it is necessary to impose structure and to obtain estimates from a more parsimonious specification.

The spatial covariance structure can be obtained in a number of ways. One of the earliest suggestions was a so-called direct representation. In this, each covariance between a pair of observations i, j is specified as a parameterized function f (with parameter vector φ) of the distance dij between them, or, E[εiεj] = σ²f(dij, φ). Early applications of this approach were Cook and Pocock (1983) and Mardia and Marshall (1984). More recently, it is commonly used in analyses of real estate markets, as reviewed in Dubin et al. (1999).

The choice of the function and of the distance metric needs to be made very carefully, in order to ensure that the resulting variance-covariance matrix is positive definite. A common choice is a negative exponential distance decay function. This results in an error variance-covariance matrix of the form:

E[εε′] = σ²[I + γΩ]  (14.10)

where the variance is accounted for in the first term, γ is a non-negative scaling parameter, and the off-diagonal elements of Ω are ωij = exp(−φdij).

A second approach obtains structure for the error covariance matrix by specifying a spatial process for the random disturbance. A number of processes may be considered, each yielding a different covariance structure, expressed as a function of one or two parameters. The most common choice is a spatial autoregressive process, or SAR:

εi = λ Σj wij εj + ui  (14.11)

with λ as the autoregressive parameter and ui as a random error term, typically assumed to be i.i.d. In matrix notation, this is equivalent to:

ε = λWε + u.  (14.12)

Solving this for the full vector of errors yields:

ε = (I − λW)⁻¹u  (14.13)
with E[uu′] = σ²I, so that the complete error variance-covariance matrix follows as:

E[εε′] = σ²(I − λW)⁻¹(I − λW′)⁻¹.  (14.14)

Even though the spatial weights matrix W may contain only a few neighbors for each observation, the variance-covariance structure that results from the SAR process is a non-sparse matrix, representing a global pattern of spatial autocorrelation. Moreover, unless the number of neighbors is constant for each observation, the diagonal elements in the variance-covariance matrix will not be constant, resulting in heteroskedasticity. This induced heteroskedasticity is a distinguishing characteristic for spatial processes, and it complicates specification testing and estimation. More precisely, since many of the theoretical asymptotic results in time series analysis are based on assumptions of constant variance, they do not translate directly to spatial processes; for technical details, see, e.g., Anselin (2006).
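The induced heteroskedasticity is easy to verify numerically. The short sketch below, an illustration rather than a required computation, builds the SAR error variance-covariance matrix of equation (14.14) for a weights matrix in which the number of neighbours varies (locations along a line, so the endpoints have one neighbour and interior locations have two); the value of λ and σ² = 1 are assumptions.

```python
import numpy as np

n, lam = 8, 0.7                      # small example; lambda is an assumed value
W = np.zeros((n, n))
for i in range(n):                   # locations on a line: neighbours are i-1 and i+1
    if i > 0:
        W[i, i - 1] = 1.0
    if i < n - 1:
        W[i, i + 1] = 1.0
W /= W.sum(axis=1, keepdims=True)    # row-standardize

B_inv = np.linalg.inv(np.eye(n) - lam * W)
Omega = B_inv @ B_inv.T              # equation (14.14) with sigma^2 = 1
print(np.round(np.diag(Omega), 3))   # diagonal is not constant: induced heteroskedasticity
```

The matrix Omega is also dense even though W is sparse, which illustrates the global character of the SAR error process.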
Other spatial processes used to provide structure to the error variance-covariance matrix include a conditional autoregressive process (CAR) and a spatial moving average process (SMA). The CAR model is often used as a prior in hierarchical Bayesian specifications, whereas the SMA specification is appropriate for local patterns of spatial autocorrelation (for details, see Anselin, 2006).

Error component models have been suggested as well, and some recent theoretical results provide the basis for a wide range of structures for error spatial autocorrelation. In Kelejian and Robinson (1992), an error decomposition was proposed that combined a local or location-specific component with a spillover component, yielding an error variance-covariance structure similar to that of an SMA (see also Anselin and Moreno, 2003). The common shocks framework outlined in Andrews (2005) can encompass general factor structures yielding different specifications for the range and strength of spatial autocorrelation. This approach has seen increased application in recent work on spatial autocorrelation in panel data models (Pesaran, 2005).

A final approach to provide structure to spatial error variance-covariance matrices is based on a non-parametric rationale, which is particularly appropriate for local patterns of spatial autocorrelation. Using the formal properties for a kernel estimator of spatial autocovariance established by Hall and Patil (1994), a general non-parametric covariance matrix estimator has been suggested by Conley (1999), and, more recently, by Kelejian and Prucha (2007).

14.3. HIGHER ORDER MODELS

In addition to the basic spatial lag and spatial error models just reviewed, higher order models can be specified as well, by including multiple weights matrices, by combining lag and error structures, and by including specification for spatial heterogeneity jointly with spatial dependence. An extensive review of these specifications can be found in Anselin (2006).

14.4. SPECIFICATION TESTS

In empirical practice, there are often no strong a priori reasons to consider a spatial lag or spatial error model in a cross-sectional situation. Instead, the need for such a specification follows from the result of model diagnostics. Specifically, diagnostic tests derived from the residuals of
a regression carried out by means of ordinary least squares (OLS) may point to violations of the underlying assumptions, including the uncorrelatedness of errors.

Ignoring spatial autocorrelation when it is in fact present has different consequences, depending on whether the correct model is a spatial lag or a spatial error specification. Ignoring a spatially lagged dependent variable is equivalent to an omitted variable error, and will yield OLS estimates for the model coefficients that are biased and inconsistent. On the other hand, ignoring spatially correlated errors is mostly a problem of efficiency, in the sense that the OLS coefficient standard error estimates are biased, but the coefficient estimates themselves remain unbiased. However, to the extent that the spatially correlated errors mask an omitted variable, the consequences of ignoring this may be more serious.

The problem at hand is the extent to which any systematic spatial patterning in the residuals provides evidence to reject the null hypothesis of uncorrelated errors. There are two complications in this respect. One is that the null hypothesis pertains to the error terms, which are not observable. Instead, one has to deal with residuals. OLS regression residuals are already correlated by construction, since they are derived from a common set of data. Hence, simply concluding from correlated residuals that the errors are also correlated may be spurious. Another complication is that the rejection of the null hypothesis does not necessarily suggest a given spatial model as the proper alternative. Both spatial lag and spatial error alternatives will, when ignored, lead to OLS residuals that are spatially correlated. In addition, since several spatial processes also result in heteroskedastic errors, distinguishing true heteroskedasticity from this type of induced heteroskedasticity will constitute an added complication.

Specification tests against spatial autocorrelation are either based on a specific alternative model, referred to as focused tests, or are diffuse, in that the alternative is an unspecified form of spatial correlation. In the remainder of the section, diffuse or spatial autocorrelation tests are considered first, followed by focused tests based on the maximum likelihood principle. The section concludes with a discussion of the practice of a specification search.

14.4.1. Spatial autocorrelation tests

Arguably the best known test statistic against spatial autocorrelation is the application of Moran's I statistic for spatial autocorrelation (Moran, 1948) to regression residuals (Moran, 1950), popularized in the work of Cliff and Ord (1972, 1973, 1981). This statistic corrects the well known Moran's I for the fact that the random variable under consideration is a regression residual. As a result, inference is based on analytical and asymptotic results, but should not rely on the familiar permutation approach (Anselin and Rey, 1991; Schmoyer, 1994).

Moran's I for regression residuals is then:

I = (e′We/S0)/(e′e/N)  (14.15)

where e is a N × 1 vector of OLS residuals y − Xβ̂, W is a spatial weights matrix, and S0 = Σi Σj wij, a normalizing factor.

In practice, inference in Moran's I test can be based on a normal approximation, using a standardized value, or z-value. This is obtained by subtracting the mean under the null and dividing by the square root of the variance. The first two moments were derived in Cliff and Ord (1972) as:

E[I] = tr(MW)/(N − K)  (14.16)
and:

Var[I] = {tr(MWMW′) + tr(MWMW) + [tr(MW)]²}/[(N − K)(N − K + 2)] − (E[I])²  (14.17)

where tr is a matrix trace operator and M = I − X(X′X)⁻¹X′. The normality of the z-value is an approximation, which works well in large samples. Alternatives are to use exact inference (under the assumption of Gaussian error terms, as in Tiefelsdorf and Boots, 1995), or a saddlepoint approximation (Tiefelsdorf, 2002).

Moran's I has been shown to have certain optimal properties, similar to the Durbin-Watson test against serial correlation in the time domain (King, 1981). Also, it turns out to be asymptotically equivalent to a likelihood ratio (LR) test and to a Lagrange multiplier (LM) test (Cliff and Ord, 1972; Burridge, 1980), and therefore shares the asymptotic properties of these statistics.

Moran's I has power against any alternative of spatial correlation, including spatial lag dependence, as demonstrated in a large number of Monte Carlo simulation experiments (see, e.g., Anselin and Rey, 1991; Anselin and Florax 1995b; Florax and de Graaff, 2004). In addition, not unlike the Durbin-Watson statistic, the test has power against heteroskedasticity as well (Anselin and Griffith, 1988). In practice, this complicates specification testing in that without further evidence, it will be difficult to conclude whether a spatial model, a heteroskedastic model, or a combination of the two is the proper alternative.

Moran's I test statistic is very general and can be applied in many contexts other than the classic regression model. For example, in Anselin and Kelejian (1997) it is extended to residuals from a two stage least squares (2SLS) regression estimation. Kelejian and Prucha (2001) formulate a general framework to obtain the asymptotic properties of the statistic in a wide range of contexts. Ellner and Seifu (2002) use Moran's I as a model diagnostic to select the proper bandwidth for kernel estimators in semi-parametric models. In this application, the weights matrix does not pertain to geographic locations, but to locations in variable space.

An alternative to Moran's I as a test statistic against an unspecified form of spatial autocorrelation was suggested by Kelejian and Robinson (1992). Theirs is a large sample test, which does not require an assumption of normality and can be applied in nonlinear models as well. In Kelejian and Robinson (1998, 2004), this principle is extended to include both heteroskedasticity and error autocorrelation as the alternative.
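Computed directly from its definition, the statistic and its first two moments require only a few lines of code. The sketch below is a generic numerical illustration of equations (14.15)-(14.17) under simulated data; the data-generating choices and the k-nearest-neighbour weights are arbitrary assumptions, and the sketch is not a substitute for the exact or saddlepoint inference just mentioned.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)   # i.i.d. errors

# an arbitrary row-standardized weights matrix: 5 nearest neighbours in the plane
pts = rng.uniform(0, 1, size=(N, 2))
D = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.zeros((N, N))
for i in range(N):
    W[i, np.argsort(D[i])[1:6]] = 1.0
W /= W.sum(axis=1, keepdims=True)

M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T           # residual-maker matrix
e = M @ y                                                  # OLS residuals
S0 = W.sum()

I = (e @ W @ e / S0) / (e @ e / N)                         # equation (14.15)
A = M @ W
EI = np.trace(A) / (N - K)                                 # equation (14.16)
VarI = (np.trace(A @ M @ W.T) + np.trace(A @ A) + np.trace(A) ** 2) \
       / ((N - K) * (N - K + 2)) - EI ** 2                 # equation (14.17)
z = (I - EI) / np.sqrt(VarI)
print(round(I, 4), round(z, 3))    # with i.i.d. errors the z-value should not be extreme
```

Replacing the i.i.d. errors with a spatially autocorrelated error process (for example the SAR process of equation 14.11) would push the z-value into the rejection region, which is the behaviour the test is designed to detect.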
14.4.2. Maximum likelihood based tests

In contrast to diffuse spatial autocorrelation tests, focused tests are constructed with a specific alternative in mind, such as a spatial lag or a spatial error specification. In general, they boil down to a test of restrictions on the parameters of a spatial regression model. For example, for a spatial lag model, the null hypothesis would be H0: ρ = 0, such that the restricted model would then be y = Xβ + ε. The alternative hypothesis is then that H1: ρ ≠ 0, such that the unrestricted model is y = ρWy + Xβ + ε. The three classic test statistics obtained under maximum likelihood (ML) estimation are the Wald, likelihood ratio, and Lagrange multiplier (or, Rao score) tests.

The Wald, or asymptotic t-test is simply a significance test on the spatial autoregressive parameter in a spatial lag or spatial error model, based on the results of estimation by means of maximum likelihood of the
unrestricted (spatial) model. This requires both the point estimate of the parameter as well as an estimate of the asymptotic variance matrix (for technical details, see Anselin, 1988b, Ch. 6).

The likelihood ratio test statistic is obtained in the standard manner as well, as twice the difference between the log-likelihood of the unrestricted (i.e., the spatial) model, and that of the restricted model (i.e., the standard regression without spatial autocorrelation). This thus requires the estimation of two models, and an assumption of normality for the OLS regression. The statistic is asymptotically distributed as χ²(1) (see Anselin, 1988b, Ch. 6).

The Lagrange multiplier (LM) test only requires estimation of the model under the null hypothesis of no spatial dependence. It therefore lends itself well to specification searches in practice, since the extra step of estimating a spatial lag or spatial error model can often be avoided. In the spatial case, the LM statistic does not follow the standard result from econometrics, where in many instances it can be obtained as a measure of fit in an auxiliary regression. Instead, it needs to be derived explicitly, as in Burridge (1980) and Anselin (1988a) (for extensive technical details, see also Anselin and Bera, 1998; Anselin, 2001a).

Even though the LM statistic is constructed from the OLS residuals, a complete alternative model must be specified. In some instances, two different alternatives yield the same LM statistic. These are called locally equivalent alternatives (Godfrey, 1981). SAR and SMA error processes fall into this category. As a result, a LM test statistic against spatial error autocorrelation cannot distinguish between these two different processes. In practice, this affects the interpretation of the results, since SAR is a global spatial process, while SMA is local.

The LM error statistic is very similar to Moran's I. As shown in Burridge (1980) and Anselin (1988a), the statistic is:

LMλ = [e′We/(e′e/N)]²/tr[W′W + WW]  (14.18)

where e is a N × 1 vector of OLS residuals, and tr stands for the trace operator (the sum of the diagonal elements of a matrix). Except for the scaling factor in the denominator, this statistic is essentially the square of Moran's I. It is asymptotically distributed as χ²(1).

Using similar principles, the LM lag statistic follows as:

LMρ = [e′Wy/(e′e/N)]²/D  (14.19)

with e as the OLS residuals, and the denominator term:

D = (WXβ̂)′[I − X(X′X)⁻¹X′](WXβ̂)/σ̂² + tr(WW + W′W)  (14.20)

where the estimates for β and σ² are from OLS. The test statistic is asymptotically distributed as χ²(1).
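Both statistics are simple to compute once an OLS fit and a weights matrix are available. The sketch below illustrates equations (14.18)-(14.20) on simulated data; the data-generating process and the contiguity structure are assumptions made here for illustration, and the sketch is not tied to any particular software implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

# arbitrary row-standardized contiguity along a line (neighbours are i-1 and i+1)
W = np.zeros((N, N))
for i in range(N):
    for j in (i - 1, i + 1):
        if 0 <= j < N:
            W[i, j] = 1.0
W /= W.sum(axis=1, keepdims=True)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                 # OLS estimates
e = y - X @ b                         # OLS residuals
sig2 = e @ e / N                      # error variance estimate used by the LM statistics

T = np.trace(W.T @ W + W @ W)
lm_error = (e @ W @ e / sig2) ** 2 / T                    # equation (14.18)

M = np.eye(N) - X @ XtX_inv @ X.T
WXb = W @ X @ b
D = (WXb @ M @ WXb) / sig2 + T                            # equation (14.20)
lm_lag = (e @ W @ y / sig2) ** 2 / D                      # equation (14.19)

print(round(lm_error, 3), round(lm_lag, 3))   # compare with the chi-square(1) critical value 3.84
```

Because the data above are generated without any spatial process, both statistics should typically fall below the critical value; generating y from a spatial lag or SAR error process instead would drive the corresponding statistic up.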
1998; Anselin, 2001a). A related test statistic, also based on the
Even though the LM statistic is constructed maximum likelihood principle, applies the
from the OLS residuals, a complete alter- idea of double length artificial regressions
native model must be specified. In some (DLR, Davidson and MacKinnon, 1984,
instances, two different alternatives yield the 1988) to tests for spatial error and spatial
same LM statistic. These are called locally lag dependence (Baltagi and Li, 2001a).
equivalent alternatives (Godfrey, 1981). SAR The DLR approach consists of expressing
and SMA error processes fall into this cate- the regression model as a function of
gory. As a result, a LM test statistic against standard normal error terms. In the spatial
spatial error autocorrelation cannot distin- models, this follows as a simple standard-
guish between these two different processes. ization (for technical details, see Baltagi and
In practice, this affects the interpretation of Li, 2001a).
the results, since SAR is a global spatial The LM principle can be applied to
process, while SMA is local. alternatives other than the SAR/SMA error
The LM error statistic is very similar to processes or the spatial lag model. Test
Morans I. As shown in Burridge and (1980) statistics can be derived for higher-order
264 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

processes (multiple orders of contiguity), and for different error models, such as spatial error components or direct representation (Anselin, 2001a; Anselin and Moreno, 2003).

So far, only a single alternative has been taken into account. However, in practice, it is often more reasonable to consider an alternative hypothesis that contains both a spatial lag and spatial error autocorrelation:

y = ρWy + Xβ + ε  (14.21)

with:

ε = λWε + u  (14.22)

a SARSAR model, or, with:

ε = λWu + u  (14.23)

a SARMA model.

In this more general case, there are three ways to proceed. One is as before, considering a one-directional alternative only and ignoring the other form of spatial autocorrelation. For example, the LM error test above has the null hypothesis H0: λ = 0, irrespective of the value of ρ, which is considered to be a nuisance parameter. This is referred to as a marginal test.

A problem with the marginal approach is that the LMλ and LMρ test statistics are no longer χ²(1) in the presence of local mis-specification in the form of the other type of spatial dependence, but they become non-central χ². In other words, in the presence of spatial lag dependence, the LM test against error correlation becomes biased, and, in the presence of spatial error dependence, the LM test against lag dependence becomes biased. Using a result of Bera and Yoon (1993), robust versions of these test statistics have been developed in Anselin et al. (1996) (see also Anselin and Bera, 1998, pp. 273-278).

A second strategy is that of a joint test, where the null hypothesis is to set all spatial parameters equal to zero. For example, for the spatial lag model with a SAR or SMA error term, H0: ρ = λ = 0. In contrast to standard results in the econometric literature, the joint test statistic is not simply the sum of the marginal test statistics, i.e., LMρλ ≠ LMρ + LMλ, but it takes on a far more complex form (Anselin, 1988a).

A third strategy is a so-called conditional approach, where a test on the null hypothesis λ = 0 is carried out in a model with ρ ≠ 0, and vice versa. This can no longer be based on OLS estimates, but requires estimation of the proper spatial model by means of ML. Using the same principles as before, but now with the residuals of the ML estimation, a test statistic for H0: λ = 0 in the spatial lag model (i.e., with ρ ≠ 0) can be derived. Similarly, a test statistic can be constructed for H0: ρ = 0 in the spatial error model (i.e., with λ ≠ 0). While straightforward, the derivations are quite tedious and the resulting test statistics complex (for technical details, see Anselin, 1988a; Anselin et al., 1996; Anselin and Bera, 1998).

The LM principle can also be extended to multiple sources of mis-specification, such as spatial dependence and heteroskedasticity (Anselin, 1988b), or spatial dependence and functional mis-specification (Baltagi and Li, 2001b).

14.4.3. Specification search

In practice, the sheer number of available test statistics can seem overwhelming and a strategy needs to be developed to move from the null model to a superior alternative (when appropriate). Given that tests may be based
on marginal, joint, or conditional approaches, the results of a specification search may be subject to the order in which tests are carried out, and whether or not adjustments are made for pre-testing (see, e.g., Florax and Folmer, 1992; Anselin and Florax, 1995b; Florax and de Graaff, 2004).

Based on a large number of simulation results, an ad hoc decision rule was suggested in Anselin and Rey (1991) for the simple case of choosing between a spatial lag or spatial (SAR) error alternative. There is considerable evidence that the proper alternative is most likely the one with the largest significant LM test statistic value. This was later refined in light of the robust forms of the statistics in Anselin et al. (1996). In a recent paper by Florax et al. (2003), this classic forward stepwise specification search is compared to a general-to-simple model selection rule (for further discussion, see also Florax et al., 2006; Hendry, 2006).

14.5. ESTIMATION

The estimation problems associated with spatial regression models are distinct for the spatial lag and spatial error case. Spatial error models are special instances of specifications with a non-spherical error. As a result, OLS may still be applied, as long as the estimated standard errors are adjusted to take into account the error correlation. In contrast, the inclusion of a spatially lagged dependent variable in a regression specification yields a form of endogeneity. As a result, for most spatial weights used in practice, OLS in the spatial lag model is not an appropriate method, and the simultaneity must be accounted for explicitly. An exception to this general rule is when the weights represent subgroups in the data (i.e., all the observations in the same group are neighbors of each other), in which case OLS turns out to yield consistent estimates (Lee, 2002; Kelejian and Prucha, 2002).

Two general sets of methods have been developed to address the estimation of spatial regression models, one based on the maximum likelihood (ML) principle, the other on the (general) method of moments (GMM). Each will be considered in turn, followed by a brief overview of semi-parametric methods.

14.5.1. Maximum likelihood estimation

The point of departure for maximum likelihood estimation in spatial regression models is an assumption of normality for the error term. In general, allowing for heteroskedasticity and/or error correlation, the N × 1 error vector ε has a multivariate normal distribution, N(0, Σθ), with the subscript denoting that Σ may be a function of a p × 1 vector of parameters θ. In the commonly considered i.i.d. case, this simplifies to N(0, σ²I), with θ = σ².

To move from the likelihood for the error vector ε to a likelihood for the observed dependent variable, a Jacobian of the transformation needs to be inserted, which corresponds to the determinant |I − ρW| in the spatial lag model, and |I − λW| in the spatial error model. The presence of the Jacobian term constitutes a major computational complication.

Using the standard result for a multivariate normal distribution, and taking into account the Jacobian term, the log-likelihood for the spatial lag model follows as:

L = −(N/2)ln(2π) − (1/2)ln|Σθ| + ln|I − ρW| − (1/2)(y − ρWy − Xβ)′Σθ⁻¹(y − ρWy − Xβ).  (14.24)
Maximizing the log-likelihood is not equivalent to minimizing weighted least squares (the last term in L), as in the standard linear regression model. The main difference is in the presence of the log-Jacobian term ln|I − ρW|. This illustrates informally how weighted least squares will not yield a consistent estimator in the spatial lag model, due to the endogeneity in the Wy term. The log-Jacobian also implies constraints on the parameter space for ρ, which must be such that |I − ρW| > 0.

Maximum likelihood estimates for ρ, β, and σ² are obtained as solutions to the usual first-order conditions, requiring numerical optimization (for technical details, see Ord (1975), Cliff and Ord (1981), Anselin (1980, 1988b, 2006), Anselin and Bera (1998), among others). Inference is based on an asymptotic variance matrix, the inverse of the information matrix (see Anselin, 1980, 1988b).

Even though the principles of ML estimation in a spatial lag model were laid out more than 30 years ago by Ord (1975), it was only very recently that the formal proofs were developed that established the conditions under which consistency and asymptotic normality of this estimator are obtained (Lee, 2004).

Maximum likelihood estimation of the parameters in models with spatially dependent error terms follows as a special case of the results in Magnus (1978). For a general non-spherical error term with variance-covariance matrix Σθ, with θ as the parameters, the ML estimator for β is the familiar generalized least squares expression:

βML = (X′Σθ⁻¹X)⁻¹X′Σθ⁻¹y.  (14.25)

This follows as the solution of the first-order conditions, applied to the log-likelihood:

L = −(N/2)ln(2π) − (1/2)ln|Σθ| − (1/2)(y − Xβ)′Σθ⁻¹(y − Xβ).  (14.26)

With a consistent estimate for the parameters θ, consistent estimates for β are obtained through feasible generalized least squares (FGLS).

Each spatial error process will result in a specialized form for Σθ. For example, for a SAR error process without heteroskedasticity, the corresponding parameter vector is θ = [σ², λ]. The FGLS estimator in this model simplifies to:

βML = [X′(I − λW)′(I − λW)X]⁻¹X′(I − λW)′(I − λW)y  (14.27)

or, a regression of spatially filtered yL = y − λWy on spatially filtered XL = X − λWX. This is referred to as spatially weighted least squares. Unlike the time series counterpart, a consistent estimate for λ cannot be obtained from a simple auxiliary regression, but the first-order condition must be solved explicitly by numerical means. As for the spatial lag model, asymptotic inference is based on the inverse of the information matrix (for technical details, see Anselin, 1988b, Chapter 6).

Maximum likelihood estimation in spatial regression models involves the application of nonlinear optimization techniques to the log-likelihood function. A main computational obstacle follows from the presence of the log-Jacobian term ln|I − ρW| in the log-likelihood. In addition, the first-order conditions and information matrix involve the traces of matrix products such as W(I − ρW)⁻¹. For even medium-sized data
sets, the computation of these terms by brute force is impractical.

An early solution was suggested by Ord (1975), who exploited the decomposition of the Jacobian in terms of the eigenvalues of the spatial weights matrix. This facilitates computation greatly, since the eigenvalues only need to be calculated once. The trace terms used in the information matrix can be expressed in terms of the eigenvalues as well (Anselin, 1980).
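The device is simple to state: if the eigenvalues of W are ωi, then ln|I − ρW| = Σi ln(1 − ρωi), so the log-Jacobian can be re-evaluated for any candidate ρ without refactoring an N × N matrix. A tiny sketch of the idea, using an arbitrary symmetric contiguity matrix so that the eigenvalues are real, is shown below purely as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# a symmetric binary contiguity matrix along a line (real eigenvalues)
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

omega = np.linalg.eigvalsh(W)          # computed once, reused for every candidate rho

def log_jacobian(rho):
    # ln|I - rho W| = sum_i ln(1 - rho * omega_i)
    return np.log(1.0 - rho * omega).sum()

rho = 0.3
direct = np.linalg.slogdet(np.eye(n) - rho * W)[1]
print(np.isclose(log_jacobian(rho), direct))   # True
```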
The computation of eigenvalues becomes impractical and computationally unstable for medium and large-sized data sets (n > 1000). This precludes the application of the Ord approach. Several alternatives have been suggested that either approximate or bound the Jacobian or log-Jacobian term (e.g., Martin, 1993; Griffith and Sone, 1995; Barry and Pace, 1999; Pace and LeSage, 2002, 2004a), or exploit the sparse nature of spatial weights (Pace and Barry, 1997a, b; Smirnov and Anselin, 2001).

A second important computational problem pertains to the presence of terms like tr[W(I − ρW)⁻¹]² in the information matrix. The calculation of these inverse matrices is impractical in large data settings. As a result, most large data ML methods developed so far have not based inference on the asymptotic variance matrix, but instead use a sequence of likelihood ratio tests. Recently, Smirnov (2005) developed a solution to this problem, based on the use of a conjugate gradient approach.

14.6. INSTRUMENTAL VARIABLES/METHOD OF MOMENTS ESTIMATION

An alternative to maximum likelihood estimation is the use of the method of moments (including instrumental variables, generalized method of moments, and generalized moments). This approach does not require an assumption of normality and it avoids some of the computational problems associated with ML for very large data sets.

The spatial lag model can be formulated as a linear model that contains an endogenous variable (Wy) and exogenous variables (X):

y = Zδ + ε  (14.28)

with Z = [Wy, X] and δ = [ρ, β′]′. A classic solution to the endogeneity problem is to use instrumental variables. A matrix of additional variables Q (N × q) is used to obtain an instrument for the spatially lagged dependent variable:

Ŵy = Q(Q′Q)⁻¹Q′Wy  (14.29)

such that Ẑ = [Ŵy, X], resulting in the spatial two-stage least squares estimator (S2SLS):

δ̂S2SLS = [Ẑ′Z]⁻¹Ẑ′y.  (14.30)

Inference on the S2SLS is based on the asymptotic variance matrix:

AsyVar[δ̂S2SLS] = σ̂²[Z′Q(Q′Q)⁻¹Q′Z]⁻¹  (14.31)

with σ̂² = (y − Zδ̂S2SLS)′(y − Zδ̂S2SLS)/N.
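A bare-bones version of this estimator is sketched below, using WX and W²X (together with X itself) as instruments for Wy, which is one of the choices discussed next. The simulated data and weights are arbitrary assumptions, and the sketch deliberately ignores the refinements in the literature cited here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, beta = 200, 0.5, np.array([1.0, 2.0])

# arbitrary row-standardized weights: neighbours are i-1 and i+1 along a line
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            W[i, j] = 1.0
W /= W.sum(axis=1, keepdims=True)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + rng.normal(size=n))  # spatial lag DGP

Z = np.column_stack([W @ y, X])                            # endogenous Wy plus exogenous X
Q = np.column_stack([X, W @ X[:, 1:], W @ W @ X[:, 1:]])   # instrument matrix
P = Q @ np.linalg.inv(Q.T @ Q) @ Q.T                       # projection onto the instrument space
Z_hat = P @ Z                                              # first stage: fitted values of Z

delta = np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)          # equation (14.30)
resid = y - Z @ delta
sigma2 = resid @ resid / n
avar = sigma2 * np.linalg.inv(Z.T @ P @ Z)                 # equation (14.31)
print(np.round(delta, 3), np.round(np.sqrt(np.diag(avar)), 3))
```

The first element of delta estimates ρ and the remaining elements estimate β; with data generated from a spatial lag process the estimates should be close to the assumed values, unlike an OLS fit of y on Wy and X.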
The application of instrumental variables to the spatial lag model was initially outlined in Anselin (1980, 1988b, pp. 82-86), where some ad hoc suggestions were made for the selection of the instruments (see also Land and Deane (1992) for an early discussion).
Specifically, the choice of a spatial lag of the predicted values of the y (using only the exogenous variables) or of spatially lagged exogenous variables was considered. In Kelejian and Robinson (1993), proof is provided of the consistency of S2SLS and the selection of instruments is couched in terms of the reduced form. This suggests the use of a subset of columns from {X, WX, W²X, W³X, . . .} as the instruments (see also Kelejian and Prucha, 1998). Recent work has focused on the selection of optimal instruments (Lee, 2003; Das et al., 2003; Kelejian et al., 2004), and on establishing formal proofs of consistency and asymptotic normality. In Lee (2007), the S2SLS estimator is compared to a GMM method with superior asymptotic properties. Extensions of the instrumental variables approach to systems of simultaneous equations are considered in Rey and Boarnet (2004) and Kelejian and Prucha (2004).

Moment methods have been developed to address spatial error autocorrelation as well, both in isolation as well as in combination with a spatial lag model (the SARSAR model). The basic results were obtained by Kelejian and Prucha (1998, 1999), who initially treated the spatial autoregressive coefficient in the error SAR process as a nuisance parameter. Specifically, attention focused on obtaining a consistent estimate for the nuisance parameter as the solution of a set of moment conditions. This consistent estimate could then be used in a second step of a FGLS estimation. One drawback of the nuisance parameter approach is that no inference can be carried out on the spatial autoregressive parameter, since no result existed on its asymptotic variance. In recent work by Lin and Lee (2005) and Kelejian and Prucha (2006), this problem has been alleviated, in the context of an extended set of moment conditions that account for both spatial autoregressive errors as well as heteroskedasticity of unspecified form. Their results also yield an asymptotic variance matrix, so that tests of significance can be carried out on the spatial parameters as well.

14.6.1. Semi-parametric methods

Semi-parametric methods provide a compromise between a full parametric specification and a non-parametric approach where the parameters are completely determined by the data, with very little prior structure. The combination of a full specification of the parts where theory or previous results provide a strong support for the model and relaxing the functional and distributional assumptions for the rest has become very attractive, especially when large data sets provide ample information (for a recent review, see Horowitz and Lee, 2002).

While by far the predominant paradigm in spatial regression analysis is the parametric approach, the use of semi-parametric techniques has seen a recent increase and is an area of very active research, both theoretical as well as applied. A semi-parametric approach has seen application in four main areas in spatial regression analysis. One is as an alternative to specifying a specific spatial process for the error term. Instead, the error covariance may be estimated in a non-parametric fashion. This follows along the lines of the work in econometrics by White (1980) on a heteroskedastic-consistent approach, and its extension to both heteroskedasticity and serial correlation by Newey and West (1987), and others. The incorporation of spatial dependence in this framework was first considered by Conley (1999) in the context of GMM estimation, and recently elaborated upon in Kelejian and Prucha (2007) (see also Chen and Conley (2001), for a related approach). The basic idea is to avoid specifying a particular spatial process or spatial weights matrix and to extract
the spatial covariance terms from weighted averages of cross-products of residuals, using a kernel function. This yields a so-called heteroskedastic and spatial autocorrelation consistent (HAC) estimator. The HAC approach is asymptotic and in finite samples a major practical problem is to ensure that the estimated variance-covariance matrix is positive semidefinite. A number of suggestions have been formulated, but considerable research remains to be done to obtain insight into finite sample properties (see Kelejian and Prucha (2007), for some technical details).

In a second approach, the focus is on relaxing the requirements to specify a spatial weights matrix W in the construction of the spatially lagged dependent variable in a spatial lag model. In Pinkse et al. (2002), a model is considered of the form:

yi = Σj≠i g(dij)yj + xi′β + εi  (14.32)

in which the unspecified function g relates the values of y at other locations j to that at i through a distance measure dij. The function g is approximated by a polynomial series expansion in distance measures, the coefficients of which are estimated jointly with the other parameters in the model.

In a third approach, suggested in the work of Gress (2004a) (see also Gress (2004b), and Basile and Gress (2005), for applications), the spatial weights specification is kept in the spatial lag part, but the other variables enter into the model in a non-parametric way. For example, a semi-parametric spatial lag model takes the form:

y = ρWy + g(X) + ε  (14.33)

where g is an unspecified function, to be estimated in a non-parametric way. A semi-parametric spatial error model is considered as well, using residuals from a non-parametric regression of y on g(X), as a special application of local linear weighted least squares (Henderson and Ullah, 2005).

A fourth approach is akin to spatial filtering, and purports to model unspecified spatial spillover effects non-parametrically, in a so-called smooth spatial effects (SSE) estimator. In Gibbons and Machin (2003), the model considered is:

yi = xi′β + g(ci) + εi  (14.34)

where g is an unknown function, intended to capture all spatial correlation, and ci represents the location of i. The model is estimated by means of the classic two-step procedure suggested by Robinson (1988). In the SSE estimator, both the dependent variable and the explanatory variables are replaced by deviations from the conditional expectation, which is obtained as a spatial kernel smoother. OLS can be applied to the transformed regression to obtain consistent estimates for β (for a recent application, see Day et al., 2004).

14.7. CONCLUSION

The methodological toolbox for spatial regression has reached a certain maturity when it comes to the classical linear regression model. However, much less has been accomplished beyond this context and the development of new models, estimation techniques and specification tests is a very active area of research, both in statistics as well as in econometrics. Given space constraints, it was impossible to review all these efforts in a comprehensive way, but it is hoped that through the references provided an entry into this field has been facilitated.
Considerable theoretical research is ongoing to develop the formal conditions and proofs needed to obtain the asymptotic properties of estimators and tests in various settings. New techniques are being developed to deal with spatial effects in panel data, count models, probit and tobit, and other specifications that are the mainstay of applied empirical regression analysis. The growth in applications is encouraging as well, providing a greater empirical basis to document the importance of location and distance in explaining socioeconomic phenomena. Lastly, while in the past the lack of software may have been an impediment to the dissemination of spatial regression methods, this is no longer the case, as attested by several active open source developments (for a recent review, see Anselin, 2005, pp. 101-106).

REFERENCES

Andrews, D.W. (2005). Cross-section regression with common shocks. Econometrica, 73: 1551-1585.

Anselin, L. (1980). Estimation Methods for Spatial Autoregressive Structures. Regional Science Dissertation and Monograph Series, Cornell University, Ithaca, New York.

Anselin, L. (1988a). Lagrange multiplier test diagnostics for spatial dependence and spatial heterogeneity. Geographical Analysis, 20: 1-17.

Anselin, L. (1988b). Spatial Econometrics: Methods and Models. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Anselin, L. (1992). Space and applied econometrics. Introduction. Regional Science and Urban Economics, 22: 307-316.

Anselin, L. (2001a). Rao's score test in spatial econometrics. Journal of Statistical Planning and Inference, 97: 113-139.

Anselin, L. (2001b). Spatial econometrics. In: Baltagi, B. (ed.), A Companion to Theoretical Econometrics,

Anselin, L. (2002). Under the hood. Issues in the specification and interpretation of spatial regression models. Agricultural Economics, 27(3): 247-267.

Anselin, L. (2003). Spatial externalities. International Regional Science Review, 26(2): 147-152.

Anselin, L. (2005). Spatial statistical modeling in a GIS environment. In: Maguire, D.J., Batty, M. and Goodchild, M.F. (eds), GIS, Spatial Analysis and Modeling, pp. 93-111. Redlands, CA: ESRI Press.

Anselin, L. (2006). Spatial econometrics. In: Mills, T. and Patterson, K. (eds), Palgrave Handbook of Econometrics: Volume 1, Econometric Theory, pp. 901-969. Basingstoke: Palgrave Macmillan.

Anselin, L. and Bera, A. (1998). Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah, A. and Giles, D.E. (eds), Handbook of Applied Economic Statistics, pp. 237-289. New York: Marcel Dekker.

Anselin, L., Bera, A., Florax, R.J. and Yoon, M. (1996). Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26: 77-104.

Anselin, L. and Florax, R.J. (1995a). New Directions in Spatial Econometrics. Berlin: Springer-Verlag.

Anselin, L. and Florax, R.J. (1995b). Small sample properties of tests for spatial dependence in regression models: Some further results. In: Anselin, L. and Florax, R.J. (eds), New Directions in Spatial Econometrics, pp. 21-74. Berlin: Springer-Verlag.

Anselin, L., Florax, R.J. and Rey, S.J. (2004). Advances in Spatial Econometrics. Methodology, Tools and Applications. Berlin: Springer-Verlag.

Anselin, L. and Griffith, D.A. (1988). Do spatial effects really matter in regression analysis? Papers, Regional Science Association, 65: 11-34.

Anselin, L. and Kelejian, H.H. (1997). Testing for spatial error autocorrelation in the presence of endogenous regressors. International Regional Science Review, 20: 153-182.

Anselin, L., Le Gallo, J. and Jayet, H. (2008). Spatial panel econometrics. In: Matyas, L. and Sevestre, P. (eds), The Econometrics of Panel Data, Fundamentals and Recent Developments in Theory and Practice (3rd Edition), pp. 627-662. Berlin: Springer-Verlag.

Anselin, L. and Moreno, R. (2003). Properties of tests for spatial error components. Regional Science and
pp. 310330. Oxford: Blackwell. Urban Economics, 33(5): 595618.
Anselin, L. and Rey, S.J. (1991). Properties of tests for spatial dependence in linear regression models. Geographical Analysis, 23: 112–131.
Anselin, L. and Rey, S.J. (1997). Introduction to the special issue on spatial econometrics. International Regional Science Review, 20: 1–7.
Arbia, G. (2006). Spatial Econometrics: Statistical Foundations and Applications to Regional Convergence. Berlin: Springer-Verlag.
Baltagi, B.H. and Li, D. (2001a). Double length artificial regressions for testing spatial dependence. Econometric Reviews, 20(1): 31–40.
Baltagi, B.H. and Li, D. (2001b). LM tests for functional form and spatial error correlation. International Regional Science Review, 24(2): 194–225.
Banerjee, S., Carlin, B.P. and Gelfand, A.E. (2004). Hierarchical Modeling and Analysis for Spatial Data. Boca Raton: Chapman & Hall/CRC.
Barry, R.P. and Pace, R.K. (1999). Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra and its Applications, 289: 41–54.
Bartels, C. and Ketellapper, R. (1979). Exploratory and Explanatory Analysis of Spatial Data. Boston: Martinus Nijhoff.
Basile, R. and Gress, B. (2005). Semi-parametric spatial auto-covariance models of regional growth in Europe. Région et Développement, 21: 93–118.
Bera, A. and Yoon, M.J. (1993). Specification testing with misspecified alternatives. Econometric Theory, 9: 649–658.
Beron, K.J., Murdoch, J.C. and Vijverberg, W.P. (2003). Why cooperate? Public goods, economic power, and the Montreal Protocol. The Review of Economics and Statistics, 85(2): 286–297.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society B, 36: 192–225.
Brock, W. and Durlauf, S. (2001). Discrete choice with social interactions. Review of Economic Studies, 59: 235–260.
Brueckner, J.K. (2003). Strategic interaction among governments: An overview of empirical studies. International Regional Science Review, 26(2): 175–188.
Burridge, P. (1980). On the Cliff–Ord test for spatial autocorrelation. Journal of the Royal Statistical Society B, 42: 107–108.
Chen, X. and Conley, T.G. (2001). A new semiparametric spatial model for panel time series. Journal of Econometrics, 105: 59–83.
Cliff, A. and Ord, J.K. (1972). Testing for spatial autocorrelation among regression residuals. Geographical Analysis, 4: 267–284.
Cliff, A. and Ord, J.K. (1973). Spatial Autocorrelation. London: Pion.
Cliff, A. and Ord, J.K. (1981). Spatial Processes: Models and Applications. London: Pion.
Conley, T.G. (1999). GMM estimation with cross-sectional dependence. Journal of Econometrics, 92: 1–45.
Cook, D. and Pocock, S. (1983). Multiple regression in geographic mortality studies, with allowance for spatially correlated errors. Biometrics, 39: 361–371.
Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
Das, D., Kelejian, H.H. and Prucha, I.R. (2003). Finite sample properties of estimators of spatial autoregressive models with autoregressive disturbances. Papers in Regional Science, 82: 1–27.
Davidson, R. and MacKinnon, J.G. (1984). Model specification tests based on artificial regressions. International Economic Review, 25: 485–502.
Davidson, R. and MacKinnon, J.G. (1988). Double-length artificial regression. Oxford Bulletin of Economics and Statistics, 50: 203–217.
Day, B., Bateman, I. and Lake, I. (2004). Omitted locational variates in hedonic analysis: A semiparametric approach using spatial statistics. Working Paper 0404, Center for Social and Economic Research on the Global Environment (CSERGE), University of East Anglia, UK.
Dubin, R. (1988). Estimation of regression coefficients in the presence of spatially autocorrelated errors. Review of Economics and Statistics, 70: 466–474.
Dubin, R., Pace, R.K. and Thibodeau, T.G. (1999). Spatial autoregression techniques for real estate data. Journal of Real Estate Literature, 7: 79–95.
Durlauf, S.N. (2004). Neighborhood effects. In: Henderson, J. and Thisse, J.-F. (eds), Handbook of Regional and Urban Economics, Volume 4, pp. 2173–2242. Amsterdam: North Holland.
Elhorst, J.P. (2001). Dynamic models in space and time. Geographical Analysis, 33: 119–140.
Elhorst, J.P. (2003). Specification and estimation of spatial panel data models. International Regional Science Review, 26(3): 244–268.
Ellner, S.P. and Seifu, Y. (2002). Using spatial statistics to select model complexity. Journal of Computational and Graphical Statistics, 11: 348–369.
Fischer, M.M., Reismann, M. and Scherngell, T. (2006). From conventional to spatial econometric models of spatial interaction. Paper presented at the Fifth International Workshop on Spatial Econometrics and Statistics, Rome, Italy, May 2006.
Fleming, M. (2004). Techniques for estimating spatially dependent discrete choice models. In: Anselin, L., Florax, R.J. and Rey, S.J. (eds), Advances in Spatial Econometrics, pp. 145–168. Heidelberg: Springer-Verlag.
Florax, R. and Folmer, H. (1992). Specification and estimation of spatial linear regression models: Monte Carlo evaluation of pre-test estimators. Regional Science and Urban Economics, 22: 405–432.
Florax, R.J. and de Graaff, T. (2004). The performance of diagnostic tests for spatial dependence in linear regression models: A meta-analysis of simulation studies. In: Anselin, L., Florax, R.J. and Rey, S.J. (eds), Advances in Spatial Econometrics. Methodology, Tools and Applications, pp. 29–65. Berlin: Springer-Verlag.
Florax, R.J., Folmer, H. and Rey, S.J. (2003). Specification searches in spatial econometrics: The relevance of Hendry's methodology. Regional Science and Urban Economics, 33(5): 557–579.
Florax, R.J., Folmer, H. and Rey, S.J. (2006). A comment on specification searches in spatial econometrics: The relevance of Hendry's methodology: A reply. Regional Science and Urban Economics, 36: 300–308.
Florax, R.J.G.M. and van der Vlist, A. (2003). Spatial econometric data analysis: moving beyond traditional models. International Regional Science Review, 26(3): 223–243.
Fortin, M.-J. and Dale, M. (2005). Spatial Analysis: A Guide for Ecologists. Cambridge: Cambridge University Press.
Getis, A., Mur, J. and Zoller, H.G. (2004). Spatial Econometrics and Spatial Statistics. London: Palgrave Macmillan.
Gibbons, S. and Machin, S. (2003). Valuing English primary schools. Journal of Urban Economics, 53: 197–219.
Godfrey, L. (1981). On the invariance of the Lagrange Multiplier test with respect to certain changes in the alternative hypothesis. Econometrica, 49: 1443–1455.
Goodchild, M.F., Anselin, L., Appelbaum, R. and Harthorn, B. (2000). Toward spatially integrated social science. International Regional Science Review, 23(2): 139–159.
Gotway, C.A. and Stroup, W.W. (1997). A generalized linear model approach to spatial data analysis and prediction. Journal of Agricultural, Biological and Environmental Statistics, 2(2): 157–178.
Gotway, C.A. and Wolfinger, R.D. (2003). Spatial prediction of counts and rates. Statistics in Medicine, 22: 1415–1432.
Gress, B. (2004a). Semi-Parametric Spatial Autocovariance Models. PhD thesis, University of California, Riverside, CA.
Gress, B. (2004b). Using semi-parametric spatial autocorrelation models to improve hedonic housing price prediction. Working paper, Department of Economics, University of California, Riverside, CA.
Griffith, D.A. (1988). Advanced Spatial Statistics. Dordrecht: Kluwer Academic.
Griffith, D.A. and Sone, A. (1995). Trade-offs associated with normalizing constant computational simplifications for estimating spatial statistical models. Journal of Statistical Computation and Simulation, 51: 165–183.
Haining, R. (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge: Cambridge University Press.
Haining, R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge: Cambridge University Press.
Hall, P. and Patil, P. (1994). Properties of nonparametric estimators of autocovariance for stationary random fields. Probability Theory and Related Fields, 99: 399–424.
Henderson, D.J. and Ullah, A. (2005). A nonparametric random effects estimator. Economics Letters, 88: 403–407.
Hendry, D.F. (2006). A comment on specification searches in spatial econometrics: The relevance of Hendry's methodology. Regional Science and Urban Economics, 36: 309–312.
Horowitz, J.L. and Lee, S. (2002). Semiparametric methods in applied econometrics: Do the models fit the data? Statistical Modelling, 2: 3–22.
Kelejian, H.H. and Prucha, I. (1998). A generalized spatial two stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics, 17: 99–121.
Kelejian, H.H. and Prucha, I. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review, 40: 509–533.
Kelejian, H.H. and Prucha, I. (2001). On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics, 104(2): 219–257.
Kelejian, H.H. and Prucha, I.R. (2002). 2SLS and OLS in a spatial autoregressive model with equal spatial weights. Regional Science and Urban Economics, 32(6): 691–707.
Kelejian, H.H. and Prucha, I.R. (2004). Estimation of simultaneous systems of spatially interrelated cross sectional equations. Journal of Econometrics, 118: 27–50.
Kelejian, H.H. and Prucha, I.R. (2005). HAC estimation in a spatial framework. Working paper, Department of Economics, University of Maryland, College Park, MD.
Kelejian, H.H. and Prucha, I.R. (2006). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Working paper, Department of Economics, University of Maryland, College Park, MD.
Kelejian, H.H. and Prucha, I.R. (2007). HAC estimation in a spatial framework. Journal of Econometrics, 140: 131–154.
Kelejian, H.H., Prucha, I.R. and Yuzefovich, Y. (2004). Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: Large and small sample results. In: LeSage, J.P. and Pace, R.K. (eds), Advances in Econometrics: Spatial and Spatiotemporal Econometrics, pp. 163–198. Oxford, UK: Elsevier Science Ltd.
Kelejian, H.H. and Robinson, D.P. (1992). Spatial autocorrelation: A new computationally simple test with an application to per capita county police expenditures. Regional Science and Urban Economics, 22: 317–333.
Kelejian, H.H. and Robinson, D.P. (1993). A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditure model. Papers in Regional Science, 72: 297–312.
Kelejian, H.H. and Robinson, D.P. (1995). Spatial correlation: A suggested alternative to the autoregressive model. In: Anselin, L. and Florax, R.J. (eds), New Directions in Spatial Econometrics, pp. 75–95. Berlin: Springer-Verlag.
Kelejian, H.H. and Robinson, D.P. (1998). A suggested test for spatial autocorrelation and/or heteroskedasticity and corresponding Monte Carlo results. Regional Science and Urban Economics, 28: 389–417.
Kelejian, H.H. and Robinson, D.P. (2004). The influence of spatially correlated heteroskedasticity on tests for spatial correlation. In: Anselin, L. and Florax, R.J. (eds), Advances in Spatial Econometrics, pp. 79–97. Heidelberg: Springer-Verlag.
King, M. (1981). A small sample property of the Cliff–Ord test for spatial correlation. Journal of the Royal Statistical Society B, 43: 264.
Land, K. and Deane, G. (1992). On the large-sample estimation of regression models with spatial or network-effect terms: A two stage least squares approach. In: Marsden, P. (ed.), Sociological Methodology, pp. 221–248. San Francisco: Jossey-Bass.
Lee, L.-F. (2002). Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models. Econometric Theory, 18(2): 252–277.
Lee, L.-F. (2003). Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews, 22: 307–335.
Lee, L.-F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72: 1899–1925.
Lee, L.-F. (2006). GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Journal of Econometrics. Forthcoming.
Lee, L.-F. (2007). GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Journal of Econometrics, 137: 489–514.
Leenders, R.T.A.J. (2002). Modeling social influence through network autocorrelation: Constructing the weights matrix. Social Networks, 24: 21–47.
LeSage, J.P. (2000). Bayesian estimation of limited dependent variable spatial autoregressive models. Geographical Analysis, 32: 19–35.
LeSage, J.P. and Pace, R.K. (2004). Advances in Econometrics: Spatial and Spatiotemporal Econometrics. Oxford, UK: Elsevier Science Ltd.
LeSage, J.P. and Pace, R.K. (2005). Spatial econometric modeling of origin-destination flows. Paper presented at the 52nd North American Meeting of the Regional Science Association International, Las Vegas, NV, Nov. 2005.
LeSage, J.P., Pace, R.K. and Tiefelsdorf, M. (2004). Methodological developments in spatial econometrics and statistics. Geographical Analysis, 36: 87–89.
Lin, X. and Lee, L.-F. (2005). GMM estimation of spatial autoregressive models with unknown heteroskedasticity. Working paper, The Ohio State University, Columbus, OH.
Magnus, J. (1978). Maximum likelihood estimation of the GLS model with unknown parameters in the disturbance covariance matrix. Journal of Econometrics, 7: 281–312. Corrigenda, Journal of Econometrics, 10: 261.
Mardia, K. and Marshall, R. (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71: 135–146.
Martin, R. (1993). Approximations to the determinant term in Gaussian maximum likelihood estimation of some spatial models. Communications in Statistics: Theory and Methods, 22: 189–205.
Moran, P.A. (1948). The interpretation of statistical maps. Biometrika, 35: 255–260.
Moran, P.A. (1950). A test for the serial dependence of residuals. Biometrika, 37: 178–181.
Nelson, G.C. (2002). Introduction to the special issue on spatial analysis. Agricultural Economics, 27(3): 197–200.
Newey, W.K. and West, K.D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55: 703–708.
Ord, J.K. (1975). Estimation methods for models of spatial interaction. Journal of the American Statistical Association, 70: 120–126.
Pace, R.K. and Barry, R. (1997a). Quick computation of spatial autoregressive estimators. Geographical Analysis, 29: 232–246.
Pace, R.K. and Barry, R. (1997b). Sparse spatial autoregressions. Statistics and Probability Letters, 33: 291–297.
Pace, R.K., Barry, R. and Sirmans, C. (1998). Spatial statistics and real estate. Journal of Real Estate Finance and Economics, 17: 5–13.
Pace, R.K. and LeSage, J.P. (2002). Semiparametric maximum likelihood estimates of spatial dependence. Geographical Analysis, 34: 76–90.
Pace, R.K. and LeSage, J.P. (2004a). Chebyshev approximation of log-determinants of spatial weights matrices. Computational Statistics and Data Analysis, 45: 179–196.
Pace, R.K. and LeSage, J.P. (2004b). Spatial statistics and real estate. Journal of Real Estate Finance and Economics, 29: 147–148.
Paelinck, J. and Klaassen, L. (1979). Spatial Econometrics. Farnborough: Saxon House.
Pesaran, M.H. (2005). Estimation and inference in large heterogenous panels with cross section dependence. Working paper, Faculty of Economics and Politics, University of Cambridge, Cambridge, United Kingdom.
Pinkse, J. and Slade, M.E. (1998). Contracting in space: An application of spatial statistics to discrete-choice models. Journal of Econometrics, 85: 125–154.
Pinkse, J., Slade, M.E. and Brett, C. (2002). Spatial price competition: A semiparametric approach. Econometrica, 70(3): 1111–1153.
Rey, S.J. and Boarnet, M.G. (2004). A taxonomy of spatial econometric models for simultaneous equations systems. In: Anselin, L., Florax, R.J. and Rey, S.J. (eds), Advances in Spatial Econometrics, pp. 99–119. Heidelberg: Springer-Verlag.
Ripley, B.D. (1981). Spatial Statistics. New York: Wiley.
Robinson, P.M. (1988). Root-n-consistent semiparametric regression. Econometrica, 56: 931–954.
Schabenberger, O. and Gotway, C.A. (2005). Statistical Methods for Spatial Data Analysis. Boca Raton, FL: Chapman & Hall/CRC.
Schmoyer, R. (1994). Permutation tests for correlation in regression errors. Journal of the American Statistical Association, 89: 1507–1516.
Smirnov, O. (2005). Computation of the information matrix for models with spatial interaction on a lattice. Journal of Computational and Graphical Statistics, 14: 910–927.
Smirnov, O. and Anselin, L. (2001). Fast maximum likelihood estimation of very large spatial autoregressive models: A characteristic polynomial approach. Computational Statistics and Data Analysis, 35: 301–319.
Tiefelsdorf, M. (2002). The saddlepoint approximation of Moran's I and local Moran's I_i's reference distribution and their numerical evaluation. Geographical Analysis, 34: 187–206.
Tiefelsdorf, M. and Boots, B. (1995). The exact distribution of Moran's I. Environment and Planning A, 27: 985–999.
Upton, G.J. and Fingleton, B. (1985). Spatial Data Analysis by Example. Volume 1: Point Pattern and Quantitative Data. New York: Wiley.
Waller, L.A. and Gotway, C.A. (2004). Applied Spatial Statistics for Public Health Data. Hoboken, NJ: John Wiley.
White, H. (1980). A heteroskedastic-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48: 817–838.
Whittle, P. (1954). On stationary processes in the plane. Biometrika, 41: 434–449.
Zhang, H. (2002). On estimation and prediction for spatial generalized linear mixed models. Biometrics, 56: 129–136.
15
Spatial Microsimulation
D. Ballas and G.P. Clarke

15.1. INTRODUCTION

Much modelling in human geography and related disciplines takes an aggregate or meso-scale approach to the issue of spatial resolution. That is, characteristics of individuals or households are summed to provide zonal population or demand totals and, if appropriate, individual companies or firms are similarly aggregated on the supply side of the economy. In spatial econometrics or regional science those zones can be as large as entire cities or regions. The most obvious reason for doing this is that detailed disaggregate data on persons or firms are typically not regularly available below the level of the region (especially economic or household survey data). In most countries census data are available to help disaggregate population totals to smaller geographical regions but the level of detail available for researchers is then limited by what is published in two- or three-dimensional tables (although special requests for different combinations can be made in certain countries but at additional expense). Models built on these more aggregate data sets are widespread and have proved very fruitful in many areas of policy analysis (see, for example, Fotheringham et al., 2000; Longley and Batty 2002; Stillwell and Clarke 2004). However, such modelling techniques often need to be highly disaggregated for real world applications and they also provide very little information concerning the interdependencies between household structure or type and their lifestyles, including the events they routinely participate in and hence their ability to raise and spend various types of income and wealth. For social policy evaluation such micro models allow analysts to monitor the effects of changes in taxation, family credit, property or council tax, pensions, social security payments, etc. (the actions of local and national governments) at the household level (and hence at any
more aggregate spatial scale). For area-based policy evaluation such models allow differential impacts between and within areas to be analysed more effectively. The necessity of predicting the impacts of social and area-based policies at the local or micro-level has also been emphasized by Openshaw (1995, p. 60). Governments need to predict the outcomes of their actions and produce forecasts at the local level.

For these reasons Wilson (2000, p. 98) identified microsimulation as one of the most important methods in regional science modelling: 'Simulation is a critical concept in the future development of modelling because it provides a way of handling complexity that cannot be handled analytically. Microsimulation is a valuable example of a technique that may have increasing prominence in future research.'

This chapter reviews the history of spatial microsimulation and spells out a research agenda for the further exploitation of the technique. First, the semantics of microsimulation are revisited and we describe the different types of microsimulation models and how they can be formulated (section 15.2). We then provide a brief overview of applications of microsimulation models which includes use in economics, social policy, geography and regional science (section 15.3). Then, we spell out a research agenda for spatial microsimulation (section 15.4) and offer some concluding comments in section 15.5.

15.2. WHAT IS SPATIAL MICROSIMULATION?

Microsimulation can generally be defined as a methodology that is concerned with the creation of large-scale population microdata sets to enable the analysis of policy impacts at the micro-level. The approach dates back to the work of Orcutt (1957) and Orcutt et al. (1961) who argued for a new type of socio-economic system and described a simple model of demographic transitions based on micro-analytical simulation. In particular, microsimulation methods aim to examine changes in the characteristics or lifestyles of individuals within households and to analyse the impact of government policy changes on these individuals or households. Microsimulation models can be divided into two main types. First, there are static models that are based on simple snapshots of the current circumstances of a sample of the population at any one time. Second, there are dynamic models that vary or age the attributes of each micro-unit in a sample to build up a synthetic longitudinal database forecasting the sample members' lifestyles into the future.

The first geographical application of microsimulation was developed by Hägerstrand (1967) who employed micro-analytical techniques for the study of spatial diffusion of innovation. Nevertheless, it can be argued that the basis for spatial microsimulation of households and individuals was founded in the 1970s. In particular, Wilson and Pownall (1976) were among the first to address the aggregation difficulties that were associated with traditional comprehensive spatial models of urban systems. They suggested a new spatial modelling framework for representing the urban system based on the micro-level interdependence of household and individual characteristics. Further, they concentrated on the spatial distribution of population and its activities and suggested that persons and their associated attributes should be defined separately in the form of lists, rather than represented in the form of matrices. In this manner, there is no loss of information and the storage is computationally efficient. In their representational framework, they were interested in estimating all the characteristics of the individuals that
comprise the urban population. Formally, they defined variables for each person in the system separately by adding a person label r to each person attribute x_1, x_2, x_3, ..., x_n. The person attributes would therefore become x_1^r, x_2^r, x_3^r, ..., x_n^r for the rth person of the population. This means that if there are M people in the population, there will be N × M variables in total. After suggesting the above framework for representing individuals, Wilson and Pownall (1976) proposed a modelling procedure to estimate each characteristic for each person in turn. They formally expressed this procedure as follows:

x_j^r = \phi\bigl(x_j^r \mid P_j(x \mid \ldots),\; R_j^r,\; \Lambda\bigr)

where P_j(x | ...) is the probability of x_j taking the value x conditional on variables yet to be specified, R_j^r is a random number selected for person r and characteristic j, and Λ represents a relevant constraint set (Wilson and Pownall, 1976). One of the most significant properties of the above model is its causal structure, which is largely reflected in the order in which the characteristics are estimated for each person.

Almost a decade later, Birkin and Clarke (1988) built a synthetic spatial information system for urban and regional analysis. It can be argued that this model is the first comprehensive spatial microsimulation model in the UK. Birkin and Clarke (1988) discussed the difficulties of performing micro-level spatial analysis using the existing published data sources and they proposed a methodology for generating synthetic microdata from a number of different aggregate sources. This microsimulation methodology was underpinned by a technique known as iterative proportional fitting (IPF) (see Birkin and Clarke (1988) and Ballas (2001) for a more detailed discussion of this technique). Birkin and Clarke (1988) briefly discuss the theoretical properties of IPF and they demonstrate how they applied the method to estimate joint probability distributions of household attributes. The IPF procedure adopts a synthetic reconstruction method which calculates conditional probabilities of having particular attributes and it then assigns these attributes on the basis of random sampling procedures (Monte Carlo simulation). Table 15.1 depicts the steps that need to be followed in the procedure for allocating economic activity status, for example.
the value x conditional on variables yet to be well-used reweighting procedures are:
specified, Rjr is a random number selected for
person r and characteristic j, and represents
a relevant constraint set (Wilson and Pownall, Reweighting probabilistic approaches, which
1976). One of the most significant properties typically reweight an existing national microdata
of the above model is its causal structure, set to t a geographical area description on
the basis of random sampling and optimization
which is largely reflected in the order in
techniques
which the characteristics are estimated for
each person. Reweighting deterministic approaches, which
Almost a decade later, Birkin and Clarke reweight a non geographical population micro-
(1988) built a synthetic spatial information data set to t small area descriptions, but without
system for urban and regional analysis. the use of random sampling procedures
It can be argued that this model is the
first comprehensive spatial microsimulation
model in the UK. Birkin and Clarke (1988) These new methods involve the reweight-
discussed the difficulties of performing ing of an existing microdata sample (which
micro-level spatial analysis using the existing is usually only available at coarse levels of
published data sources and they proposed a geography), so that it would fit small area
methodology for generating synthetic micro- population statistics tables. For instance, an
data from a number of different aggregate existing microdata set such as the British
sources. This microsimulation methodology Household Panel Survey (BHPS) described
was underpinned by a technique known as in Table 15.2 can be reweighted to populate
iterative proportional fitting (IPF) (see Birkin small areas.
and Clarke (1988) and Ballas (2001) for a The BHPS provides a detailed record for
more detailed discussion of this technique). a sample of households and all of their
Birkin and Clarke (1988) briefly discuss occupants (Taylor et al. 2001). Reweighting
the theoretical properties of IPF and they methods aim to sample from all the microdata
Table 15.1 Microsimulation procedure for the allocation of economic activity status (after the similar example of the tenure allocation procedure given by Clarke, 1996: 3)

| Steps | Head of household (hh): 1st | 2nd | ... | Last |
| Age, sex and marital status and location (ED level) (given) | Age: 16–29; Sex: Male; Marital status: SWD; GeoCode: DAFA01 | Age: 75–84; Sex: Female; Marital status: married; GeoCode: DAFA02 | ... | Age: 30–44; Sex: Male; Marital status: married; GeoCode: DAGK45 |
| Probability of hh of given age, sex, and location (ED level) being economically active | 0.7 | 0.4 | ... | 0.7 |
| Random number | 0.55 | 0.5 | ... | 0.45 |
| Economic activity assigned to hh on the basis of random sampling | Economically active | Economically inactive | ... | Economically active |
| Probability of economically active hh being an employee | 0.6 | – | ... | 0.5 |
| Probability of economically active hh given age, sex, marital status, and location (ED level) being self-employed | 0.2 | – | ... | 0.3 |
| Probability of economically active hh given age, sex, marital status, and location (ED level) being on a government scheme | 0.05 | – | ... | 0.15 |
| Probability of economically active hh given age, sex, marital status, and location (ED level) being unemployed | 0.15 | – | ... | 0.05 |
| Random number | 0.4 | – | ... | 0.6 |
| Economic activity category assigned on the basis of random sampling | Employee | – | ... | Self-employed |
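Read as pseudocode, Table 15.1 describes a cascade of conditional probabilities resolved by random draws. A minimal sketch of that allocation step follows; the probabilities echo the first column of the table and are purely illustrative.

```python
import random

def assign_economic_activity(p_active, p_status_given_active):
    """Monte Carlo allocation in the spirit of Table 15.1.

    p_active is the probability that a head of household with the given age,
    sex, marital status and ED is economically active; if the first draw says
    'active', a second draw picks a status from the conditional distribution.
    """
    if random.random() >= p_active:
        return 'economically inactive'
    r, cumulative = random.random(), 0.0
    for status, p in p_status_given_active.items():
        cumulative += p
        if r < cumulative:
            return status
    return status  # guard against rounding in the supplied probabilities

# hypothetical values echoing the first column of Table 15.1
status = assign_economic_activity(
    p_active=0.7,
    p_status_given_active={'employee': 0.6, 'self-employed': 0.2,
                           'government scheme': 0.05, 'unemployed': 0.15})
```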

The BHPS provides a detailed record for a sample of households and all of their occupants (Taylor et al. 2001). Reweighting methods aim to sample from all the microdata records to find the set of household records that best matches the population described in the Small Area Statistics or Census Area Statistics tables for the small area under study. First, a series of small area tables (e.g., from the Census or other sources) that describe the small area of interest must be selected. For example, a reweighting method would sample from the BHPS to find a suitable combination of households that would fit the data described in Table 15.3.

Table 15.2 A population microdata example


Person AHID PID AAGE12 Sex AJBSTAT ... AHLLT AQFVOC ATENURE AJLSEG ...
1 1000209 10002251 91 2 4 ... 1 1 6 9 ...
2 1000381 10004491 28 1 3 ... 2 0 7 8 ...
3 1000381 10004521 26 1 3 ... 2 0 7 8 ...
4 1000667 10007857 58 2 2 ... 2 1 7 8 ...
5 1001221 10014578 54 2 1 ... 2 0 2 8 ...
6 1001221 10014608 57 1 2 ... 2 1 2 8 ...
7 1001418 10016813 36 1 1 ... 2 1 3 8 ...
8 1001418 10016848 32 2 7 ... 2 7 3 7 ...
9 1001418 10016872 10 1 8 ... 8 8 3 8 ...
10 1001507 10017933 49 2 1 ... 2 0 2 8 ...
11 1001507 10017968 46 1 2 ... 2 0 2 8 ...
12 1001507 10017992 12 2 8 ... 8 8 2 8 ...
Note: The British Household Panel Survey data were made available through the UK Data Archive. The data were originally
collected by the ESRC Research Centre on Micro-social Change at the University of Essex, now incorporated within the
Institute for Social and Economic Research.
Person Person number.
AHID Household identifier (number of household to which the listed individual belongs).
PID Person identifier (a unique number to identify the individual).
AAGE12 Age at 1/12/1991.
Sex Sex
AJBSTAT Current labour force status (e.g., self-employed, in paid employment, unemployed, family care, etc.) in 1991.
AHLLT Health status in 1991.
AQFVOC Vocational qualifications in 1991.
AJBSEG Socio-economic group (e.g., employers, managers, professionals, skilled manual, unskilled, etc.) in 1991.
ATENURE Tenure status in 1991.
AJLSEG Socio-economic group: last job (in 1991).

Table 15.3 An example of small area data

Small area table 1 (household type)
| | Area 1 | Area 2 |
| Married couple households | 60 | 40 |
| Single-person households | 20 | 20 |
| Other | 20 | 40 |

Small area table 2 (economic activity of household head)
| | Area 1 | Area 2 |
| Employed/self-employed | 70 | 50 |
| Unemployed | 10 | 20 |
| Other | 20 | 30 |

Small area table 3 (tenure status)
| | Area 1 | Area 2 |
| Owner occupier | 60 | 60 |
| Local Authority or Housing Association | 20 | 20 |
| Rented privately | 20 | 20 |
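One simple way to operationalize this matching is a hill-climbing search that repeatedly swaps survey households in and out of a candidate small-area population until the constraint tables are reproduced as closely as possible. The sketch below is a generic illustration of that idea, not the algorithm of any of the studies cited; the attribute names are hypothetical and the commented counts are taken from Area 1 of Table 15.3.

```python
import random

def fit_error(selection, constraints):
    """Total absolute error between the selected households and the
    small-area constraint counts (cf. the tables above)."""
    error = 0
    for attribute, targets in constraints.items():
        for category, target in targets.items():
            count = sum(1 for hh in selection if hh[attribute] == category)
            error += abs(count - target)
    return error

def select_households(microdata, constraints, n, iterations=10000):
    """Hill-climbing selection: start from a random set of n survey households
    and keep swapping members for random survey records whenever the swap
    reduces the misfit against the small-area tables."""
    selection = random.sample(microdata, n)
    error = fit_error(selection, constraints)
    for _ in range(iterations):
        i = random.randrange(n)
        candidate = selection[:i] + [random.choice(microdata)] + selection[i + 1:]
        candidate_error = fit_error(candidate, constraints)
        if candidate_error <= error:
            selection, error = candidate, candidate_error
    return selection

# hypothetical use with BHPS-like records (dicts) and Area 1 of Table 15.3:
# constraints = {'household_type': {'married couple': 60, 'single person': 20, 'other': 20},
#                'tenure': {'owner occupier': 60, 'social rented': 20, 'private rented': 20}}
# small_area_population = select_households(bhps_records, constraints, n=100)
```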

This first stage of population estimation at the household level is primarily a data-fitting exercise. However once built the model can be used for static what-if simulations, in which the impacts of alternative policy scenarios on the population are estimated: for instance if there had been no poll tax in 1991 which communities would have benefited most and which would have had to have paid more tax in other forms? Second it can be used for dynamic modelling, to update a basic micro-dataset and future-oriented what-if simulations: for instance if the current government had raised income taxes in 1997 what would the redistributive effects have been between different socio-economic groups and between central cities and their suburbs by 2007?

We shall explore applications based on these principles in the next section.

15.3. APPLICATIONS

15.3.1. Introduction
[Figure 15.1: 'Microsimulation studies' pie chart showing the distribution by discipline – Economics (41%), Medicine (25%), Transportation (16%), Population (6%), Health (5%), Geography (3%), Environment/tra (2%), Other (2%).]

Figure 15.1 Distribution of microsimulation academic studies in the period 1967–2003. (Source: http://www.sciencedirect.com/; Accessed 15 October 2003; after Ballas et al., 2005: p. 11).

As mentioned above, microsimulation has been mainly developed and used by a variety of social sciences. Figure 15.1 shows the results of a basic keyword search in the Sciencedirect academic journal database, searching the word microsimulation in the titles or abstracts of papers in the last 30 years. As can be seen, the majority of the papers were in economics (41%) with very few papers in geography (3%), although spatial applications may also lie in fields such as population, transport and health. There is also a relatively high number of microsimulation applications in medicine. However these are applications of a different nature, as their main focus is the effectiveness of medicines (e.g., simulating the impact of medicines on human well-being, etc.).

The rest of this section explores some well-known examples of microsimulation for certain types of policy work. This includes static models (simply run for one period of time) and dynamic models (where the attributes of the population are updated constantly or over yearly totals).

15.3.2. Tax and income modelling

A large number of papers in economics on microsimulation relate to work on household finance. Indeed, amongst the first applied microsimulation models was TAX, developed at the US Treasury department in the 1960s (Nelissen, 1993). Since then there have been many models built to examine the impacts on individual households of various tax or welfare changes (Bekkering, 1995; Falkingham and Lessof, 1992; Falkingham and Hills, 1995a, b; Glennerster et al., 1995; Propper, 1995).

The first task in the modelling of household income is to link households with job type. Birkin and Clarke (1989) used the SYNTHESIS model to generate incomes for individuals. They used an IPF based microsimulation approach to estimate earned income at ward level for the Leeds Metropolitan District by assigning each household a job and an occupation using
information from the New Earnings Survey to allocate an income variable accordingly. In addition, they estimated income from transfer payments such as the Family Income Supplement for each household. This was probably the first successful attempt to generate income at the small area level in the literature. Ballas and Clarke (2001a) extended this work by increasing the number of transfer or welfare payments included in the model (such as detailed work on child benefits) and also including household taxation levels. Williamson and Voas (2000) report ongoing research to provide more robust and reliable estimates of income at the small area level. They argue that income estimation at the small area level may be seen as a multilevel analysis problem where variables at individual and area levels may interact.

Work in the US has tended to extend such work to include not only household income but also household wealth. In particular, Caldwell and Keister (1996) present CORSIM, which is a dynamic microsimulation model that has been under development at Cornell University since 1986. CORSIM has been used to model wealth distribution in the United States over the historical period 1960–1995 and to forecast wealth distribution over the future (Caldwell and Keister, 1996). It is noteworthy that over 17 different national microdata files have been used to build the model, which incorporated 50 economic, demographic and social processes by means of approximately 900 stochastic equations and rule-based algorithms (Ibid.). Furthermore, Caldwell et al. (1998) review the geography of wealth in the USA and show how CORSIM has included many variables relating to assets and debts.

As mentioned in the previous section, microsimulation models can be even more powerful when they become dynamic. In particular, once a microsimulation database is built, dynamic microsimulation procedures can be introduced in order to update these databases. Amongst the first applied dynamic microsimulation models was DYNASIM (DYNAmic Simulation of Income Model; see Orcutt et al., 1961; Wertheimer et al., 1986), which was the base for later, more sophisticated, models such as CORSIM. One of the descendants of DYNASIM was DYNASIM2, which was developed and maintained at the Urban Institute in Washington D.C. (Wertheimer et al., 1986). DYNASIM2 comprised two sub-models: a Family and Earnings History (FEH) model and a Jobs and Benefit History (JBH) model (Wertheimer et al., 1986).

Work on income and taxation can be more focused onto particular problems. Currently in many Western countries there is a problem relating to pensions given that an ageing population will need more financial support from a declining workforce population. Notable here is the work of Hancock et al. (1992), who built PENSIM. This is a microsimulation model designed for the simulation of pensioners' incomes up to the year 2030. Hancock et al. (1992) point out that the simulation of pensions is another good example of the application of dynamic microsimulation techniques, given that pension rights accumulate over a long period of time and their estimation requires the processing of data pertaining to individuals' entire working lives. PENSIM aims at predicting aggregate income by source within certain subsets of the pensioner population under different alternative assumptions. These assumptions pertain to the rules controlling the treatment of pensioners by the social security system, pension entitlement regulations, projected demographic movements and movements in aggregate economic variables such as unemployment and inflation. Davies and Joshi (1992) also focused on modelling pensions. In particular, they employed microsimulation
modelling techniques to simulate lifetime earning and pension entitlements in Britain. They used a microsimulation model to construct illustrative individuals and examine the treatment of pensions after divorce. They also modelled lifetime earnings upon which pensions depend and they simulated dated earnings for each partner before and after dissolution of the marriage and they explored how pension entitlement varied with the duration of the marriage. Among the variables that they estimated were age, sex, age at marriage, qualifications and age at divorce. Models of income and wealth also feed significantly into models of social policy change (see section 15.3.4).

15.3.3. Modelling household activity patterns

Introduction
Wilson and Pownall (1976) provided an early example of how microsimulation models could be employed to build urban micro-analytical models based on the interdependencies between individual characteristics. In these examples they investigated the interdependency of the person and household characteristics that are listed in Table 15.4.

Table 15.4 Attributes of individual micro-unit examined by Wilson and Pownall (1976)
Person attributes:
• Wage
• Job location
• Residential location
• Journey to work costs
• Housing expenditure
• Shopping expenditure
• Journey to shop costs
• Shopping location
• Other expenditure

As can be seen, this framework starts to model activity patterns of individuals or households (activity normally undertaken in more meso-scale models). Hence, there have been a number of examples of building links between these household data sets and trip making behaviour or activities. We shall explore a sample of these types of application in this section. Simulated small area micro data sets can also allow for a household demand function to be specified (likely type of supermarket, school, etc.) at the small area level given that household's socio-economic profile. This can then be fed into a household interaction model (or variant of discrete choice model) in order to add place of work, shopping destination, GP location, children's school, etc. to the household database contained within a spatial microsimulation model.

Labour and housing markets
As Table 15.4 suggests, one key link is between households and their job locations. By adding a journey to work model households can be allocated a job destination (by age, sex, occupation, social class, etc.). Ballas and Clarke (2001b) showed how it was possible to build a journey to work model for Leeds which linked individual households to particular firms. Then, the impacts of the closure of a major manufacturing firm in east Leeds could be modelled in terms of which households would be most affected and in terms of their consequent reduction in income and expenditure. This local analysis showed that most of the impacts occurred within 5 miles of the firm's location – analysis in stark contrast to outputs from typical regional input–output models.

Hooimeijer (1996) suggests a geographical microsimulation framework to analyse the linkages between supply and demand in the housing market and labour market simultaneously. He argues for the modelling
of spatial mobility of households and firms in three different time sets (daily commuting, relocation, and lifetime mobility). The problem associated with this type of modelling is the order in which processes are modelled. It could be argued, for example, that labour force participation is dependent on family status and attributes or that the family formation procedure is dependent on the labour market situation of each individual. As Falkingham and Lessof (1992) put it:

. . . while a woman's labour force status can depend on the number of children she has and on her marital status, it cannot also influence the probability of the woman having a child in any year. The ordering of the modules necessarily involves making assumptions about the direction of causality in relationships between variables. (Falkingham and Lessof, 1992: 9)

Their LIFEMOD model is based on the assumption that demographic variables determine labour-force participation and that labour-force participation influences health, although it is pointed out that evidence suggests causality in either direction (Falkingham and Lessof, 1992).

Transport and land-use models
Wegener and Spiekermann (1996) explore the potential of microsimulation for urban models, focusing on land-use and travel models. They argue that a new generation of travel models has emerged which requires more detailed information on household demographics and employment characteristics at the small area level. They also point out that there are new neighbourhood-scale transport policies aimed at promoting public transport, walking and cycling. These policies require detailed information on the precise location of the population and its activities. Wegener and Spiekermann (1996) also stress the need for urban models to predict not only the economic but also the environmental impacts of land-use transport policies. In order to model the environmental impacts there is a need for small-area forecasts of emissions from stationary and mobile sources as well as of emissions in terms of the affected population. After outlining the main characteristics of a micro-analytic theory of urban change, Wegener and Spiekermann (1996) report on modelling efforts carried out at the University of Dortmund to integrate microsimulation into a comprehensive urban land-use transport model (see also Veldhuisen et al., 2000). The links between households, housing markets and labour markets have been explored more recently in Ballas et al. (2005).

Retail models
Traditional spatial interaction or discrete choice models have been used to estimate expenditure flows from households to each store. It is argued by Nakaya et al. (2005) that it is possible to improve the applicability of the retail interaction model, not by increasing the complexity of the model formulation, but by integrating the interaction modelling framework with spatial microsimulation. To attain a high level of predictive accuracy, models of retail interaction usually require a high degree of disaggregation (Birkin et al., 2002). Even if a survey of consumer behaviour is conducted by randomly distributing a questionnaire to local residents, response rates would vary by consumer type and place of residence based on people's different levels of interest and tolerance of such a survey. Consequently, survey data of this type often contain bias in the type of consumer behaviour measured, swayed towards the behaviour of individuals who least object to completing surveys. This problem of missing data tends to get worse as the spatial units used in the analysis get smaller. A solution to this problem is to generate data through spatial microsimulation which can be

used to generate estimates of expenditure on market transitions. These include education,


groceries by each household. These estimates scholarship, transitions from school, transi-
can then be aggregated to any grouping tions from being unemployed, retirement,
including lifestyle segments and residential etc. The final step in the NEDYMAS
zones simultaneously. The end product is a microsimulation procedure is to simulate
retail model with a more disaggregate and attributes or transitions that are related to
useful set of demand variables and both social security. Nelissen (1993) describes
attractiveness and distance decay parameters how sensitivity analysis was performed to
calibrated for different types of consumers. validate NEDYMAS and concludes that the
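A production-constrained spatial interaction model of this kind, fed by microsimulated household expenditure, can be sketched as follows. The variable names, the power attractiveness term and the negative-exponential distance decay are illustrative assumptions rather than the specification used by Nakaya et al. (2005).

```python
import numpy as np

def expenditure_flows(spend, attractiveness, distance, alpha, beta):
    """Production-constrained spatial interaction sketch: each (micro)simulated
    household's grocery spend is shared across stores in proportion to store
    attractiveness (raised to alpha) and a negative-exponential distance decay."""
    utility = attractiveness[None, :] ** alpha * np.exp(-beta * distance)
    probabilities = utility / utility.sum(axis=1, keepdims=True)
    return spend[:, None] * probabilities          # households x stores flow matrix

# hypothetical inputs: spend per household from the microsimulation, store
# floorspace as attractiveness, a household-to-store distance matrix; alpha and
# beta would be calibrated separately for different consumer types, as in the text.
```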
15.3.4. Social policy change

From the end of the 1960s onwards microsimulation became the dominant quantitative method for forecasting the impacts of policy changes in the social welfare area in the USA (Nelissen, 1993). This is the same now in many developed countries. A good example of dynamic microsimulation modelling for economic and social policy analysis is NEDYMAS (Netherlands Dynamic Micro-Analytic Simulation model; see Nelissen, 1993). The latter is a dynamic cross-sectional microsimulation model aimed at simulating future social security benefits and contributions. In particular, NEDYMAS is a comprehensive model for the Dutch household sector and comprises three main modules: a demographic module, a labour market and income formation module, and a social security module. Demographic processes are simulated explicitly, which means that the size of the microdata base changes during the simulation period. The NEDYMAS micro-database included 204 household attributes. Once the initial population has been determined the attributes of each individual can be updated and the micro-population can be projected into the future. First, all demographic transitions are made in the model. These include events such as birth, death, immigration, family reunification, emigration, first marriage, remarriage, cohabitation, divorce, etc. Once all the demographic transitions are simulated, the next step is to consider labour market transitions. These include education, scholarship, transitions from school, transitions from being unemployed, retirement, etc. The final step in the NEDYMAS microsimulation procedure is to simulate attributes or transitions that are related to social security. Nelissen (1993) describes how sensitivity analysis was performed to validate NEDYMAS and concludes that the model is capable of reconstructing the long-term socio-economic development at the micro level.
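The ordering of the modules described above can be caricatured as a yearly update loop. The three callbacks in the sketch below are hypothetical stand-ins for the demographic, labour market and social security modules, not the NEDYMAS code.

```python
import random

def project(population, years, demographic_events, labour_events, benefit_rules):
    """Dynamic microsimulation sketch: each simulated year, apply demographic
    transitions first, then labour market transitions, then recompute the
    social security attributes of every individual."""
    for _ in range(years):
        for person in population:
            person['age'] += 1
            for event, probability in demographic_events(person).items():
                if random.random() < probability:
                    person[event] = True            # e.g., 'married', 'emigrated'
            for event, probability in labour_events(person).items():
                if random.random() < probability:
                    person['labour_status'] = event  # e.g., 'retired', 'unemployed'
            person['benefits'] = benefit_rules(person)
    return population
```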
It is interesting to note that the LIFEMOD model described above has also been used to estimate the effects of the welfare state over the life-cycle of individuals (Falkingham and Hills, 1995a, b; Falkingham et al., 1995), as well as to estimate the degree to which income is redistributed between people over time, or across the life cycle (Falkingham and Hills, 1995b). It has also been used to investigate financing options for higher education (Glennerster et al., 1995) and to examine the dynamics of lone parenthood (Evandrou and Falkingham, 1995). Further, LIFEMOD has been used to explore the lifetime distribution of health needs and use of health services (Propper, 1995).

In the UK the work of Holly Sutherland and her colleagues has been very influential in terms of policy analysis using microsimulation (Redmond et al., 1998; Mitton et al., 2000; Hancock, 2000; Sutherland et al., 2003). Sutherland and Piachaud (2001), for example, developed and used a microsimulation methodology for the assessment of British government policies for the reduction of child poverty in the period 1997–2001. Their results suggest that the number of children in poverty will be reduced by approximately one-third in the short term and that there is a trend towards further reductions. However, they emphasized that there is a need for more measures in order to meet the government target of abolishing child poverty in a generation.
Another example is the research conducted by Ballas et al. (2005) using SIMBRITAIN. This model assumes that the initial simulation of the future population of Britain could be based on population projections (such as those of the ONS) and on the assumption that the trends in the changes to society to 2021 are similar to that of the previous decade. However, alternative projections would also be provided on the basis of hypothetical social policy changes. They also examined child poverty as a major application area. For example, it is possible to use a dynamic spatial microsimulation model to estimate the degree of child poverty eradication within the next 20 years under different policies and assumptions, such as the onset of a major recession or a redistribution of wealth, and the model would provide projections in order to suggest where current strategies are failing to eradicate child poverty within a generation.

Microsimulation still has to gain credibility amongst the social science community in general and social policy researchers in particular. Thus, there is currently a major challenge to build on the work described above in order to project the population into the future to predict what would happen under different macro-economic, micro-economic and social policy scenarios. This will enable an evaluation of the short and long-term impacts that various government policies are likely to have on different segments of society and different geographical areas.

15.4. THE WAY FORWARD: THE RESEARCH AGENDA

15.4.1. Towards a comprehensive spatial microsimulation of urban systems

We have seen in section 15.3 that progress has been made on adding behavioural or trip making models into microsimulation. The obvious next step is to link all these components into a more comprehensive urban model. First, more linkage is required between households and the supply-side of the economy. For example it should be possible to link all households to a retail destination (by type of good) and a destination for primary and secondary health and education. By adding more information on linkages or flows within the city it can be argued that such modelling would offer major new insights into urban deprivation or quality of life. Many households will be identified as having poor accessibility to major services. However, multiple deprivation may well exist in many areas where poor accessibility exists to all major urban services. For example, a neighbourhood may be a long way from decent retail opportunities, a hospital and a GP. In addition, although close to a secondary school, that school may be suffering from very low examination success and hence access is constrained to only a poor-performing school.

Once all the relevant demand-side and supply-side databases are constructed, the next step would be to perform what-if policy impact analysis. In particular, it will be possible to model what would be the impact on the quality of life of residents in different localities, under different scenarios. For instance, it would be possible to estimate what would be the socio-economic and spatial impact of a new hospital in an area, new retail facilities, new schools, etc. It will also be possible to link these activities to events taking place elsewhere in the city. For example, the impact analysis of the factory closure that has been given by Ballas and Clarke (2001b) can be extended by estimating multiplier effects and the loss of spending power in the local community. Further, it would be possible to estimate the downgrading of service facilities as businesses close or
288 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

relocate to more affluent areas. It would based models (ABM). ABM models are
then be possible to determine whether this normally associated with the behaviour of
development leads to even poorer service multiple agents in a social or economic sys-
provision for those communities affected. tem. These agents usually interact constantly
The possibility of individuals made redun- with each other and the environment they live
dant finding new jobs in the area, migrating or or move within. Thus their actions are driven
being retrained could also then be estimated. by certain rules. Although this methodology
Ballas et al. (2006) have made a start in this sounds similar to microsimulation (where
direction. agents could be the individuals within the
The second major effort needed is to link households) Davidsson (2000) notes that
such models more into the local and regional ABM may offer a better framework for
labour market through a framework which including behavioural rules into the actions
combines spatial microsimulation models of agents (including an element of random
and regional inputoutput models or regional behaviour) and for allowing interactions
econometric models. It has long been argued between agents. There are a number of good
that the treatment of the household sector has illustrations in a geographical setting (Batty
been ignored by most inputoutput modellers and Densham, 1996; Heppenstall et al.,
who at best would model or aggregate 2006). Clearly there is a research agenda
variables such as household income and to link these two complementary approaches
expenditure in aggregate form, making no more effectively. Microsimulation could be
distinction between the behaviour of different used to give the agents in ABM their
types of household defined in terms of socio- initial characteristics and locations whilst
economic status, employment profile, skill ABM could then provide the capacity to
level, etc. (Batey, 2003). It can be argued model individual adaptive behaviours and
that spatial microsimulation can address this emergence of new behaviours (see also the
issue. For instance, the prediction of input discussion of Boman and Holm, 2004).
output models for different sectors of the In addition, data from household panel
local economy can be spatially disaggregated surveys such as the British Household
with the use of a spatial microsimulation Panel Survey (BHPS) may be utilized to
model. Likewise, predictions of regional formulate plausible assumptions regarding
econometric models for the whole region these behaviours. For instance, it is possible
can be disaggregated at the individual and to use panel data from surveys such as
household level with the use of spatial the BHPS to model the life paths of
microsimulation. Jin and Wilson (1993) particular individuals and households who
made some progress here but data limi- have moved into and out of work. Such
tations made it difficult to operationalize data can also be combined with information
their models. Microsimulation potentially from more qualitative analyses to simulate
has the ability to provide much of that the behaviour of workers made redundant
missing data. following plant closures and how they
fare in adapting to the changing labour
market and how long term unemployment
is increased for those unable to retrain
15.4.2. Linking microsimulation
(Ballas et al., 2006). The findings of quali-
and agent-based models
tative studies such these can provide useful
Microsimulation is closely linked to another insights when formulating the rules that
type of individual level modelling: agent determine the likely behaviour of households
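One concrete way to combine the two approaches is to let the microsimulated population supply the initial characteristics and locations of the agents, and to let the ABM supply an adaptive behavioural rule. The sketch below is a deliberately minimal illustration of that division of labour: the agents, vacancies, skill codes and 'commuting radius' rule are all invented, and no published model is being reproduced.

```python
import numpy as np
from dataclasses import dataclass

rng = np.random.default_rng(1)

@dataclass
class WorkerAgent:
    x: float          # location, e.g., from a gazetteer or remotely sensed point
    y: float
    skill: int        # attribute inherited from the microsimulated household
    employed: bool = True

# Microsimulation supplies the initial characteristics and locations ...
agents = [WorkerAgent(x=rng.random(), y=rng.random(),
                      skill=int(rng.integers(1, 4))) for _ in range(200)]
jobs = [(rng.random(), rng.random(), int(rng.integers(1, 4))) for _ in range(50)]

# ... the ABM supplies adaptive behaviour: after a plant closure, unemployed
# agents take the nearest vacancy matching their skill level, if it is within
# an (arbitrary) commuting radius.
for a in rng.choice(agents, size=40, replace=False):
    a.employed = False

for a in (a for a in agents if not a.employed):
    suitable = [(jx, jy) for jx, jy, s in jobs if s <= a.skill]
    if suitable:
        jx, jy = min(suitable, key=lambda j: (j[0] - a.x) ** 2 + (j[1] - a.y) ** 2)
        a.employed = ((jx - a.x) ** 2 + (jy - a.y) ** 2) ** 0.5 < 0.3

print(sum(a.employed for a in agents), "of", len(agents), "agents employed")
```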
15.4.3. Spatial microsimulation and remote sensing

Another interesting research possibility is the combination of spatial microsimulation model outputs with remotely sensed data. One difficulty at present with spatial microsimulation models is that most of the probabilities are calculated from known distributions (provided by data sources such as the Census of Population) at the small-area level (e.g., the Census Output Area (COA) level in the UK). That is, although estimations are made at the level of the individual household, it is not possible to know precisely where within a small area (such as the COA) a particular household is actually located. For many policy purposes that is not a major problem: it is the overall effect on the locality that is most important. However, it can be argued that for certain applications this would be a worthwhile addition, especially for potential business applications.

Using remote sensing techniques it is possible to obtain a point data set of houses which would contain the housing type attribute. These point data sets can then be linked to spatially disaggregated microsimulated households in order to disaggregate the simulated population at the COA level. In other words, the task of this modelling exercise would be to populate the remotely sensed residential properties with attribute data. Table 15.5 lists the attributes that can be used as a link between the remote sensing generated database and the microsimulation output.

Table 15.5 Database attributes that can be used for the linkage

Spatial microsimulation output                                   Remotely sensed data
No. of residents in household (as a proxy to house size)         Land use
House type                                                       Property size
Number of cars (as a proxy to house size)                        House type
Number of rooms in household space (as a proxy to house size)    ...
...

Further, Figure 15.2 depicts schematically, and in a simplified manner, the geographical databases that are typically generated by microsimulation models and remote sensing methodologies and how these can be linked. As can be seen, these databases can be joined on the basis of the fields that they have in common, such as the housing type and house size. However, it can be argued that all the attributes listed in Table 15.5 can be used to build an index of similarity between a remotely sensed house and a microsimulated synthetic household.

Moreover, the linkage between the two databases can be achieved with the use of statistical matching or data fusion techniques. It should be noted that although statistical matching (also known as data fusion) has a relatively long history, its theoretical basis is somewhat narrow and there is no established, tested and widely applied methodology (Paas, 1986; Sutherland et al., 2002).

The new framework would also offer further potential for calibration and for dynamic modelling. The visualization of the area being modelled would provide useful additional diagnostic information and would allow new comparisons to be made between simulated households and real households. New images obtained from remote sensing may provide a very valuable additional source of information, highlighting new construction, demolitions and major changes in land use types.
Figure 15.2 Combining spatial microsimulation and remote sensing (Ballas et al., 2000): a remotely sensed property database shown alongside microsimulation model output records with fields Household ID, HHSPTYPE (house type), ED-CODE and TENURE.
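A minimal sketch of the kind of linkage implied by Table 15.5 and Figure 15.2 follows: each (hypothetical) remotely sensed property is matched to the most similar synthetic household in the same small area using a simple weighted agreement score over the shared fields. Real statistical matching or data fusion would be far more careful about field weights, matching without replacement and uncertainty; the table contents and weights here are invented.

```python
import pandas as pd

# Hypothetical inputs: both tables share an area code and some common fields.
properties = pd.DataFrame({
    "prop_id": [1, 2, 3],
    "area": ["08DAFX33"] * 3,
    "house_type": ["Detached", "Terraced", "Semi-detached"],
    "size_band": [3, 1, 2],              # proxy from the building footprint
})
households = pd.DataFrame({
    "hh_id": [108604, 178913, 23459],
    "area": ["08DAFX33"] * 3,
    "house_type": ["Detached", "Terraced", "Semi-detached"],
    "size_band": [3, 1, 2],              # proxy from residents/rooms/cars
})

WEIGHTS = {"house_type": 2.0, "size_band": 1.0}  # arbitrary illustrative weights

def similarity(prop, hh):
    """Weighted agreement between a property and a synthetic household."""
    return sum(w * (prop[f] == hh[f]) for f, w in WEIGHTS.items())

matches = {}
for _, prop in properties.iterrows():
    candidates = households[households["area"] == prop["area"]]
    best = candidates.apply(lambda hh: similarity(prop, hh), axis=1).idxmax()
    matches[prop["prop_id"]] = households.loc[best, "hh_id"]

print(matches)  # property id -> best-matching synthetic household id
```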

The second major benefit of the new framework comes when the potential of microsimulation for business applications is considered. Given the potential to create lists of household attributes, it has long been recognized that microsimulation could be useful as a business tool. However, to date, little progress has been made in exploring this potential. In a sense, the database underpinning the microsimulation model offers the same kind of information currently held in many geodemographic or lifestyle data systems. Nevertheless, microsimulation offers much greater flexibility than many standard geodemographic systems. In most cases, the geodemographic systems provide only one label for each locality, based on the greatest percentage of each group represented in the locality. Unless this percentage match is close to 100% there are always ecological fallacy problems: i.e., the label does not capture all consumer types resident in a particular area. This has led a number of authors to suggest fuzzy geodemographics, which might involve giving numerous labels to each locality. For more discussion on this see Feng and Flowerdew (1998) and See and Openshaw (2001). However, microsimulation would potentially offer another route to finding customers or consumer groups of various types. First, from a main database of, say, 100 household variables it is possible to search for distributions made up of any of these variables. The possible number of combinations is very large indeed and the user could ask for very specific combinations of variables, adding great flexibility to the task of finding customers. Second, it would be possible to provide unique classifiers for different localities. At the moment the 'underprivileged' group may be made up of key census variables clustered in many different ways to end up with this classification. A major research question is whether the underprivileged groups identified in Liverpool are the same as those identified in the East End of London. A more subtle look at the outputs of the microsimulation could offer new insights into this issue.
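The flexibility being described can be illustrated with a simple query over a synthetic household list. The column names, categories and thresholds below are hypothetical; the point is only that arbitrary combinations of attributes, rather than a single pre-assigned cluster label, can be used to profile each locality.

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical synthetic household list (one row per simulated household).
synth = pd.DataFrame({
    "output_area": rng.integers(0, 50, n),
    "age_of_oldest": rng.integers(18, 95, n),
    "n_cars": rng.integers(0, 3, n),
    "tenure": rng.choice(["owned", "social_rent", "private_rent"], n),
    "income": rng.gamma(2.0, 11000, n),
})

# An arbitrary, very specific target group: older, car-less, low-income renters.
target = synth[(synth["age_of_oldest"] >= 65)
               & (synth["n_cars"] == 0)
               & (synth["tenure"] == "social_rent")
               & (synth["income"] < 12000)]

# Share of each output area falling in the target group, rather than a single
# geodemographic label per area.
profile = (target.groupby("output_area").size()
           / synth.groupby("output_area").size()).fillna(0)
print(profile.sort_values(ascending=False).head())
```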
Finally, the framework suggested here would add much to the potential of remotely sensed data. It would be possible to put estimations on the types of buildings in terms of housing types and the characteristics of their inhabitants. Clearly, it is not possible to say categorically what types of families were in each building. However, it may be possible to give an estimation of the types of families within blocks, thus giving very detailed portraits of small areas of our cities.


15.4.4. Spatial microsimulation, spatial decision support systems and virtual decision-making environments

Another area where spatial microsimulation models can play an important role is in the ongoing debates on the potential of new technologies to promote local democracy and electronic decision-making. It can be argued that spatial microsimulation models can be used not only to provide information on the possible consequences and the local multiplier effects of major policy changes but also to inform the general public about these and to enhance, in this way, public participation in policy-making procedures. An example of work moving in this direction is the Microsimulation Modelling and Predictive Policy Analysis System (Micro-MaPPAS) developed for Leeds City Council by researchers at the Universities of Sheffield, Leeds and Manchester (Ballas et al., 2004, 2006). Micro-MaPPAS is a planning support system based on the SimLeeds geographical microsimulation model mentioned above. The SimLeeds software (Ballas, 2001) has been run from a command prompt and required the hard coding of parameters and data tables together with some knowledge of Java programming, which is not a desirable task for the average policy or decision maker. Micro-MaPPAS provides a spatial decision-making interface which is much more user-friendly and suitable for decision makers, who can utilize the power of the spatial microsimulation methodology. The Micro-MaPPAS software also provides some basic mapping functions such as panning, zooming and symbology editing. The mapping capability in the software is provided by the GeoTools (www.geotools.org) open-source Java mapping library, which has been written by a group of researchers independent of the Micro-MaPPAS project. GeoTools is a versatile Java library which conforms to the Open GIS Consortium standard specifications for GIS interoperability. The library can be adapted to work in any Java-based GUI or web-based applet. The mapping controls allow the user to select a microsimulated variable from a query and map the results at a wide range of different geographical scales (see Ballas et al. (2004) and Ballas et al. (2006) for more details).

It can also be argued that systems such as Micro-MaPPAS can have an e-government dimension by allowing networking technologies, including the Internet, to be used by policy makers as well as the general public. In particular, these systems can be converted into web-based GIS to enhance public involvement and participation in environmental planning and decision-making processes. Such systems are typically referred to in the literature as Public Participation GIS (PPGIS) and are based on the belief that by providing citizens with access to information and data in the form of maps and visualizations, they can make better informed decisions about the natural and built environment around them.
It is possible to build on the existing infrastructure and knowledge in order to combine GIS and PPGIS frameworks to enhance e-government, local democracy and public participation. In particular, GIS and spatial microsimulation models can play a very important role in the ongoing debates on the potential of new technologies to promote local democracy and electronic decision-making. It can be argued that a system such as Micro-MaPPAS, developed in Java, can be put on the World Wide Web and linked to Virtual Decision-Making Environments (VDMEs). The latter are Internet-based systems that allow the general public to explore real-world problems and become more involved in the public participation processes of the planning system (Evans et al., 1999; Kingston et al., 2000).


15.4.5. New application areas

In addition to comprehensive models it is useful to highlight other areas of economic or social geography where microsimulation has been under-utilized. One such area is medical geography. A notable exception is the work of Clarke and Spowage (1984), who designed morbidity and mortality sub-models for health care planning in West Yorkshire, UK. They estimated the probabilities of being ill or dying based on age, sex, social class, ethnicity, etc. (by speciality case). Another sub-model was constructed to simulate hospital workloads and patient throughput.

Recent concerns in UK public health planning have focused on two main issues. The first has been the concern to reduce health inequalities by investing more in intervention strategies. The second has been the concern to treat more patients within the community. Microsimulation lends itself well to addressing both these concerns. For intervention strategies we need to understand more about geodemographic variations in demand for health services. Of particular concern in the UK at the moment are the problems of obesity (especially childhood obesity), diabetes and smoking. The difficulty is that little is known about the prevalence of these health issues by household or neighbourhood. Given age, sex, social class, occupation, ethnicity, etc., microsimulation models can estimate the incidence of such problems (and be calibrated against any existing data). Once demand is better understood and measured, the location of community health services becomes easier in the sense of finding locations that maximize access to potential users. In addition, other what-if scenarios are possible. For the location of stop-smoking services, for example, it would also be possible to simulate the success across the city of services targeted at different geodemographic groups (young adults, heavy smokers aged 65 or over, pregnant mothers, etc.). Similarly, for diabetes, it would be possible to model the impacts of improving access to fresh fruit and vegetables and hence improving diet across households of different types. Smith et al. (2006) give further details on the research agenda for diabetes and food access.
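The sort of small-area estimation described above can be sketched as follows. The age- and class-specific prevalence rates below are made up purely for illustration (in practice they would be estimated from a health survey and calibrated against any available data) and are applied to a hypothetical synthetic population to give expected counts of a condition by neighbourhood.

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Hypothetical synthetic individuals produced by a spatial microsimulation model.
people = pd.DataFrame({
    "neighbourhood": rng.integers(0, 30, n),
    "age_group": rng.choice(["0-15", "16-44", "45-64", "65+"], n),
    "social_class": rng.choice(["1-2", "3", "4-5"], n),
})

# Illustrative (invented) prevalence rates by age group and social class.
rates = {("0-15", "1-2"): 0.05, ("0-15", "3"): 0.08, ("0-15", "4-5"): 0.12,
         ("16-44", "1-2"): 0.10, ("16-44", "3"): 0.14, ("16-44", "4-5"): 0.20,
         ("45-64", "1-2"): 0.18, ("45-64", "3"): 0.24, ("45-64", "4-5"): 0.30,
         ("65+", "1-2"): 0.22, ("65+", "3"): 0.28, ("65+", "4-5"): 0.34}

people["p_condition"] = [rates[(a, c)] for a, c in
                         zip(people["age_group"], people["social_class"])]

# Expected prevalence by neighbourhood: a direct input to service location.
expected = people.groupby("neighbourhood")["p_condition"].agg(["sum", "mean"])
print(expected.head())
```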
15.4.6. Improving model calibration

Despite the benefits of the applications described in this chapter, it should be noted that caution is necessary when using spatial microsimulation methodologies to perform what-if policy analysis and evaluation. The outputs of all microsimulation models, no matter how good, are always simulations and not actual data. The validity of the simulated data will depend on the quality of the original data that are used and on the assumptions upon which the microsimulation model is based. Moreover, it will depend on the specific microsimulation methodology that is employed. In addition, spatial microsimulation outputs generally depend on subjective judgements associated with the ordering of the conditional probability tables that are used as inputs and/or with the selection of the data sets that are used as small-area constraints. As Birkin and Clarke (1995) point out, the modeller's art in microsimulation is to generate population characteristics in an appropriate order so that potential errors are minimized. These aspects should always be taken into account when using spatial microsimulation models for policy impact assessment.

However, there is the related problem of how to validate microsimulation outputs, since there are no available micro-data sets at the desired level of geographical scale (hence the need for microsimulation in the first place!). Model output validation is one of the biggest problems of microsimulation methodologies. As Williamson (1999) points out, in the United States the National Academy was commissioned to evaluate the effectiveness of microsimulation for tax-benefit analysis purposes. The National Academy found that there is a general lack of thorough validation for microsimulation models and proposed a number of validation measures, such as external validity studies in which model results are compared with data from program administrative sources (Williamson, 1999). Sensitivity analysis and computer-intensive sample reuse techniques to measure the variance in model estimates were also proposed.

Thus, further research is required in order to improve the performance of spatial microsimulation models and to highlight the sources of error. For instance, as Williamson et al. (1998) point out, there are many ways in which combinatorial optimization methodologies can be fine-tuned, through the evaluation of the use of more or different SAS (Small Area Statistics) tables or by changing the model parameters (see also Voas and Williamson (2000) for a more detailed discussion and an in-depth evaluation of combinatorial optimization techniques). Further, there is a need to build on existing work on the validity and reliability of microsimulation models, such as the work of Pudney and Sutherland (1994), who investigated the role of sampling error in a tax-benefit model, and the work of Voas and Williamson (2001), who present new goodness-of-fit measures for synthetic microdata.
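One simple check in this spirit, loosely in the style of the measures discussed by Voas and Williamson (2001), is to re-tabulate the synthetic population against the small-area constraint tables used to build it and to summarize the discrepancies. The sketch below computes the total absolute error and a crude cell-level Z-score for invented observed and simulated counts; it is illustrative only and does not reproduce their full set of measures.

```python
import numpy as np

# Hypothetical constraint table (e.g., census counts of households by tenure)
# and the corresponding tabulation of the synthetic population for one area.
observed = np.array([120, 340, 75, 65])    # census counts
simulated = np.array([112, 352, 70, 66])   # counts re-tabulated from microdata

tae = np.abs(observed - simulated).sum()   # total absolute error
sae = tae / observed.sum()                 # standardized absolute error

# A crude per-cell Z-score under a binomial approximation to sampling error.
p = observed / observed.sum()
expected = p * simulated.sum()
z = (simulated - expected) / np.sqrt(expected * (1 - p))

print(f"TAE = {tae}, SAE = {sae:.3f}")
print("cell Z-scores:", np.round(z, 2))
```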
15.5. CONCLUSIONS

We hope that we have demonstrated that spatial microsimulation is a useful technique for estimating the characteristics of individuals or households which can then be used in a variety of what-if situations regarding policy change. The key advantage of this methodology is data fusion or linkage: a variety of data sets can be combined to provide new insights into household characteristics and, ultimately, household behaviour. Thus these models can help to solve the problem of missing data such as, in the UK, household income, wealth, tax payment, water demand, health problems, crime, etc. Once built, these models can also be linked to meso or macro models (such as discrete choice models, spatial interaction models, logit models, input–output models, etc.) to show how households interact with the supply side of the economy (where they go to work, shop, visit the doctor, etc.). The ability to change these circumstances and assess the impacts of such actions is another major advantage of this methodology. Simulations can be run which change either the characteristics of the households (population ageing, a new job allowing greater income to be earned, change of residence, etc.) or the characteristics of the supply side (a new retail centre, closure of a major employer, a new hospital, etc.). This ability to examine both household dynamics and the impacts of infrastructure change allows the analyst to explore both social policy impacts (tax or welfare changes, for example) and/or area-based policy impacts (new job creation, a new retail centre, etc.).

The research agenda outlined in the second half of the chapter is clearly our personal one, but one that we hope other microsimulation modellers would at least partially agree with. The agenda has not been presented in any particular order of importance, but the issue of how such models can support traditional spatial modelling seems a key task to address in the short term. As we noted above, a start has been made in this direction, but perhaps the greatest challenge is merging microsimulation with more macro techniques such as input–output models. The latter models are excellent for modelling the interactions between key sectors of the economy but not so good at spatially disaggregating the outputs within cities and regions. A methodology which could feed individual households into the economic system at both stages of the modelling process (inputs and outputs) could be a major advantage in future policy work. We hope we can address this issue in the next few years.


REFERENCES

Ballas, D. (2001). A spatial microsimulation approach to local labour market policy analysis. Unpublished PhD thesis, School of Geography, University of Leeds.

Ballas, D. and Clarke, G.P. (2000). GIS and microsimulation for local labour market policy analysis. Computers, Environment and Urban Systems, 24: 305–330.

Ballas, D. and Clarke, G.P. (2001a). Modelling the local impacts of national social policies: a spatial microsimulation approach. Environment and Planning C: Government and Policy, 19: 587–606.

Ballas, D. and Clarke, G.P. (2001b). Towards local implications of major job transformations in the city: a spatial microsimulation approach. Geographical Analysis, 33: 291–311.

Ballas, D., Clarke, G.P. and Dewhurst, J. (2006). Modelling the socio-economic impacts of major job loss or gain at the local level: a spatial microsimulation framework. Spatial Economic Analysis, 1(1): 127–146.

Ballas, D., Clarke, G.P., Dorling, D., Eyre, H., Rossiter, D. and Thomas, B. (2003). SimYork: Simulating Current and Future Trends in the Life of Households in York. Report to the Joseph Rowntree Foundation, May 2003.

Ballas, D., Clarke, G.P., Dorling, D., Eyre, H., Rossiter, D. and Thomas, B. (2005). SimBritain: a spatial microsimulation approach to population dynamics. Population, Space and Place, 11: 13–34.

Ballas, D., Clarke, G.P., Feldman, O., Gibson, P., Jianhui, J., Simmonds, D. and Stillwell, J. (2005b). A spatial microsimulation approach to land-use modelling. CUPUM 2005 (Computers in Urban Planning and Urban Management) Conference Proceedings, UCL, London, 29 June–1 July 2005 (available online from: http://128.40.59.163/cupum/searchPapers/papers/paper276.pdf).

Ballas, D., Clarke, G.P., Dorling, D. and Rossiter, D. (2007). Using SimBritain to model the geographical impact of national government policies. Geographical Analysis, 39(1): 44–77.

Ballas, D., Kingston, R. and Stillwell, J. (2004). Using a spatial microsimulation decision support system for policy scenario analysis. In: van Leeuwen, J. and Timmermans, H. (eds), Recent Advances in Design and Decision Support Systems in Architecture and Urban Planning, pp. 177–192. Dordrecht: Kluwer.

Ballas, D., Kingston, R., Stillwell, J. and Jin, J. (2007). Building a spatial microsimulation-based planning support system for local policy making. Environment and Planning A, 39(10): 2482–2499.

Ballas, D., Rossiter, D., Thomas, B., Clarke, G.P. and Dorling, D. (2005). Geography Matters: Simulating the Local Impacts of National Social Policies. Joseph Rowntree Foundation Contemporary Research Issues. York: Joseph Rowntree Foundation.
Batey, P.W.J. (2003). Extended input–output modelling of regional impacts: does detail make a difference? Paper presented at the Royal Geographical Society Annual Conference 2003 (special session on 50 Years of Regional Science or the Return of Quantitative Economic Geography), London, 3–5 September 2003.

Batty, M. and Densham, P. (1996). Decision support, GIS and urban planning. Systemma Terra, V(1): 72–76.

Birkin, M. and Clarke, M. (1988). SYNTHESIS: a synthetic spatial information system for urban and regional analysis: methods and examples. Environment and Planning A, 20: 1645–1671.

Birkin, M. and Clarke, G.P. (1995). Using microsimulation methods to synthesize census data. In: Openshaw, S. (ed.), Census Users' Handbook, pp. 363–387. London: GeoInformation International.

Birkin, M. and Clarke, M. (1989). The generation of individual and household incomes at the small area level using Synthesis. Regional Studies, 23: 535–548.

Birkin, M., Clarke, G.P. and Clarke, M. (1996). Urban and regional modelling at the microscale. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 10–27. London: Pion.

Boman, M. and Holm, E. (2004). Multi-agent systems, time geography and microsimulations. In: Olsson, M.O. and Sjöstedt, G. (eds), Systems, Approaches and their Application, pp. 95–118. Dordrecht: Kluwer Academic.

Caldwell, S.B. and Keister, L.A. (1996). Wealth in America: family stock ownership and accumulation, 1960–1995. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 88–116. London: Pion.

Caldwell, S.B., Clarke, G.P. and Keister, L.A. (1998). Modelling regional changes in US household income and wealth: a research agenda. Environment and Planning C: Government and Policy, 16: 707–722.

Clarke, G.P. (1996). Microsimulation: an introduction. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 1–9. London: Pion.

Clarke, M. and Spowage, M. (1984). Integrated models for public policy analysis: an example of the practical use of simulation models in health care planning. Papers of the Regional Science Association, 55: 25–46.

Clarke, G. and Stillwell, J.C.H. (eds) (2004). Applied GIS and Spatial Modelling. London: Wiley.

Davidsson, P. (2000). Multi agent based simulation: beyond social simulation. In: Moss, S. and Davidsson, P. (eds), Multi Agent Based Simulations, pp. 97–100. Berlin: Springer.

Davies, H. and Joshi, H. (1992). Constructing pensions for model couples. In: Hancock, R. and Sutherland, H. (eds), Microsimulation Models for Public Policy Analysis: New Frontiers, pp. 67–96. London: Suntory-Toyota International Centre for Economics and Related Disciplines, LSE.

Evans, A., Kingston, R., Carver, S. and Turton, I. (1999). Web-based GIS to enhance public democratic involvement. Paper presented at the 4th International Conference on GeoComputation, Fredericksburg, Virginia, USA, 25–28 July.

Evandrou, M. and Falkingham, J. (1995). Gender, lone-parenthood and lifetime incomes. In: Falkingham, J. and Hills, J. (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 167–183. New York: Prentice Hall/Harvester Wheatsheaf.

Falkingham, J., Harding, A. and Lessof, C. (1995). Simulating lifetime income distribution and redistribution. In: Falkingham, J. and Hills, J. (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 62–82. New York: Prentice Hall/Harvester Wheatsheaf.

Falkingham, J. and Lessof, C. (1992). Playing God or LIFEMOD: the construction of a dynamic microsimulation model. In: Hancock, R. and Sutherland, H. (eds), Microsimulation Models for Public Policy Analysis: New Frontiers, pp. 5–32. London: Suntory-Toyota International Centre for Economics and Related Disciplines, LSE.

Falkingham, J. and Hills, J. (1995a). The effects of the welfare state over the life cycle. In: Falkingham, J. and Hills, J. (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 83–107. New York: Prentice Hall/Harvester Wheatsheaf.

Falkingham, J. and Hills, J. (1995b). Redistribution between people or across the life cycle? In: Falkingham, J. and Hills, J. (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 137–149. New York: Prentice Hall/Harvester Wheatsheaf.

Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. London: Sage Publications.

Glennerster, H., Falkingham, J. and Barr, N. (1995). Education funding, equity and the life cycle. In: Falkingham, J. and Hills, J. (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 150–166. New York: Prentice Hall/Harvester Wheatsheaf.
Hägerstrand, T. (1967). Innovation Diffusion as a Spatial Process. Chicago: University of Chicago Press.

Hancock, R. (2000). Charging for care in later life: an exercise in dynamic microsimulation. In: Mitton, L., Sutherland, H. and Weeks, M. (eds), Microsimulation Modelling for Policy Analysis: Challenges and Innovations, pp. 226–237. Cambridge: Cambridge University Press.

Hancock, R. and Sutherland, H. (eds) (1992). Microsimulation Models for Public Policy Analysis: New Frontiers. London: Suntory-Toyota International Centre for Economics and Related Disciplines, LSE.

Hancock, R., Mallender, J. and Pudney, S. (1992). Constructing a computer model for simulating the future distribution of pensioners' incomes for Great Britain. In: Hancock, R. and Sutherland, H. (eds), Microsimulation Models for Public Policy Analysis: New Frontiers, pp. 33–66. London: Suntory-Toyota International Centre for Economics and Related Disciplines, LSE.

Harding, A. (ed.) (1996). Microsimulation and Public Policy. Contributions to Economic Analysis 232. Amsterdam: North-Holland.

Heppenstall, A.J., Evans, A.J. and Birkin, M.H. (2007). Genetic algorithm optimisation of a multi-agent system for simulating a retail market. Environment and Planning B, 34: 1051–1070.

Holm, E., Lindgren, U., Makila, K. and Malmberg, G. (1996). Simulating an entire nation. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 164–186. London: Pion.

Hooimeijer, P. (1996). A life-course approach to urban dynamics: state of the art in and research design for the Netherlands. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Analysis, pp. 28–63. London: Pion.

Huang, Z. and Williamson, P. (2001). A Comparison of Synthetic Reconstruction and Combinatorial Optimisation Approaches to the Creation of Small-area Microdata. Working Paper 2001/2, Department of Geography, University of Liverpool.

Kingston, R., Carver, S., Evans, A. and Turton, I. (2000). Web-based public participation geographical information systems: an aid to local environmental decision-making. Computers, Environment and Urban Systems, 24: 109.

Longley, P.A. and Batty, M. (eds) (2003). Advanced Spatial Analysis: The CASA Book of GIS. Redlands, CA: ESRI Press.

Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds) (1999). Geographical Information Systems: Management Issues and Applications. New York: Wiley.

Martin, D. (1996). Geographic Information Systems: Socioeconomic Applications. London: Routledge.

Mertz, J. (1991). Microsimulation: a survey of principles, developments and applications. International Journal of Forecasting, 7: 77–104.

Mitton, L., Sutherland, H. and Weeks, M. (eds) (2000). Microsimulation Modelling for Policy Analysis: Challenges and Innovations. Cambridge: Cambridge University Press.

Nakaya, T., Yano, K., Fotheringham, A.S., Ballas, D. and Clarke, G.P. (2003). Retail interaction modelling using meso and micro approaches. Paper presented at the 33rd Regional Science Association, RSAI British and Irish Section Conference, St Andrews, Scotland, 20–22 August.

Nelissen, J.H.M. (1993). Labour market, income formation and social security in the microsimulation model NEDYMAS. Economic Modelling, 10: 225–272.

Openshaw, S. (1995). Human systems modelling as a new grand challenge area in science. Environment and Planning A, 27: 159–164.

Orcutt, G.H. (1957). A new type of socio-economic system. The Review of Economics and Statistics, 39: 116–123.

Orcutt, G.H., Mertz, J. and Quinke, H. (eds) (1986). Microanalytic Simulation Models to Support Social and Financial Policy. Amsterdam: North-Holland.

Orcutt, G.H., Greenberger, M., Korbel, J. and Rivlin, A. (1961). Microanalysis of Socioeconomic Systems: A Simulation Study. New York: Harper and Row.

Paas, G. (1986). Statistical match: evaluation of existing procedures and improvements by using additional information. In: Orcutt, G.H., Mertz, J. and Quinke, H. (eds), Microanalytic Simulation Models to Support Social and Financial Policy, pp. 401–421. Amsterdam: North-Holland.

Propper, C. (1995). For richer, for poorer, in sickness and in health: the lifetime distribution of NHS health care. In: Falkingham, J. and Hills, J. (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 184–203. New York: Prentice Hall/Harvester Wheatsheaf.
Pudney, S. and Sutherland, H. (1994). How reliable are microsimulation results? An analysis of the role of sampling error in a UK tax-benefit model. Journal of Public Economics, 53: 327–365.

Redmond, G., Sutherland, H. and Wilson, M. (1998). The Arithmetic of Tax and Social Security Reform: A User's Guide to Microsimulation Methods and Analysis. Cambridge: Cambridge University Press.

See, L. and Openshaw, S. (2001). Fuzzy geodemographic targeting. In: Clarke, G.P. and Madden, M. (eds), Regional Science in Business, pp. 269–282. Berlin: Springer-Verlag.

Smith, D., Clarke, G.P., Ransley, J. and Cade, J. (2006). Food access and health: a microsimulation framework for analysis. Studies in Regional Science, 35(4): 909–927.

Sutherland, H. and Piachaud, D. (2001). Reducing child poverty in Britain: an assessment of government policy 1997–2001. The Economic Journal, 111: 85–101.

Sutherland, H., Sefton, T. and Piachaud, D. (2003). Poverty in Britain: The Impact of Government Policy since 1997. York: Joseph Rowntree Foundation (also available online from: http://www.jrf.org.uk) (ISBN 1 85935 152 2).

Sutherland, H., Taylor, R. and Gomulka, J. (2002). Combining household income and expenditure data in policy simulations. Review of Income and Wealth, 48(4): 75–94.

Taylor, M.F., Brice, J., Buck, N. and Prentice-Lane, E. (2001). British Household Panel Survey User Manual Volume A: Introduction, Technical Report and Appendices. Colchester: University of Essex.

Veldhuisen, K.J., Kapoen, L.L. and Timmermans, H.J.P. (2000). RAMBLAS: a regional planning model based on the micro-simulation of daily activity patterns. Environment and Planning A, 31: 427–443.

Vencatasawmy, C.P., Holm, E., Rephann, T. et al. (1999). Building a spatial microsimulation model. Paper presented at the 11th Theoretical and Quantitative Geography European Colloquium, Durham Castle, Durham, 3–7 September 1999.

Voas, D. and Williamson, P. (2000). An evaluation of the combinatorial optimisation approach to the creation of synthetic microdata. International Journal of Population Geography, 6: 349–366.

Voas, D. and Williamson, P. (2001). Evaluating goodness-of-fit measures for synthetic microdata. Geographical and Environmental Modelling, 5: 177–200.

Wegener, M. and Spiekermann, K. (1996). The potential of microsimulation for urban models. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 149–163. London: Pion.

Wertheimer II, R., Zedlewski, S.R., Anderson, J. and Moore, K. (1986). DYNASIM in comparison with other microsimulation models. In: Orcutt, G.H., Mertz, J. and Quinke, H. (eds), Microanalytic Simulation Models to Support Social and Financial Policy, pp. 187–206. Amsterdam: North-Holland.

Williamson, P. (1992). Community care policies for the elderly: a microsimulation approach. Unpublished PhD thesis, School of Geography, University of Leeds.

Williamson, P. (1999). Microsimulation: an idea whose time has come? Paper presented at the 39th European Regional Science Association Congress, University College Dublin, Dublin, Ireland, 23–27 August.

Williamson, P., Birkin, M. and Rees, P. (1998). The estimation of population microdata by using data from small area statistics and samples of anonymised records. Environment and Planning A, 30: 785–816.

Williamson, P. and Voas, D. (2000). Income estimates for small areas: lessons from the census rehearsal. BURISA, 146: 2–10.

Wilson, A. and Pownall, C.E. (1976). A new representation of the urban system for modelling and for the study of micro-level interdependence. Area, 8: 246–254.

Wilson, A.G. (2000). Complex Spatial Systems: The Modelling Foundations of Urban and Regional Analysis. London: Prentice Hall.
16 Detection of Clustering in Spatial Data

Lance A. Waller
16.1. INTRODUCTION

It is human nature to seek pattern within any complex display of information. We organize stars into constellations, devour mystery novels, and even give detailed descriptions of ink stains to analysts. This innate desire for order within chaos applies spatially as well. Given a map of a set of locations of an event, say, residences of cases of a particular type of disease or the locations of a particular type of crime, we seek patterns that might reveal something about the underlying process generating the events, be that a common environmental exposure or the habits of a particular criminal. In short, our hope is that arranging what we know spatially might reveal something about how the events arise in the first place.

In this chapter, we review analytic methods for detecting clusters or 'hot spots' in spatially-referenced data. We begin with a discussion of what we mean conceptually, geographically, and mathematically by the term 'cluster', then discuss and illustrate many standard approaches proposed and applied in the literature within a variety of scientific fields. Many analytic approaches for detecting clusters have been summarized in several texts (Elliott et al., 1992, 1999; Cressie, 1993; Bailey and Gatrell, 1995; Goldsmith et al., 2000; Lawson, 2001; Lawson and Denison, 2002; Diggle, 2003; Waller and Gotway, 2004; Eck et al., 2005), so we do not attempt a complete review here. Rather, we focus on developed and developing conceptual and theoretical constructs behind many of the methods while contrasting the underlying questions of interest driving different families of analytic approaches.
Figure 16.1 The whirling vortex of spatial data analysis: (1) the question we want to answer; (2) the data and methods we need to answer the question; (3) the data we can get and the methods we can use; (4) the question we can answer with the data and methods we have.

interest driving different families of analytic spatial data with the increasing sophistication
approaches. and data holdings of geographic information
To set the stage conceptually, Figure 16.1 systems (GISs). One is increasingly faced
provides a starting point for developing and with the ease of including found data
evaluating analytic methods for detecting collected by others that seems to fit the bill
clusters and clustering. We begin with for the data one would really like to have.
Step 1 with a question we wish to answer After obtaining the data we can retrieve,
(for example, Are disease risks elevated we conduct analysis on these available data,
for individuals living near a source of often without explicitly acknowledging that
pollution?). The question of interest defines our analyses may be addressing slightly
the sorts of data and methods we require to different questions (e.g., in our conceptual
answer the question (for example, individual- example, we have moved from a question
level case status and individual exposure involving associations between disease and a
histories). However, the data required often particular exposure, to associations between
are unavailable for reasons varying from cost disease and present proximity to a known or
to privacy and we often settle for related data suspected exposure source). As a final step,
we can obtain within budget and satisfying we should carefully examine how closely
availability constraints (for example, present the questions we do answer mirror those we
residential location of cases and proximity to originally intended to answer. All too often,
known sources of pollution). Similarly, avail- this last step is ignored.
able methods may only address part of the While we can consider the steps shown in
question or may be particularly susceptible Figure 16.1 as a linear set of steps (1, 2, 3, 4),
to data shortcomings (for example, missing it is often a loop where the answers obtained
data or location inaccuracy). This situation on the available observational data in Step 4
is particularly relevant in the analysis of inform on refinements to the questions
16.2. WHAT ARE WE LOOKING FOR?

It is appropriate to begin by considering the very basic question: what exactly do we hope to find? Besag and Newell (1991) provide several important observations relevant to the search for clusters. The first key distinction is between the detection of clusters and the detection of clustering. A cluster represents an unusual collection of events, while clustering represents a general tendency for events to occur nearer other events than one might expect.

These definitions of 'cluster' and 'clustering' differ from those found in cluster analysis, a set of analytical classification methods designed to group observations into clusters wherein observations within the same cluster are more alike than those from different clusters. The overlap in terminology can be confusing when reviewing the literature, especially since some spatial methods to detect clusters and/or clustering utilize concepts and methods from cluster analysis (Knorr-Held and Raßer, 2000; Denison and Holmes, 2001). As illustrated in Figure 16.1, it remains critical to clearly identify goals and conclusions in the context of both the questions addressed and the methods used to address them.

In the discussion below, we follow Besag and Newell (1991) and take the term 'cluster' to define an anomaly, an interesting collection of spatial locations that appears to be inconsistent with some background conceptual model of how events arise. For instance, a cancer registry might report six new cases of childhood leukemia in a small neighborhood in a particular year, when only one new case would be expected if the national annual incidence rate applied directly to all individuals in the study area. That is, the aggregation of six cases appears to be unlikely under a simple model of all children experiencing equal risk. Contrast this example with that of clustering, where we observe multiple pockets of higher incidence than expected from national rates, perhaps separated by areas of lower-than-expected local rates.

Besag and Newell (1991) also note the difference between seeking clusters or clustering anywhere versus around particular locations of interest. They denote the former as general methods and the latter as focused methods, also referred to as global and local methods, respectively, in the geography literature by Anselin (1995) and in the disease clustering literature by Kulldorff et al. (2003).

As suggested by Figure 16.1, seeking general or focused clusters or clustering defines different questions of interest and, as a result, methods appropriate for seeking individual clusters might not be the best approach to measure clustering, and vice versa. We will explore this in more depth in the examples below.

The general ideas of clusters and clustering arise in many different disciplines. However, each discipline often brings its own particular sets of questions of interest, assumptions regarding data availability, and familiar statistical methods. For example, the fields of epidemiology and criminology both exhibit interest in the detection of clustering within geographically referenced data sets. However, the sets of techniques appearing in their respective literatures are largely distinct and cross-references between the fields are rare. This situation is unfortunate since both fields could draw from the experiences and ideas of the other. Figure 16.1 provides a general context for comparison and we express and compare ideas from recent surveys in both fields in the sections below.
The remainder of the chapter addresses the typical types of data available for cluster detection; some basic analytic concepts, assumptions, and complications; an illustrative data set from archaeology; an overview of some different approaches for detecting clusters and/or clustering in point-referenced data, with application to the data set; and general conclusions. As a result, the chapter represents more of a review of the questions one should ask in performing a search for clusters or clustering than an exhaustive collection of methods.


16.3. WHAT DATA DO WE HAVE?

As one might expect, the typical data for cluster detection consist of locations on a map. These may be point locations of events or may represent counts of events occurring within a set of zones partitioning the study area into non-overlapping pieces. Examples of the latter include census enumeration districts, postal zones, or other administrative regions. Regional counts may arise to preserve individual confidentiality or simply due to the relative ease of obtaining records sorted by political district, mailing address, or other identifier. We concentrate on point-referenced data in the development below, noting that methodologically we typically assume a latent, unobserved set of points behind regional counts, and many of the analytic tools used for points provide the basis for similar tools for counts (Waller and Gotway, 2004, Chapters 6–7).

In addition to the point locations or regional counts of events, it is often very important to have access to data defining the spatial heterogeneity of the population from which our events are drawn. These may be potential crime victims, individuals susceptible to the disease of interest, or simply the population sizes for each area. The background information is critically important in the interpretation of any detected clusters since it defines the amount of clustering we would expect under some null model of event occurrence. This null model defines the patterns we would expect in the absence of anomalies. A common null model is one of constant risk, where each individual in the study area experiences an identical probability of experiencing the event under study. To illustrate the importance of the background information, consider as a contrived example a collection of six childhood leukemia cases in one neighborhood, which would seem very unusual if only six children reside there but not as unusual if 600,000 children live there. The background data coupled with the null model provide our statistical point of reference for detecting clustering and clusters.

We also may have spatially-referenced covariate information providing information regarding the spatial distribution of factors impacting the local probability of the events of interest. For instance, the incidence of most cancers increases dramatically with age. As a result, we would tend to expect more cases in neighborhoods with higher numbers of older residents. The covariate information may include both endogenous and exogenous variables. In some sense, the covariate information is collected to define 'uninteresting' clustering, that is, clustering for reasons we already know or suspect. In most cases, cluster detection builds from a desire to identify areas where the observed pattern of events doesn't match our general expectations.
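The arithmetic behind the constant risk null model in the contrived leukemia example is easy to make explicit. In the sketch below the rate and the number of children at risk are invented; the expected count follows from the constant risk assumption, and the Poisson distribution then gives the probability of observing a collection at least as large as the one reported.

```python
from scipy.stats import poisson

national_rate = 5e-5       # hypothetical annual incidence per child
children_at_risk = 20_000  # hypothetical number of children in the neighborhood
observed_cases = 6

expected = national_rate * children_at_risk          # constant risk null model
p_tail = poisson.sf(observed_cases - 1, expected)    # P(X >= 6 | mean = expected)

print(f"expected cases = {expected:.2f}")
print(f"P(at least {observed_cases} cases under the null) = {p_tail:.2e}")
```

With 600,000 children at risk the same six cases would fall well below the expected count, and the tail probability would no longer flag anything unusual.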
16.4. WHAT ANALYTIC TOOLS CAN WE USE?

Most methods to detect clusters and clustering build from probability models operationalizing the null model mentioned above. As a result, most tools aim to define some measure of the 'unusualness' of a cluster, then determine the distribution of this quantity under the null (uninteresting) model, and compare the quantity based on the observed data to this null distribution (Waller and Jacquez, 1995). In a statistical hypothesis setting, the null hypothesis is defined conceptually as the absence of clusters/clustering, and operationally as the expected distribution of our measure (statistic) under the null model.

As a result, the analytic tools required for statistical inference are a definition of our statistic and its null distribution. In the sections below, we will illustrate several types of statistics and contrast the underlying questions addressed by each.

Before defining particular methods, we offer a brief review of some basic probabilistic elements for point-referenced event locations driving many of the methods illustrated below. The first is the definition of complete spatial randomness (CSR). A set of events arising from CSR has the following properties: first, the total number of events observed in the study area follows a Poisson distribution; second, given the observed number of events, event locations occur independently of one another and the expected number of events per unit area is a constant, denoted λ, across the entire study area. CSR corresponds to a spatial Poisson point process yielding the following features: the number of events observed in a region A within the study area follows a Poisson distribution with mean λ|A|, where |A| denotes the area of A; the numbers of observed events in non-overlapping areas are independent of one another; and, given the observed number of events, events are uniformly distributed across the study area (and any region within it). For clarity we follow Diggle (2003) and distinguish between an event location, where an observed event did occur, and a point location, where an event could occur. A typical data set consists of a set of event locations, and we often compare the value of our statistic based on events to the distribution of values associated with randomly selected events.

While CSR represents a complete lack of clustering, data generated by CSR nonetheless visually exhibit some clumping and gapping due to the inherent randomness, and one purpose of a statistical test is to determine whether the observed patterns in our data are more extreme than the amount of clumping and gapping expected under CSR. Figure 16.2 illustrates three realizations of CSR with 100 events uniformly distributed across a square.
Figure 16.2 Three examples of complete spatial randomness (CSR).
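Realizations such as those in Figure 16.2 are straightforward to generate: draw the number of events from a Poisson distribution with the chosen intensity (times the area of the unit square) and place them uniformly. The sketch below does exactly that; the intensity of 100 and the random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_csr(intensity, n_realizations=3):
    """Homogeneous Poisson process (CSR) on the unit square."""
    patterns = []
    for _ in range(n_realizations):
        n = rng.poisson(intensity * 1.0)          # |A| = 1 for the unit square
        xy = rng.uniform(0.0, 1.0, size=(n, 2))   # uniform event locations
        patterns.append(xy)
    return patterns

for i, pts in enumerate(simulate_csr(intensity=100), start=1):
    print(f"realization {i}: {len(pts)} events, "
          f"mean location = {pts.mean(axis=0).round(2)}")
```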


It is worth noting that the uniform distribution of event locations represents a uniform probability of occurrence, not an evenly spaced set of events.

CSR represents a convenient null model and many tests of CSR exist in the literature (Cressie, 1993, p. 604), but CSR may not be the appropriate reference pattern for applications where the population at risk is spatially heterogeneous. A common adjustment is the use of a heterogeneous Poisson process, where the number of events expected per unit area is allowed to vary with location. If we define λ(x) as the expected number of events per unit area at location (point) x, we refer to λ(x) as the intensity function of the process. We adjust the Poisson process properties as follows: first, the number of events observed in any region still follows a Poisson distribution, but now with the mean defined as the integral of λ(x) over that region; counts from non-overlapping regions remain statistically independent; and events are distributed according to a (spatial) probability density function proportional to the intensity function. That is, more events are expected in locations where the intensity function is high, and fewer events are expected in locations where the intensity function is low.
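A heterogeneous Poisson process with a known intensity function can be simulated by 'thinning': generate a CSR pattern at the maximum intensity and retain each event with probability λ(x)/λ_max. The intensity surface in the sketch below is an arbitrary illustration, not one taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(7)

def intensity(x, y):
    """An arbitrary illustrative intensity surface on the unit square."""
    return 200.0 * np.exp(-3.0 * ((x - 0.3) ** 2 + (y - 0.7) ** 2))

def simulate_inhomogeneous(intensity_fn, lam_max):
    """Heterogeneous Poisson process by thinning a CSR pattern at lam_max."""
    n = rng.poisson(lam_max)                       # events for the dominating CSR
    x, y = rng.uniform(size=n), rng.uniform(size=n)
    keep = rng.uniform(size=n) < intensity_fn(x, y) / lam_max
    return np.column_stack([x[keep], y[keep]])

events = simulate_inhomogeneous(intensity, lam_max=200.0)
print(len(events), "events; more fall where the intensity surface is high")
```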
The heterogeneous Poisson process offers a flexible model of the spatial distribution of point locations of events, and its properties regarding counts for non-overlapping regions define the distributional basis for several commonly used models for regional count data. However, the assumed independence of counts raises some eyebrows, especially among geographers for whom spatial autocorrelation is often a fundamental assumption in any spatial analysis (Tobler's First Law of Geography; Tobler, 1970). The distinction between a process defined by independent events with spatially patterned means and a process defined by spatially correlated counts with identical means represents a lurking issue in the analysis of spatial pattern in general, and specifically in the detection of clusters. Bartlett (1964) showed that, without additional information, a pattern of independent events arising from a process with spatially varying intensity is mathematically indistinguishable from a pattern of spatially dependent events arising from a process with spatially constant intensity, let alone from patterns based on spatial variations in both correlation and intensity. The additional information allowing one to separate the intensity and correlation effects could be based on the temporal ordering of events, to see if the location of past events influences future events (for example, with infectious diseases or the diffusion of new technologies), or on replicated observations of the same process over time, to see if a suspected cluster remains in the same location (for example, near a putative source of increased risk) or if one observes similar patterning but in different locations for each time period. When contrasting methods based on independent or dependent events, it is important to recognize that both approaches represent an idealization of reality: neither approach is 'right', both are useful, but each answers our questions of interest in slightly different ways.
The basic probability models described above also provide a recipe for simulating sets of events following a given null model, thereby providing a powerful tool for Monte Carlo-based statistical inference. Recall that in frequency-based statistical hypothesis testing, one often considers the p-value, the probability under the null hypothesis of observing a more extreme value of the test statistic than the one observed in the data set. Monte Carlo hypothesis testing (Barnard, 1963; Waller and Gotway, 2004, Chapter 5) uses simulation to estimate this probability by generating multiple data sets under the null model, calculating the test statistic for each, constructing a histogram of these values as an approximation to the null distribution of the test statistic, and calculating the proportion of test statistic values associated with null simulations exceeding the value of the test statistic associated with the observed data. Note that the accuracy of the estimated p-value is a function of the number of simulations, not the sample size of the observed data, thereby putting the level of accuracy into the analyst's hands. This is not to say that sample size is unimportant: sample size impacts the variation of the statistic under the null and alternative hypotheses, while the number of simulations controls the accuracy of the simulation-based tail probability (p-value) estimates. In some cases, theoretical derivations of proposed test statistics exist, but often these are based on particular distributional assumptions (for example, Gaussian or normally-distributed observations) and it is not always immediately clear whether the results apply in settings having different structures. In contrast, as long as one can simulate data under a reasonable null model, the Monte Carlo approach yields accurate inference.
Two general null models are worth national disease or crime rates). If the
mention in our discussion of Monte Carlo study takes place in an area different from
techniques for the detection of clus- that providing the basis for the external
ters/clustering. The first, mentioned above, probability, it is possible that the local
is that of constant risk, that is, an assumed probability is sufficiently higher or lower
constant probability of the event outcome than the external probability so the observed
for each individual under study. If one has data will seem inconsistent with simulated
either point locations or regional counts values based on the external value for
reflecting a census, one can estimate the no other reason than the discrepancy in
overall global risk of the event through the the background probability and not due
overall observed proportion of individuals to spatial clusters or clustering within the
experiencing the event. Then, one may data set.
randomly assign the observed number of Again referring to Figure 16.1, each
events to the population at risk to obtain of these steps represents a decision that
each simulated data set. The constant risk may, subtly or not, impact the question
null model can also adjust for local covariate addressed in the analysis. In the develop-
effects by using the covariates to define ment, implementation, and review of specific
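To make the Monte Carlo recipe concrete, the following sketch (in Python) estimates a random labelling p-value for an arbitrary clustering statistic. The function name, the coordinate and label arrays, and the generic statistic argument are illustrative assumptions, not code from this chapter.

import numpy as np

def monte_carlo_p_value(coords, is_event, statistic, n_sim=999, rng=None):
    """Estimate a random-labelling Monte Carlo p-value for a clustering statistic.

    coords    : (n, 2) array of locations for events and non-events combined
    is_event  : boolean array of length n marking the observed event labels
    statistic : function mapping (coords, labels) to a scalar test statistic
    """
    rng = np.random.default_rng(rng)
    observed = statistic(coords, np.asarray(is_event, dtype=bool))
    n_events = int(np.sum(is_event))
    exceed = 0
    for _ in range(n_sim):
        # condition on the observed locations and permute only the labels
        labels = np.zeros(len(coords), dtype=bool)
        labels[rng.choice(len(coords), size=n_events, replace=False)] = True
        if statistic(coords, labels) >= observed:
            exceed += 1
    # count the observed data set as one realization consistent with the null
    return (exceed + 1) / (n_sim + 1)

Under the constant risk null model one would instead simulate new event assignments (or counts) from the assumed risk surface rather than permuting labels among the fixed set of observed locations.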
Finally, it is worth noting that there are many more advanced computational and mathematical methods of statistical analysis of point patterns under current development. Such models allow one to define parametric models of clustering of event locations (Lawson and Denison, 2002), assign random measurements (often referred to as marks) to event locations, or allow interactions between multiple point processes observed over the same spatial study area (see Møller and Waagepetersen (2002) for detailed technical development). Many of these make use of computationally intensive Markov chain Monte Carlo (MCMC) methods for likelihood or Bayesian inference for parametric models of point processes. However, the non-parametric Monte Carlo approaches presented below represent exploratory techniques for detecting the presence of clusters and/or clustering without explicitly modeling the type of clustering. The approaches illustrated here offer robust statistical inference and a good starting place for analysis.

16.5. ILLUSTRATIVE DATA SET: ANASAZI SITES ON BLACK MESA, ARIZONA

To illustrate these concepts and to provide an illustration of the methods below, we consider a data set from the field of archaeology. The Peabody Coal Company leased land on the Black Mesa in northeastern Arizona, USA for coal mining. As part of the lease, the company contracted with archaeologists to conduct a detailed survey of archaeological sites in the lease area. The Black Mesa Archaeology Project conducted field research in the area between 1967 and 1987 leading to a body of research summarized in texts by Gumerman (1970), Gumerman et al. (1972), Plog (1986), and Powell and Smiley (2002). The study is relatively unique in its careful survey of a large tract of land and detailed mapping of the location of every site discovered on the surface. For our illustrative purposes, we make the simplifying assumption of a constant probability of detection of surface sites regardless of age or location. Figure 16.3 represents data locations abstracted from maps presented in Plog (1996). The 100 open circles represent sites dated to the time period 950–1049 CE and the 390 filled circles represent sites dated to the time period 1050–1150 CE. The later period represents a time of great expansion of the Anasazi culture (as represented by the increased number of settlement sites), but ends coincident with a time of large-scale abandonment of sites by the Anasazi throughout the southwestern United States c. 1100–1150 CE.

To illustrate the methods described below, we will compare spatial patterns between the early and late sites represented in the data set, seeking both clusters and clustering within the data sets.

16.6. DETECTING CLUSTERING

We begin with a general examination of clustering, the overall tendency for events to occur near other events. In the Anasazi data, possible questions of interest are: Do we observe clustering among all sites? and Do we observe different types of clustering among the early and the late sites? We focus on the latter question but, in the spirit of Figure 16.1, consider how it differs from the former in discussions below.
[Figure 16.3 about here: a map of the Anasazi sites (u–v coordinates, 0 to 150 distance units on each axis) showing early sites as open circles and late sites as filled circles.]

Figure 16.3 The Anasazi data set from the Peabody Coal Eastern Lease on Black Mesa in northeastern Arizona. Empty circles represent early sites (dated 950–1049 CE) and filled circles represent locations of later sites (dated 1050–1150 CE).

16.6.1. Who is my neighbor? Nearest neighbor analysis

First, suppose we observe two types of events in the same study area. In our data example, these correspond to early and late sites and the question of interest becomes: Does the pattern of clustering in late sites differ from that in early sites? Note that this question explores the relative degree of clustering within the set of early and late sites, not whether either set of sites exhibits clustering or not.

There is a long tradition of exploring nearest neighbor patterns in spatial data (Cliff and Ord, 1973) and Cuzick and Edwards (1990) propose a test of clustering of one type of event within a set of two types of events in the same area. The test statistic is defined for a fixed number (k) of nearest neighbors and is, intuitively, the total number of late sites observed within the k nearest neighbors of other late sites. More formally, suppose we observe N events of which n_late are late sites.
If we define the matrix B to have elements B_{k,ij} = 1 if event j is in the k nearest neighbors of event i, and if we define δ_i = 1 if the ith event is late and δ_i = 0 otherwise, then the test statistic becomes:

T_k = \sum_{i=1}^{N} \sum_{j=1}^{N} B_{k,ij} \, \delta_i \, \delta_j .

If late sites exhibit more clustering than early sites, we should observe more late sites near other late sites than we would expect under a random assignment of late sites to the observed locations of either type of event. Cuzick and Edwards (1990) derive an asymptotic normal distribution for the test statistic under the null hypothesis, but Monte Carlo tests under random labelling are applicable for any sample size.

Figure 16.4 illustrates the observed test statistic, a histogram approximation to the null distribution and the associated Monte Carlo p-value based on 999 simulations for odd numbers of nearest neighbors k = 3, 5, 7, 9, 11, and 13. None of the sets of nearest neighbors considered suggest any statistically significant clustering of late sites among the set of early and late sites combined.

In the spirit of our discussion of Figure 16.1, the lack of statistically significant clustering of one type of events among its nearest neighbors does not necessarily preclude the existence of a more general definition of clustering among sites. In addition, since clustering represents a feature averaged over the entire data set, non-significant clustering also does not preclude the existence of a few isolated clusters within the data set. We next consider both options with other analytic approaches.

16.6.2. Second-order measures and spatial scale

The nearest neighbor approach above explores clustering of event types among the sets of nearest neighbors but ignores inter-event distances. Statistical estimation of evidence for clustering as a function of distance provides an approach that addresses the question of clustering in a slightly different manner.

The most commonly used distance-based approach for assessing clustering among a single set of events is the so-called K function originally due to Ripley (1977). The K function is a function, K(d), of distance d defined as the average number of additional events observed within distance d of a randomly chosen event, scaled by the overall intensity (average number of events per unit area). As a result, we could estimate the K function via:

\hat{K}(d) = \frac{1}{N} \, \hat{\lambda}^{-1} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} 1\big(d(i,j) < d\big) \qquad (16.1)

where N represents the number of observed events, λ̂ is an estimate of the overall intensity of events, d(i, j) denotes the distance between events i and j, and 1(d(i, j) < d) = 1 if d(i, j) < d and 0 otherwise. Note that the intensity λ is assumed to be constant so that any pattern in the events will be described within the K function rather than as a spatially heterogeneous intensity function. In practice, we should make some adjustment for events observed near the edge of the study area since events occurring nearby but outside of the study area will not be observed.
[Figure 16.4 about here: six histograms of the test statistic under random labelling (frequency versus test statistic), one for each choice of k, with Monte Carlo p-values of 0.626 (3 NN), 0.412 (5 NN), 0.531 (7 NN), 0.577 (9 NN), 0.368 (11 NN), and 0.333 (13 NN).]

Figure 16.4 Histograms and associated p-values of the cumulative number of late events among the nearest neighbors of early events based on 999 random labelling simulations for the Anasazi data set.
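The statistic T_k is straightforward to compute with a spatial index. The sketch below uses SciPy's cKDTree; the coordinate and label arrays are assumed inputs, and the routine is an illustration rather than the implementation used for Figure 16.4.

import numpy as np
from scipy.spatial import cKDTree

def cuzick_edwards_tk(coords, is_late, k):
    """T_k: total number of late sites among the k nearest neighbors of late sites."""
    coords = np.asarray(coords, dtype=float)
    late = np.asarray(is_late, dtype=bool)
    # query k + 1 neighbors because each point is returned as its own nearest neighbor
    _, idx = cKDTree(coords).query(coords, k=k + 1)
    neighbors = idx[:, 1:]
    # count neighbor pairs (i, j) where both i and its neighbor j are late sites
    return int(np.sum(late[:, None] & late[neighbors]))

Combined with the monte_carlo_p_value sketch given earlier, for example monte_carlo_p_value(coords, is_late, lambda c, lab: cuzick_edwards_tk(c, lab, k=5)), this reproduces the style of random labelling analysis summarized in Figure 16.4.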
An edge corrected (ec) version is provided by

\hat{K}_{ec}(d) = \frac{1}{N} \, \hat{\lambda}^{-1} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} (w_{ij})^{-1} \, 1\big(d(i,j) < d\big) \qquad (16.2)

where the average is replaced by a weighted average with weight w_ij defined as the proportion of the circumference of the circle centered at event i with radius d(i, j) which lies within the study area. With a constant intensity, w_ij denotes the conditional probability of an event occurring at distance d(i, j) from event i falling within the study area, given the location of event i. Note that w_ij = 1 if the distance between events i and j is less than the distance between event i and the edge of the study area.

Under CSR, K(d) = πd² (the area of a circle with radius d) and patterns exhibit clustering for K(d) > πd². To simplify the graphical expression of the K function, Besag (1977) proposed a transformation:

\hat{L}(d) - d = \left(\frac{\hat{K}_{ec}(d)}{\pi}\right)^{1/2} - d

where the first term on the right-hand side equals d under CSR, so subtracting d yields a CSR-associated reference value of zero. Plotting d versus L̂(d) − d allows us to quickly identify distances at which patterns exhibit clustering (L̂(d) − d > 0) and those at which patterns appear too evenly spaced to be consistent with CSR (L̂(d) − d < 0).

The thick line in Figure 16.5 provides a graph of L̂(d) − d for the late Anasazi sites. The transformed K function is well above the CSR reference value of zero indicating more clustering than we would expect under CSR. However, the question of interest is not 'Do the late sites appear consistent with CSR?' but rather 'Do the late sites exhibit more clustering than the early sites?' We can use a random labelling Monte Carlo approach to address this question by repeatedly sampling 390 sites from the set of early and late sites combined, estimating the K function and exploring the variability of these estimates. Figure 16.5 illustrates the pointwise median, 2.5th and 97.5th percentiles of estimates of L̂(d) − d, based on 999 random labelling samples. We note that the estimate based on the data falls well within the band of values likely under the random labelling hypothesis so that the observed set of late sites does not differ from the patterns expected under random labelling in a statistically significant way.

At this point, the pattern of the late sites does not appear to differ significantly from the pattern of the early sites either in its observed nearest neighbor relationships or its distance-based associations. However, both approaches applied so far explore clustering and we next consider approaches to evaluate the possible existence of clusters within the late sites.

16.7. DETECTING CLUSTERS

We consider two conceptual approaches for detecting clusters, namely, the detection of the most unusual collection of events, and the comparison of the distribution of event locations experiencing the phenomenon of interest (e.g., a disease case or a crime), to that of locations not experiencing the phenomenon (controls). These two approaches cover many but not all examples and we refer the interested reader to texts by Lawson (2001), Elliott et al. (1992, 1999), and Waller and Gotway (2004) for additional approaches and techniques.
[Figure 16.5 about here: plot titled 'L plot, late sites, random labeling' showing L̂(d) − d against distance d (0 to 100), with the median and the 2.5th and 97.5th percentiles of the random labelling simulations.]

Figure 16.5 The estimate of the standardized K function (L̂(d) − d) for the late Anasazi sites (solid line) compared to the median (dashed) and 95 percent tolerance bands based on 999 random labeling simulations.
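As a concrete illustration of equation (16.1) and the random labelling envelope in Figure 16.5, the following sketch estimates L̂(d) − d and its pointwise tolerance band. For brevity it omits the edge correction of equation (16.2); the array names, the area argument, and the distance grid are assumptions for illustration.

import numpy as np

def l_minus_d(coords, distances, area):
    """Estimate L_hat(d) - d for a point pattern, ignoring edge correction."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    lam = n / area                                    # constant-intensity estimate
    pair_d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(pair_d, np.inf)                  # exclude self-pairs (j != i)
    k_hat = np.array([(pair_d < d).sum() for d in distances]) / (n * lam)
    return np.sqrt(k_hat / np.pi) - np.asarray(distances, dtype=float)

def random_labelling_band(coords, n_late, distances, area, n_sim=999, rng=None):
    """Pointwise 2.5th, 50th and 97.5th percentiles of L_hat(d) - d for random subsets."""
    rng = np.random.default_rng(rng)
    coords = np.asarray(coords, dtype=float)
    sims = np.empty((n_sim, len(distances)))
    for s in range(n_sim):
        idx = rng.choice(len(coords), size=n_late, replace=False)
        sims[s] = l_minus_d(coords[idx], distances, area)
    return np.percentile(sims, [2.5, 50, 97.5], axis=0)

Comparing l_minus_d evaluated on the late sites with the band returned by random_labelling_band mirrors the comparison displayed in Figure 16.5.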

16.7.1. Finding the oddest ball in the urn: Scan statistics

If we consider a cluster to be defined by an unusual collection of events, then an initial place to start is with methods designed to detect the most unusual collection (or collections) of events observed within the data set. Such methods define a (large) set of potential clusters, collections of events each of which we might define as a cluster if the collection appears unusual enough (discrepant from the null model of interest), then identify the most unusual of these. This general idea motivated the geographical analysis machine (GAM) of Openshaw et al. (1988) where potential clusters were defined as collections of events falling within circular buffers of varying radii. The buffers were centered at each point in a fine grid covering the study area and the GAM approach mapped any circle whose collection of events was detected as unusual, e.g., those circles where the number of events exceeded the 99.8th percentile of a Poisson distribution with mean defined by the population size within the buffer multiplied by the overall disease risk. (The use of the 99.8th percentile was an attempt to adjust for the extremely high number of hypothesis tests conducted, one for each potential cluster.)

The GAM received a fair amount of attention, both in applications and in criticisms of the relatively ad hoc statistical inference associated with it. Subsequent methods proposed by Besag and Newell (1991) and Turnbull et al. (1990) revised the basic idea in more statistically-based ways, but the most widely-used variant of this general idea is the spatial scan statistic originally proposed by Kulldorff (1997) and freely available in the software package SaTScan (Kulldorff and Information Management Services Inc., 2002).
The spatial scan statistic works as follows. The set of potential clusters consists of all circular collections of cases centered at observed cases or controls, and radii ranging from the minimum observed inter-event distance to radii containing approximately one-half of the study area. For each potential cluster, we measure its unusualness via a local likelihood ratio statistic comparing a null hypothesis that events arise within the potential cluster with the same probability as they do outside of the potential cluster to an alternative hypothesis where events arise within the potential cluster with a higher probability than outside of the potential cluster. If we assume events follow a Poisson process within and without the potential cluster, we are simply testing the null hypothesis of equal intensities within and without the potential cluster versus the alternative hypothesis of a greater intensity within the potential cluster. In this case, the local likelihood ratio statistic becomes:

\left(\frac{N_{1,in}}{N_{in}}\right)^{N_{1,in}} \left(\frac{N_{1,out}}{N_{out}}\right)^{N_{1,out}} I\left(\frac{N_{1,in}}{N_{in}} > \frac{N_{1,out}}{N_{out}}\right) \qquad (16.3)

where N_{1,in} and N_{in} = N_{0,in} + N_{1,in} denote the number of event locations and persons at risk (number of event and control locations) within the potential cluster, respectively, and N_{1,out} and N_{out} = N_{0,out} + N_{1,out} denote the corresponding quantities outside of the potential cluster. By extending the statistic with the inclusion of the indicator function I(·) we can limit attention to only windows where the observed rate inside the window exceeds that outside the window, rather than including 'cold spots' where the rate inside the window is less than that outside the window.

At this point, we have a value measuring the unusualness of each potential cluster. Next, we identify the potential cluster with the highest local likelihood ratio statistic as the most likely cluster among the set of potential clusters considered.

Next, we determine the statistical significance of this most likely cluster, an important step since there will always be a most likely cluster, i.e., the most unusual collection of events considered. The relevant question is: How unusual is this most unusual collection of events? Kulldorff (1997) addresses this question in a clever way using Monte Carlo hypothesis testing. Given the total number and locations of events of both types (those experiencing the phenomenon and those not), we randomly assign events among the set of all locations, find the most likely cluster and save its associated likelihood ratio statistic. We repeat this exercise many times and construct a histogram of the maximum local likelihood ratio statistic for each random allocation. We estimate the statistical significance of the most likely cluster detected in our data set by the proportion of simulated maximized local likelihood ratio test statistics exceeding that of the observed data (i.e., the proportion, under the random labelling null hypothesis, of measures of unusualness that are more unusual than observed in the data).

This approach avoids the multiple testing problem encountered in Openshaw et al.'s (1988) GAM in the following way. The key lies in comparing the measure of unusualness of the most likely cluster in the observed data (the maximized local likelihood ratio statistic) to the same value from each of a large number of data sets simulated under the null hypothesis. Each simulated assignment generates its own most likely cluster and associated local likelihood ratio statistic. These values are independent of one another since the simulated data sets are independent of one another, so the collection of maximum local likelihood ratio statistics represents an independent sample under the null hypothesis and its histogram provides an estimate of the null distribution of the maximized local likelihood ratio statistic.
Note that this approach compares the maximized likelihood statistic regardless of where it occurs rather than comparing the measure of unusualness at its observed location to the measures of unusualness at that same location.

We can contrast these two approaches by considering the questions answered by each. By comparing the observed measure of unusualness to the measure observed anywhere in the simulated data sets we answer 'How unusual does our most likely cluster appear compared to how unusual the most likely cluster appears under the null hypothesis?' If we compare the observed measure at a particular location to the observed measure at that location in each of the simulated data sets, we answer 'How unusual does our most likely cluster appear compared to any other cluster at this location?' The first question represents a single question particular to the most likely cluster but the second is particular to a location and radius. Openshaw et al.'s (1988) GAM and similar methods essentially ask the second question for each location and radius which generates multiple hypothesis tests and complicates inference, again illustrating the importance of Figure 16.1.

To illustrate the spatial scan statistic, Figure 16.6 shows the most likely cluster of late sites in the Anasazi data by the thick, dark circle and the most likely cluster of early sites by the thin, light circle. Neither is statistically significant. Even though the most likely cluster of late sites consists of only one early site (on the edge), the late sites outnumber the early sites in the data set so this is not a particularly unusual grouping of events.

A few items merit mention. First, note that seeking the most likely cluster of late sites is a different exercise than seeking the most likely cluster of early sites. In some applications it is clear which events one wishes to find a cluster of (e.g., cases versus non-case controls in epidemiology); in others it is not as obvious and both questions are of interest. Second, we must interpret the results in light of the set of potential clusters considered. Here, we only consider circular clusters and may miss more oblong or sinuous clusters, perhaps following rivers. The most recent version of SaTScan incorporates elliptical potential clusters and recent methodological work by Assunção et al. (2006) and Patil and Taillie (2004) further expands the set of potential clusters at increasing computational cost. The impact of expanding the set of potential clusters on the statistical power of detection for subsets of this class remains to be studied in detail. For instance, it is not known to what extent including both circular and elliptical clusters might reduce the power to detect only elements of the subset of circular clusters.

16.7.2. Finding peaks and valleys: Estimating the spatial intensity

The spatial scan statistic is appealing, but is limited to the set of potential clusters. A more general approach involves estimation of the intensity function associated with a set of observed event locations. The conceptual framework of a spatial point process views the set of observed locations as a realization of a random distribution in space. The next step involves estimating the local probability of an event occurrence and defining clusters as areas where events appear to be most likely.

Kernel estimation is a popular approach for estimating probability distributions and has seen broad use in spatial analysis as well (Bailey and Gatrell, 1995; McLafferty et al., 2000; Diggle, 2003; Eck et al., 2005).
[Figure 16.6 about here: map of the Anasazi sites (u–v coordinates) showing the most likely cluster of late sites (thick, dark circle) and the most likely cluster of early sites (thin, light circle), with the associated p-values 0.583 and 0.530 marked on the map.]

Figure 16.6 SaTScan results for the Anasazi data set. Thick, dark circles and p-values correspond to the most likely clusters of late sites, and thin, light circles and p-values correspond to the most likely clusters of early sites.
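The following sketch illustrates the circular scan search and the likelihood ratio of equation (16.3). It enumerates circles centered on the observed locations for a user-supplied list of radii rather than reproducing the SaTScan software itself; the coordinate and label arrays and the radii argument are assumptions for illustration.

import numpy as np

def log_lr(n1_in, n_in, n1_tot, n_tot):
    """Log of the local likelihood ratio in equation (16.3) for one circular window."""
    n1_out, n_out = n1_tot - n1_in, n_tot - n_in
    if n_in == 0 or n_out == 0 or n1_in / n_in <= n1_out / n_out:
        return -np.inf                      # indicator in (16.3) is zero: not a hot spot
    def xlogx(a, b):
        return a * np.log(a / b) if a > 0 else 0.0
    return xlogx(n1_in, n_in) + xlogx(n1_out, n_out)

def scan_statistic(coords, is_case, radii):
    """Maximum log likelihood ratio over circles centered at each observed location."""
    coords = np.asarray(coords, dtype=float)
    case = np.asarray(is_case, dtype=bool)
    n_tot, n1_tot = len(coords), int(case.sum())
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    best = -np.inf
    for i in range(n_tot):                  # candidate circle centers
        for r in radii:                     # candidate radii
            inside = dist[i] <= r
            best = max(best, log_lr(int(case[inside].sum()), int(inside.sum()),
                                    n1_tot, n_tot))
    return best

The Monte Carlo significance of the most likely cluster can then be estimated with the earlier random labelling routine, for example monte_carlo_p_value(coords, is_case, lambda c, lab: scan_statistic(c, lab, radii)), which mirrors Kulldorff's comparison of the observed maximum to the maxima from the simulated data sets.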

Conceptually, suppose we place an equal amount of soft modeling clay over each event location on our map. These will overlap for events near each other and the resulting height of the entire surface represents our estimate of the spatial intensity, higher in areas with many observed events, lower in areas with few observed events. More precisely, we place a smooth, symmetric function (the kernel) over events, typically a probability density function such as a bivariate Gaussian density or other function which integrates to one. At each of a fine grid of points, we sum the kernel values associated with each observed event, yielding a smooth surface estimating the unknown intensity function. The bandwidth (spatial extent) of each kernel controls the overall amount of smoothness in the estimated intensity surface with larger bandwidths corresponding to smoother surfaces. Essentially, the kernel takes each observation and spreads its influence over a local area corresponding to the kernel function.
Mathematically, suppose x_i denotes the vector location of the ith of N events (x_1, x_2, ..., x_N), and x denotes any location within our study area A. The kernel estimate of the intensity λ(x) is:

\hat{\lambda}(x) = \frac{1}{|A|\, b} \sum_{i=1}^{N} \mathrm{Kern}\!\left(\frac{x - x_i}{b}\right) \qquad (16.4)

where |A| denotes the geographic area of our study area A, Kern(·) is a kernel function satisfying

\int_{A} \mathrm{Kern}(x)\, dx = 1

and b denotes the kernel's bandwidth.

Figure 16.7 represents the two intensity estimates for the Anasazi site data for a bandwidth of 15 distance units. Visually, we observe some differences between the two intensity estimates, such as a more distinct gap between site intensity for the late period (right-hand plot) in the northern third of the study area, and perhaps an additional mode for the early period (left-hand plot) in the southwestern section of the study area. Such conclusions must be interpreted with caution however, since they are dependent upon the bandwidth used for estimation. In this illustration we use the same bandwidth in both plots to facilitate numerical comparisons between them in the next subsection, even though the two time periods contain different sample sizes.

16.7.3. Comparing maps: Contouring relative risk

Intensity estimates provide a descriptive view of local variations in the probability of event occurrence. However, as mentioned above, the interpretation of clustering depends on the (often spatially-varying) population at risk of an event.

[Figure 16.7 about here: two kernel intensity surfaces over the study area (u–v coordinates), left panel titled 'Late sites, bandwidth = 15' and right panel titled 'Early sites, bandwidth = 15'.]

Figure 16.7 Kernel estimates of the intensity functions for the patterns of late (left) and early (right) sites for the Anasazi site data based on a bandwidth of 15 distance units.
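A minimal sketch of the kernel intensity estimate in equation (16.4), using a bivariate Gaussian kernel evaluated on a user-supplied grid, is given below. The function and argument names are assumptions for illustration, and the normalization follows the form printed in equation (16.4).

import numpy as np

def kernel_intensity(events, grid_xy, bandwidth, area):
    """Kernel estimate of the intensity surface, following equation (16.4)."""
    events = np.asarray(events, dtype=float)     # (N, 2) event coordinates
    grid_xy = np.asarray(grid_xy, dtype=float)   # (G, 2) evaluation points
    d2 = ((grid_xy[:, None, :] - events[None, :, :]) ** 2).sum(axis=-1)
    # bivariate Gaussian kernel Kern((x - x_i) / b)
    kern = np.exp(-0.5 * d2 / bandwidth**2) / (2 * np.pi)
    # sum kernel contributions from every event, scaled by 1 / (|A| b)
    return kern.sum(axis=1) / (area * bandwidth)

Evaluating kernel_intensity separately on the early and the late site coordinates, with the same bandwidth, yields surfaces of the kind displayed in Figure 16.7.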
That is, we are often more interested in spatial variations in the risk (probability) of an event rather than in spatial variations in the actual numbers of events. For crime data, we often do not have point-level population data or samples of the locations of control individuals not experiencing the crime under study, and intensity analysis concludes with interpretation of the intensity function of events (Eck et al., 2005). In other fields, such comparison patterns are more readily available, and we next consider statistical identification of clusters via comparisons between two estimated intensity functions.

Suppose we have two types of events (events and controls, early or late sites, etc.). Bithell (1990), Lawson and Williams (1993), and Kelsall and Diggle (1995) propose approaches for comparing kernel estimates from each type of event, say λ̂_0 and λ̂_1. Kelsall and Diggle (1995) examine the surface generated by the natural logarithm of the ratio of the two intensity functions:

r(x) = \log\left(\frac{\lambda_1(x)}{\lambda_0(x)}\right)

for any location x in our study area A. To borrow a term from epidemiology, the ratio of the two intensity functions reflects the relative risk, and the log transformation places the ratio on a more symmetric scale around its null value (0.0 on the log scale). Kelsall and Diggle (1995) point out technical and practical reasons for using the same bandwidth for both kernel estimates, primarily to avoid confounding the smoothness of the r(x) surface by differences in the underlying smoothness of the two intensity estimates.

The log relative risk surface r(x) illustrates areas where events of each type are more or less likely than the other. In order to use this approach to detect clusters, we seek peaks or valleys in the surface. To assess statistical significance, the next step is to decide whether the peaks and valleys are more extreme than one would expect to observe under a null hypothesis. Kelsall and Diggle (1995) propose using random labeling simulations to determine local clusters. Suppose we have n_0 type 0 events and n_1 type 1 events. Conditional on the complete set of observations of both types of events, we randomly assign n_0 of the events to be type 0, the rest to be type 1, and calculate r(g) for a grid of locations g = (g_1, g_2, ..., g_G). We repeat the random labeling a large number of times providing a large number of r(g_i) values for each g_i in our grid, under the random labeling null hypothesis. If the value of r(g_i) based on the observed data is more extreme than the 2.5th or 97.5th percentiles of the values based on the simulation, we mark the location on the map. We note that this approach provides pointwise inference, not overall inference, due to the multitude of grid points and the correlation between values of r(g) induced by the kernel function (nearby estimates share the same data).

Figure 16.1 provides a basis for comparison between the spatial scan statistic and the log relative risk surface. The scan statistic addresses the question 'Where is the most unusual collection of cases and how unusual is it compared to what would be expected of the most unusual collection under the null hypothesis?' The log relative risk surface addresses: 'Where are different types of events more or less likely than others and how do these differences compare to what we would expect under the null hypothesis?' One important distinction between these two questions is the emphasis on a single cluster in the first and the emphasis on the entire log relative risk surface in the second. For instance, a focus on a single cluster ignores the size, number, and location of other local peaks and valleys across the surface.
Also, if we were to use the pointwise interval inference to identify a single most likely cluster from the log relative risk surface we would fall into the same multiple inference problem as discussed above for GAM-type methods. Instead, we should think of the collection of pointwise intervals as a general guide to describe the variability (under the null hypothesis) of the estimated log relative risk surface across the study area, and draw attention to locations where the estimated log relative risk surface wanders outside of these bounds. Leong (2005) recently proposed and compared several approaches to move from pointwise to simultaneous intervals around such log relative risk functions in one dimension, and extensions to higher dimensions would provide a stronger basis for inference.

To illustrate the approach, Figure 16.8 illustrates the log relative risk of late versus early sites based on the kernel intensity estimates shown in Figure 16.7. On the contour plot, we indicate grid points with local relative risk estimates falling above and below the 95 percent tolerance intervals (defined by random labeling) by '+' and '−' symbols, respectively. We see locally statistically significant increases in the relative probability of late versus early sites in the north-central area mentioned in our discussion of Figure 16.7.

[Figure 16.8 about here: left panel, a perspective plot of the log relative risk surface; right panel, a contour plot titled 'Log relative risk, bandwidth = 15' over the study area (u–v coordinates) with grid points outside the pointwise tolerance limits marked by '+' and '−' symbols.]

Figure 16.8 Log relative risk surface comparing the probabilities of late sites versus that of early sites for the Anasazi site data based on a bandwidth of 15 distance units. On the contour plot, '+' denotes a point exceeding the upper 95 percent pointwise tolerance limits and '−' a point exceeding the lower 95 percent limit (see text).
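The log relative risk surface and its pointwise tolerance flags can be sketched by reusing the kernel_intensity function given earlier; the array and argument names remain illustrative assumptions, and the small eps guard against taking the log of zero is a pragmatic addition, not part of the published method.

import numpy as np

def log_relative_risk(coords, is_late, grid_xy, bandwidth, area, eps=1e-12):
    """r(g) = log(lambda_late(g) / lambda_early(g)) on a grid of locations."""
    coords = np.asarray(coords, dtype=float)
    late = np.asarray(is_late, dtype=bool)
    lam1 = kernel_intensity(coords[late], grid_xy, bandwidth, area)   # late sites
    lam0 = kernel_intensity(coords[~late], grid_xy, bandwidth, area)  # early sites
    return np.log((lam1 + eps) / (lam0 + eps))

def tolerance_flags(coords, is_late, grid_xy, bandwidth, area, n_sim=999, rng=None):
    """Flag grid points where the observed r(g) falls outside the pointwise 95% band."""
    rng = np.random.default_rng(rng)
    coords = np.asarray(coords, dtype=float)
    late = np.asarray(is_late, dtype=bool)
    observed = log_relative_risk(coords, late, grid_xy, bandwidth, area)
    sims = np.empty((n_sim, len(np.asarray(grid_xy))))
    n_late = int(late.sum())
    for s in range(n_sim):
        # random labelling: keep all locations, permute which ones are 'late'
        labels = np.zeros(len(coords), dtype=bool)
        labels[rng.choice(len(coords), size=n_late, replace=False)] = True
        sims[s] = log_relative_risk(coords, labels, grid_xy, bandwidth, area)
    lo, hi = np.percentile(sims, [2.5, 97.5], axis=0)
    return observed > hi, observed < lo          # '+' and '-' flags per grid point

Plotting the two returned boolean arrays over the contour of the observed surface reproduces the style of display in Figure 16.8.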
How can we reconcile the locally significant cluster shown in Figure 16.8 with the non-significant most likely cluster found by the spatial scan statistic in Figure 16.6? Closer examination of Figure 16.6 reveals that the collection of late sites (filled circles) driving the cluster identified in the log relative risk plot is an oblong concentration of late sites in the north central portion of the study area. This concentration would not be considered among the circular potential clusters we used in our application of the spatial scan statistic. The example illustrates the importance of understanding the types of clusters evaluated by a particular method when comparing results between different approaches. In addition, the most likely clusters identified by the spatial scan statistic do not appear as unusual peaks in the log relative risk surface since (as with the scan statistic) there is not a strong excess of early or late sites in these locations.

16.8. CONCLUSIONS

The sections above illustrate the importance of understanding what sort of spatial patterns statistical approaches investigate in studies to detect clusters and/or clustering. The data set provides an interesting example where we observe no significant clustering but a significant cluster, provided we examine a broad enough class of potential clusters. Figure 16.1 illustrates that the example is not simply a situation of applying multiple methods until we get the answer we desire, but rather an example of the sorts of patterns not considered by many common summaries of spatial pattern, and how some potentially interesting patterns may be missed by some methods.

ACKNOWLEDGMENTS

Thanks to John Richardson, a toxicologist for US EPA Region IV, who provided the initial sketch that became Figure 16.1. In a simple diagram, he provided a summary of many important issues relating to the cluster/clustering detection problem. This work is supported in part by grant NIEHS R01 ES007750. The opinions expressed herein are solely those of the author and may not reflect those of the National Institutes of Health or the National Institute of Environmental Health Sciences.

REFERENCES

Anselin, L. (1995). Local indicators of spatial association: LISA. Geographical Analysis, 27: 93–116.
Assunção, R., Costa, M., Tavares, A. and Ferreira, S. (2006). Fast detection of arbitrarily shaped disease clusters. Statistics in Medicine, 25: 723–742.
Bailey, T.C. and Gatrell, A.C. (1995). Interactive Spatial Data Analysis. New York: John Wiley and Sons.
Barnard, G.A. (1963). Contribution to the discussion of Professor Bartlett's paper. Journal of the Royal Statistical Society, Series B, 25: 294.
Bartlett, M.S. (1964). The spectral analysis of two-dimensional point processes. Biometrika, 51: 299–311.
Besag, J. (1977). Discussion of 'Modeling spatial patterns' by B.D. Ripley. Journal of the Royal Statistical Society, Series B, 39: 192–225.
Besag, J. and Newell, J. (1991). The detection of clusters in rare diseases. Journal of the Royal Statistical Society, Series A, 154: 327–333.
Bithell, J. (1990). An application of density estimation to geographical epidemiology. Statistics in Medicine, 9: 691–701.
Chainey, S. (2005). Methods and techniques for understanding crime hot spots. In: Mapping Crime: Understanding Hot Spots. Eck, J.E., Chainey, S., Cameron, J.G., Leitner, M. and Wilson, R.E. (eds.), National Institute of Justice Report NCJ 209393. Washington DC: United States Department of Justice, Office of Justice Programs, pp. 15–34.
Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. London: Pion.
Cressie, N.A.C. (1993). Statistics for Spatial Data, Revised Edition. New York: John Wiley and Sons.
Cuzick, J. and Edwards, R. (1990). Spatial clustering for inhomogeneous populations (with discussion). Journal of the Royal Statistical Society, Series B, 52: 73–104.
Denison, D. and Holmes, C. (2001). Bayesian partitioning for estimating disease risk. Biometrics, 57: 143–147.
Diggle, P.J. (2003). Statistical Analysis of Spatial Point Patterns, Second Edition. New York: Oxford University Press.
Eck, J.E., Chainey, S., Cameron, J.G., Leitner, M. and Wilson, R.E. (2005). Mapping Crime: Understanding Hot Spots. National Institute of Justice Report NCJ 209393. Washington DC: United States Department of Justice, Office of Justice Programs.
Elliott, P., Cuzick, J., English, D. and Stern, R. (1992). Geographical and Environmental Epidemiology: Methods for Small-Area Studies. Oxford: Oxford University Press.
Elliott, P., Wakefield, J.C., Best, N.G. and Briggs, D.J. (1999). Spatial Epidemiology: Methods and Applications. Oxford: Oxford University Press.
Goldsmith, V., McGuire, P.G., Mollenkopf, J.H. and Ross, T.A. (2000). Analyzing Crime Patterns: Frontiers of Practice. Thousand Oaks, CA: Sage Publications, Inc.
Gumerman, G.J. (1970). Black Mesa: Survey and Excavation in Northeastern Arizona, 1968. Prescott College Press.
Gumerman, G.J., Westfall, D. and Weed, C.S. (1972). Archaeological Investigations on Black Mesa: The 1969–1970 Seasons. Prescott College Press.
Kelsall, J. and Diggle, P.J. (1995). Non-parametric estimation of spatial variation in relative risk. Statistics in Medicine, 14: 2335–2342.
Knorr-Held, L. and Raßer, G. (2000). Bayesian detection of clusters and discontinuities in disease maps. Biometrics, 56: 13–21.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26: 1487–1496.
Kulldorff, M. and Information Management Services, Inc. (2002). SaTScan v. 3.0: Software for the Spatial and Space-time Scan Statistics. Bethesda, MD: National Cancer Institute.
Kulldorff, M., Tango, T. and Park, P.J. (2003). Power comparisons for disease clustering tests. Statistics in Medicine, 42: 665–684.
Langworthy, R.H. and Jefferis, E.S. (2000). The utility of standard deviation ellipses for evaluating hot spots. In: Analyzing Crime Patterns: Frontiers of Practice. Goldsmith, V., McGuire, P.G., Mollenkopf, J.H. and Ross, T.A. (eds.), Thousand Oaks, CA: Sage Publications, Inc.
Lawson, A.B. (2001). Statistical Methods in Spatial Epidemiology. Chichester: John Wiley & Sons.
Lawson, A.B. and Denison, D.G.T. (2002). Spatial Cluster Modelling. Boca Raton FL: Chapman & Hall/CRC.
Lawson, A.B. and Williams, F.L.R. (1993). Applications of extraction mapping in environmental epidemiology. Statistics in Medicine, 12: 1249–1258.
Leong, T. (2005). First- and second-order properties of spatial point processes in biostatistics. Unpublished Ph.D. dissertation, Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, GA.
McLafferty, S., Williamson, D. and McGuire, P.G. (2000). Identifying crime hot spots using kernel smoothing. In: Analyzing Crime Patterns: Frontiers of Practice. Goldsmith, V., McGuire, P.G., Mollenkopf, J.H. and Ross, T.A. (eds.), Thousand Oaks, CA: Sage Publications, Inc.
Møller, J. and Waagepetersen, R. (2002). Statistical Inference and Simulation for Spatial Point Patterns. Boca Raton, FL: Chapman & Hall/CRC.
Openshaw, S., Craft, A.W., Charlton, M. and Birch, J.M. (1988). Investigation of leukaemia clusters by use of a geographical analysis machine. Lancet, 1 (8580): 272–273.
Patil, G.P. and Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11: 183–197.
Plog, S. (ed.) (1986). Spatial Organization and Exchange: Archaeological Survey on Northern Black Mesa. Southern Illinois University Press.
Powell, S. and Smiley, F.E. (2002). Prehistoric Culture Change on the Colorado Plateau: Ten Thousand Years on Black Mesa. Tucson AZ: The University of Arizona Press.
Ripley, B.D. (1977). Modeling spatial patterns (with discussion). Journal of the Royal Statistical Society, Series B, 39: 172–212.
Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46: 234–240.
Turnbull, B.W., Iwano, E.J., Burnett, W.S., Howe, H.L. and Clark, L.C. (1990). Monitoring for clusters of disease: Application to leukemia incidence in upstate New York. American Journal of Epidemiology, 132, supplement: S136–S143.
Waller, L.A. and Jacquez, G.M. (1995). Disease models implicit in statistical tests of disease clustering. Epidemiology, 6: 584–590.
Waller, L.A. and Gotway, C.A. (2004). Applied Spatial Analysis of Public Health Data. Hoboken NJ: John Wiley & Sons.
17
Bayesian Spatial Analysis
Andrew B. Lawson and Sudipto Banerjee

17.1. INTRODUCTION by Cressie (1993), Chils and Delfiner


(1999), Mller and Waagpetersen (2004),
Spatially referenced data occur in diverse Schabenberger and Gotway (2004), and
scientific disciplines including geological and Banerjee et al. (2004) for a variety of
environmental sciences (Webster and Oliver, methods and applications.
2001), ecological systems (Scheiner and With recent advances in computational
Gurevitch, 2001), disease mapping (Lawson, methods (particularly in the area of Monte
2006) and in broader public health contexts Carlo algorithms), it is now commonplace
(Waller and Gotway, 2004). Very often, such to be able to incorporate spatial correlation
data will be referenced over a fixed set of as an important modeling ingredient. It is
locations in a region of study. These locations now feasible to fit routinely linear models
can be with regions or areas with well-defined with a variety of features within a modeling
neighbors (such as pixels in a lattice, counties hierarchy. With the implementation of fast
in a map, etc.), whence they are called areally algorithms such as Markov Chain Monte
referenced or lattice data. Alternatively, they Carlo (MCMC), sophisticated models that
may be simply points with coordinates were previously inaccessible are now within
(latitudelongitude, EastingNorthing etc.), reach allowing us to move beyond the
in which case they are called point refer- simpler, and often inadequate, descriptive
enced or geostatistical. Statistical theory and measures for analyzing spatial structure.
methods to model and analyze such data Spatial analysis can be viewed in a number
depend upon these configurations and has of ways. For the statistician, there are two
enjoyed significant developments over the basic approaches to statistical modeling and
last decade; see, for example, the books inference: frequentist or likelihood based

inference, and Bayesian inference. Here settings. When the referencing is done using
we focus on the latter approach. Bayesian coordinates (latitudelongitude, Easting
inference and modeling can be seen as an Northing, etc.) over a domain D, we denote
extension of likelihood methods, but it also it as s D; for instance in two-dimensional
has a fundamentally different view of the domains we have s (sx , sy ). The most
inferential process. frequently encountered scenario observes
a spatial field measured at a finite set
of locations, say S = {s1 , . . . , sn }.
We usually name this a random field,
17.2. NOTATION which we denote as {w(s) : s D} or
simply as w(s) in short. A realization
The following notation will be used through- of this random field will be a vector
out this chapter. A random variate is denoted w = (w(s1 ), . . . , w(sn )).
yi , for an item in a vector. The vector of
these items is y. Often y will be related to
independent variables (such as in a linear
model). In that case the matrix of such 17.2.2. Health data notation
variables can be defined as X. A linear model For health data discussed in this chapter we
can be defined, for a single independent will confine ourselves (mostly) to examining
variable x1 as: count data arising within small arbitrary
administrative areas (such as census tracts,
yi = 0 + 1 x1i + ei . zip codes, postcodes, counties). Define yi as
the count of disease within the ith small area.
Assume that i = 1, . . . , m. For this we need
In general, the matrix formulation of the to define a relative risk for the ith region: i .
model, where i = 1, . . . , n will be: We usually want to make inferences about the
relative risk, in any study.
y = X + e (17.1) We also usually have available an expected
rate for the ith region: ei . Often the count
within the regions will have a Poisson
where y is an n 1 vector of the dependent
distribution, i.e., yi Pois(ei i ).
variable, X is an np matrix of p independent
predictors (or covariates), is a p 1
parameter vector of the corresponding slopes
and e is an n 1 vector of the errors. Often 17.3. LIKELIHOOD AND BAYESIAN
we make distributional assumptions, such as MODELS
e N(0, ) These expressions imply that
the errors are normally distributed with a 17.3.1. Likelihood
zero-vector, 0, as the mean and a covariance
matrix . A random variable X is usually associ-
ated with a distribution which governs its
behavior. We denote this distribution as
f (x | ) where is a parameter. In general,
17.2.1. Point-referenced spatial
could be a vector of parameters and so
data notation
is denoted . In this case we have f (x | ).
As we will be dealing with spatial data, we When a random sample of values of X are
will require some notation specific to such taken {xi , i = 1, . . . , n} then the likelihood is

defined as the joint distribution of the sample unobserved effects as random variables, the
values: hierarchical Bayesian approach to statistical
analysis provides a cohesive framework for
(
n combining complex data models and external
f (x | ) = f (xi | ). (17.2) knowledge or expert opinion (e.g., Berger,
i=1 1985; Carlin and Louis, 2000; Robert, 2001;
Gelman et al., 2004; Lee, 2005) In this
approach, in addition to specifying the distri-
It is assumed that conditional on the
butional model f (y | ) for the observed data
sample values are independent. If this were
y = ( y1 , . . . , yn ) given a vector of unknown
not so, then we would require to take
parameters = (1 , . . . , k ), we suppose
the product of conditional distributions in
that is a random quantity sampled from a
equation (17.2). When using the frequentist
prior distribution p( | l), where l is a vector
inferential process it is important to base
of hyperparameters. Inference concerning
decisions about parameters (estimation of
is then based on its posterior distribution:
parameter values or confidence intervals) on
the likelihood function. Maximum likelihood
estimation seeks point estimates of the p(y, | l) p(y, | l)
parameters in by maximising f (x | ) or p( | y, l) = =)
p(y | l) p(y, | l) d
log f (x | ). Testing and interval estimation
is often based on likelihood ratios derived f (y | )p( | l)
=) . (17.3)
for different values of under different f (y | )p( | l) d
hypotheses. Inference for quantities such as
confidence intervals is based on the concept
of repeated experimentation, in that probabil- Notice the contribution of both the data
ity statements are derived based on properties (in the form of the likelihood f (y | )) and
of repeated sequences of experiments. the external knowledge or opinion (in the
form of the prior p( | l)) to the posterior.
If l is known, this posterior distribution is
fully specified; if not, a second-stage prior
17.4. BAYESIAN INFERENCE distribution (called a hyper-prior) may be
specified for it, leading to a fully Bayesian
Fundamental philosophical differences with analysis. Alternatively, we might simply
the frequentist approach are found when a replace l by an estimate l obtained as
Bayesian perspective is assumed. First of the value which maximizes the marginal
all, parameters within Bayesian models are distribution p(y | l) viewed as a function of l.
assumed to be random variables and hence Inference proceeds based on the estimated
are governed by distributions themselves. posterior distribution p( | y, l), obtained by
Hence, there is no longer a fixed (true) value plugging l into equation (17.3). This is called
for a given parameter. Instead an expected an empirical Bayes analysis and is closer to
value or other functional of a distribution maximum likelihood estimation techniques.
can be defined. Because parameters have The Bayesian decision-making paradigm
distributions then the likelihood previously improves on the classical approaches to
defined must be extended to accommodate statistical analysis in its more philosophically
these distributions. sound foundation, its unified approach to
By modeling both the observed data data analysis, and its ability to formally
and any unknown parameter or other incorporate prior opinion or external

empirical evidence into the results via the and Casella, 2005). Univariate MCMC
prior distribution. Statisticians, formerly algorithms are particularly attractive for
reluctant to adopt the Bayesian approach general purpose implementation, since all
due to general skepticism concerning that is required is the ability to sample
its philosophy and a lack of necessary easily from each parameters complete con-
computational tools, are now turning to ditional distribution, namely p(i | y, j=i ),
it with increasing regularity as classical i = 1, . . . , k. The recently developed
methods emerge as both theoretically and WinBUGS language (www.mrc-bsu.
practically inadequate. Modeling the i s as cam.ac.uk/bugs/welcome.shtml)
random (instead of fixed) effects allows us and the R statistical platform (www.
to induce specific (e.g., spatial, temporal or r-project.org) with its Bayesian
more general) correlation structures among packages are promising steps towards
them, hence among the observed data yi as a general purpose software package for
well. Hierarchical Bayesian methods now hierarchical modeling, though it may be
enjoy broad application in the analysis of insufficiently general in some advanced
complex systems, where it is natural to pool analysis settings, and in any case more work
information across different sources e.g., is needed before it is suitable for routine use
Gelman et al. (2004). by statistical support staff.
Modern Bayesian methods seek complete Statistical prediction in Bayesian settings
evaluation of the posterior distribution using is particularly elegant and intuitive. Let
simulation methods that draw samples from ypred denote the random variables (they
the posterior distribution. This sampling- can be a collection) we seek to predict.
based paradigm enables exact inference Then, we simply treat ypred as a random
free of unverifiable asymptotic assumptions variable whose prior, conditional upon the
on sample sizes and other regularity parameters, is the data likelihood f (y | ).
conditions. A computational challenge in Then, all predictions will be summarized in
applying Bayesian methods is that for many the posterior predictive distribution:
complex systems, the simulations required

to do inference under equation (17.3)
p(ypred | y) = f (ypred | )p( | y) d .
generally involve distributions that are
intractable in closed form, and thus one
needs more sophisticated algorithms to
sample from the posterior. Forms for Once the posterior samples are available
the prior distributions (called conjugate from p( | y), it is routine to draw samples
forms) may often be found which enable from p(ypred | y) using the principle of
at least partial analytic evaluation of these composition: for each posterior draw of , we
distributions, but in the presence of nuisance draw ypred from f (ypred | ). Details of such
parameters (typically unknown variances), methods are particularly well explained in the
some intractable distributions remain. Here texts by Carlin and Louis (2000) and Gelman
the emergence of inexpensive, high-speed et al. (2004).
computing equipment and software comes
to the rescue, enabling the application of
17.4.1. Posterior sampling
recently developed MCMC integration
methods
methods, such as the MetropolisHastings
algorithm (Hastings, 1970) and the Gibbs Practical Bayesian modeling relies upon
sampler (Geman and Geman, 1984; Robert efficient computation of the posterior

distribution of the parameters. As mentioned dependent on the data. A compact notation


above, the main computational challenge for this model is:
lies in evaluating the integral in the
denominator of equation (17.3). This is
especially compounded when is multi- yi | Pois(ei )
dimensional. Hence, instead of designing
multi-dimensional integration routines, G(, ).
even the best of which can easily prove
inadequate for several practical settings, Here, the posterior distribution is again a
we focus upon sampling from the posterior Gamma and one can sample from it by
distribution, also known as simulating the simply employing a Gamma random number
posterior distribution. Once a posterior generator.
sample is obtained, all inference summaries Another useful mechanism for posterior
(e.g., point estimates and credible intervals) simulations when the posterior distribution
are calculated using the sample. In principle, is not a standard family arises from the
this strategy works equally well for simpler principle of composition. This essentially
models where the posterior distribution observes that the joint posterior distribu-
is a standard family as well as for very tion of two arbitrary parameter vectors,
complex hierarchical models where the say 1 and 2 can be expressed as
posterior distribution is highly complex. P( 1 , 2 | y) = P( 1 | y)P( 2 | 1 , y). To
Depending upon the complexity of the obtain samples from the above joint posterior
posterior distribution, the sampling strategies ( j)
distribution, we first sample 1 from the
will vary: with a standard family we can marginal posterior distribution P( 1 | y) and
directly draw a random sample, while ( j)
then sample a 2 from the conditional
with complex families more elaborate ( j)
posterior distribution P( 2 | 1 , y)). Repeat-
MCMC algorithms (see below) may be
ing this for j = 1, . . . , M results in a joint
required. j ( j)
posterior sample ( 1 , 2 )Mj=1 of size M. We
Since the posterior distribution now
describes the behavior of the parameters illustrate this principle below using the linear
once the data are observed, we work with regression model mentioned in equation
this distribution for estimation and inference. (17.1) from a Bayesian perspective. Several
To obtain estimates of parameters this other examples can be found in the texts
distribution must be summarized. by Carlin and Louis (2000) and Gelman
A simple example of this type of model in et al. (2004).
disease mapping is where the data likelihood Let us suppose that we have data yi
is Poisson and there is a common relative from n experimental units, which forms our
risk parameter with a single gamma prior dependent variable. Suppose also that we
distribution: have observed p covariates, x1i , . . . , xpi , on
the ith individual. Using matrix notations,
we write:
p( | y) L(y | )g( )
y = X + e; e N(0, 2 I)

where g( ) is a gamma distribution with


parameters , , i.e., G(, ), and L(y | ) = where y is an n 1 vector of observations,
*m
i=1 {(ei ) exp(ei )} bar a constant only
yi
X is a n p matrix of independent

predictors with full column rank (we assume Following the principle of composition
independent columns so that covariates are sampling, we draw, say for j = 1, . . . , M,
not collinear), is a p1 vector of regression 2( j) IG(n p/2, (n p)s2 ) followed
coefficients, and e is the n 1 vector of by ( j) N(, 2j (X T X)1 ). This yields
uncorrelated normally distributed errors with our desired posterior sample ( ( j) , 2( j) )
common variance 2 . with j = 1, 2, . . . , M. Posterior confidence
To construct a Bayesian framework, we intervals and all inference will again be
will need to assign a prior distribution for carried out using these samples.
(, 2 ) in the above model. For illustration,
consider the non-informative or reference
prior distribution for (, 2 ):
17.5. HIERARCHICAL MODELS

1 The idea that the values of parameters could


P(, 2 ) .
2 arise from distributions is a fundamental
feature of Bayesian methodology and leads
naturally to the use of models where
This is equivalent to a flat or Uniform prior
parameters arise within hierarchies. In the
on (, 2 ). In hierarchical language we write
Poisson-gamma example there is a two level
the Bayesian linear regression model as:
hierarchy: has a G(, ) distribution at the
first level of the hierarchy and will have a
hyperprior distribution (h ) as will (h ), at
y | , 2 N(0, 2 I)
the second level of the hierarchy. This can be
1 written as:
, 2 P(, 2 ) .
2
yi | Pois(ei )
Simple computations (see, e.g., Gelman
et al., 2004, Section 14.2) reveal that the | , G(, )
marginal distribution p( 2 | y) is a scaled
Inv- 2 (n p, s2 ) distribution, which is the | h ()
same as the Inverse-Gamma distribution
IG((n p)/2, (n p)s2 /2) where: | h ().

1 Clearly it is important to terminate a


s2 = (y X )T (y X )
np hierarchy at an appropriate place, otherwise
one could always assume an infinite hierar-
chy of parameters. Usually the cut-off point
with = (X T X)1 X T y being the usual is chosen to lie where further variation in
least-squares estimate (also the MLE). The parameters will not affect the lowest level
distribution P( | 2 , y) is N(, 2 (X T X)1 ). model. At this point the parameters are
In fact, here the marginal posterior assumed to be fixed. For example, in the
distribution for P( | y) can be derived in gamma-Poisson model if you assume and
closed form as a multivariate-t distribution were fixed then the Gamma prior would
(see, e.g., Robert, 2001) but we outline the be fixed and the choice of and would be
sampling-based perspective. uninformed. The data would not inform about
BAYESIAN SPATIAL ANALYSIS 327

the distribution at all. However, by allowing The basic algorithms used for this
a higher level of variation i.e., hyperpriors for construction are:
, , then we can fix the values of and
without heavily influencing the lower level
1 the Metropolis and its extension, the Metropolis
variation. This allows the data to inform more Hastings algorithm;
about the different parameters in the lower
levels of the hierarchy. 2 the Gibbs Sampler algorithm.

17.6. MARKOV CHAIN MONTE 17.6.1. Metropolis and


CARLO METHODS MetropolisHastings
algorithms
Markov chain Monte Carlo (MCMC) meth- In all MCMC algorithms, it is important to
ods are a set of methods which use iterative be able to construct the correct transition
simulation of parameter values within a probabilities for a chain which has P( | y) as
Markov chain. The convergence of this chain its equilibrium distribution. A Markov chain
to a stationary distribution, which is assumed consisting of 1 , 2 , . . . , t with state space
to be the posterior distribution, must be  and equilibrium distribution P( | y) has
assessed. transitions defined as follows.
Prior distributions for the p components Define q( ,  ) as a transition probability
of are defined as gi (i ) for i = 1, . . . , p. function, such that, if t = , the vector t
The posterior distribution of and y is drawn from q(,  ) is regarded as a proposed
defined as: possible value for t+1 .

(
P( | y) L(y | ) gi (i ). (17.4)
17.6.2. Metropolis and
i
MetropolisHastings
updates
The aim is to generate a sample from In this case choose a symmetric pro-
the posterior distribution P( | y). Suppose posal q(,  ) and define the transition
we can construct a Markov chain with probability as:
state space c , where c k . The
chain is constructed so that the equilibrium
 
distribution is P( | y), and the chain should ( , )q(, ) if   =

be easy to simulate from. If the chain is run p( , ) =
1 q(,  )( ,  ) if  =
over a long period, then it should be possible 
to reconstruct features of P( | y) from the
realized chain values. This forms the basis

of the MCMC method, and algorithms are where (,  ) = min 1, P(  | y)/P( | y) .


required for the construction of such chains. In this algorithm a proposal is generated
A selection of recent literature on this area from q( ,  ) and is accepted with probability
is found in Ripley (1987), Besag and Green ( ,  ). The acceptance probability is a
(1993), Gelman et al. (2004), Gamerman simple function of the ratio of posterior
(2000) and Robert and Casella (2005). distributions as a function of values.
328 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

The proposal function q( ,  ) can be defined 17.6.3. Gibbs updates


to have a variety of forms but must be an
The Gibbs Sampler has gained consider-
irreducible and aperiodic transition function.
able popularity, particularly in applications
MetropolisHastings (MH) is an exten-
in medicine, where hierarchical Bayesian
sion to the Metropolis algorithm where
models are commonly applied (see, e.g.,
the proposal function is not confined to
Gilks et al. (1993)). This popularity is
symmetry and:
mirrored in the availability of software that
allows its application in a variety of problems
(e.g., WinBUGS, MLWin, BACC). This
 +
 P(  | y) q(  , ) sampler is a special case of the Metropolis
(, ) = min 1, .
P( | y) q(,  ) Hastings algorithm where the proposal is
generated from the conditional distribution
of i given all other s, and the resulting
proposal value is accepted with probability 1.
Some special cases of chains are found
More formally, define:
when q(,  ) has special forms. For
example, if q( ,  ) = q(  , ) then the ,
original Metropolis method arises and p(j | j
t1
) = t1
if j
q(j , j ) = j
further, with q(,  ) = q(  ) (i.e., when no 0 otherwise
dependence on the previous value is
assumed) then:
where p(j | j
t1
) is the conditional distribu-
tion of j given all other values (j ) at time
 + t1. Using this definition it is straightforward
 w(  )
( , ) = min 1, to show that:
w()
q( ,  ) P(  | y)
 =
q( , ) P( | y)
where w( ) = P( | y)/q( ) and w(.) are
importance weights. One simple example of
and hence (,  ) = 1.
the method is q(  ) Uniform ( a , b ) and
gi (i ) Uniform ( ia , ib ) i; this leads to
an acceptance criterion based on a likeli-
hood ratio. Hence the original Metropolis
17.6.4. MH versus Gibbs
algorithm with uniform proposals and prior
algorithms
distributions leads to a stochastic exploration There are advantages and disadvantages
of a likelihood surface. This, in effect, leads to MH and Gibbs methods. The Gibbs
to the use of prior distributions as proposals. Sampler provides a single new value for
However, in general, when the gi (i ) are not each at each iteration, but requires the
uniform this leads to inefficient sampling. evaluation of a conditional distribution. On
The definition of q( ,  ) can be quite the other hand the MH step does not require
general in this algorithm and, in addition, the evaluation of a conditional distribution
posterior distribution only appears within a but does not guarantee the acceptance of
ratio as a function of and  . Hence, the a new value. In addition, block updates
distribution is only required to be known up of parameters are available in MH, but
to proportionality. not usually in Gibbs steps (unless joint
BAYESIAN SPATIAL ANALYSIS 329

conditional distributions are available). If simulations have reached the equilibrium


conditional distributions are difficult to distribution of the Markov chain. There are
obtain or computationally expensive, then a wide variety of methods now available
MH can be used and is usually available. to assess convergence of chains within
In summary, the Gibbs Sampler may MCMC. algorithms (ARS algorithm; Robert
provide faster convergence of the chain if the and Casella, 2005, pp. 5759) provide
computation of the conditional distributions recent reviews. The available methods are
at each iteration are not time consuming. largely based on checking the distributional
The MH step will usually be faster at each properties of samples from the chains.
iteration, but will not necessarily guarantee
exploration. In straightforward hierarchical
models where conditional distributions are
easily obtained and simulated from, then 17.7. MODEL GOF MEASURES
the Gibbs Sampler is likely to be favored.
In more complex problems, such as many It is inevitable that our statistical analysis
arising in spatial statistics, resort may be will entail the fitting and comparison of a
required to the MH algorithm. variety of models. For this purpose, we will
need to attend to issues concerning model
adequacy and model comparison. To compare
between the different models and perhaps
17.6.5. Special methods
help us choose those that provide better
Alternative methods exist for posterior sam- fits, we will use the Deviance Information
pling when the basic Gibbs or MH updates Criteria (DIC) (Spiegelhalter et al., 2002) as
are not feasible or appropriate. For example, a measure of model choice. The DIC has nice
if the range of the parameters is restricted theoretical properties for a very wide class of
then slice sampling can be used (Robert likelihoods since it provides an estimate of
and Casella, 2005, Ch. 7; Neal, 2003). goodness-of-fit and for model complexity and
When exact conditional distributions are not is particularly convenient to compute from
available but the posterior is log-concave posterior samples. This criterion is the sum of
then adaptive rejection sampling algorithms the Bayesian deviance (a measure of model
can be used. The most general of these algo- fit) and the (effective) number of parameters
rithms (ARS algorithm; Robert and Casella, (a penalty for model complexity). It rewards
2005, pp. 5759) has wide applicability for better fitting models through the first term
continuous distributions, although they may and penalizes more complex models through
not be efficient for specific cases. Block the second term, with lower values indicating
updating can also be used to effect in some favorable models for the data. The deviance,
situations. When generalized linear model up to an additive quantity not depending
components are included then block updating upon the parameters , is simply minus twice
of the covariate parameters can be effected the log-likelihood, D( ) = 2 log f (y | ),
via multivariate updating. where f (y | ) is the first stage likelihood for
the respective model. The Bayesian deviance
is the posterior mean, D( ) = E | y [D( )],
while the effective number of parameters is
17.6.6. Convergence
given by pD = D( ) D( ). The DIC is then
MCMC methods require the use of given by D() + pD and is easily computed
diagnostics to assess whether the iterative from the posterior samples.
330 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

We also often use predictive fits to 17.8. UNIVARIATE SPATIAL


assess model performance using the posterior PROCESS MODELS
predictive distributions. We will employ the
posterior predictive loss approach (Gelfand
17.8.1. Ingredients of a Gaussian
and Ghosh, 1998) to identify models pro-
process
viding the best fit. The actual computations
are very similar to the predictive paradigm As briefly mentioned in the Introduction,
discussed towards the end of Section 17.2. modeling of point-referenced spatial data
Here, for any given model, if is the typically proceeds from a spatial random field
set of parameters, the posterior predictive {w(s) : s D}, where D is typically an open
distribution of a replicated data set is subset of d where d is the dimension; in
given by: most practical settings d = 2 or d = 3.
We say that a random field is a valid spatial
 process if for an any finite collection of
P(yrep | y) = P(yrep | ) P( | y) d sites S = {s1 , . . . , sn } of arbitrary size, the
vector w = (w(s1 ), . . . , w(sn )) follows a
well-defined joint probability distribution.
where P(yrep | ) has the same distribution For the practical spatial modeller, the most
as the data likelihood. Replicated data common specification is a Gaussian Random
sets from the above distribution are Field (GRF) or a Gaussian Process (GP),
easily obtained by simulating a replicated which additionally specifies that w follows
data set from the above distribution. a multivariate normal distribution.
Preferred models will perform well To be more specific, we write
under a decision-theoretic balanced loss w(s) GP((s), C()) which is a Gaussian
function that penalizes both departure from Process with a mean function (s), i.e.,
corresponding observed values (lack of E[w(s)] = (s), and a covariance function
fit), as well as from what we expect the Cov(w(s), w(s )) = C(s, s ). This specifies
replicates to be (variation in replicates). the joint distribution for a collection of
Measures for these two criteria are sites s1 , . . . , sn as w N(, ), where
evaluated as G = (y rep )T (y rep ) and = ((si ))ni=1 is the corresponding n 1
P = tr (Var (yrep ) | y), where rep = E[yrep | y] mean vector and w = [C(si , sj )] is the
is the posterior predictive mean for the n n covariance matrix with (i, j)th element
replicated data points, and P is the trace of given by C(si , sj ).
the posterior predictive dispersion matrix for Clearly the covariance function cannot
the replicated data; both of these are easily be just any function: it needs to ensure that
computed from the samples drawn. Gelfand the resulting w matrix is symmetric and
and Ghosh (1998) suggest using the score positive definite. Symmetry is guaranteed
D = G + P as a model selection criterion, as long as C(s, s ) is symmetric in its
with lower values of D indicating better arguments, while functions that ensure
models. the positive-definiteness are known as
Using these formal statistical methods, we positive definite functions. The important
will be able to enhance the accuracy of characterization of such functions, at least
the outputs of computer models, compare from a modelers perspective, says that a
between them to validate an underlying real-valued function is a valid covariance
scientific hypothesis and provide predictions function if and only if it is the characteristic
of complex systems. function of a symmetric random variable
BAYESIAN SPATIAL ANALYSIS 331

(this is derived from a famous theorem due and identically distributed as N(0, 2 ), where
to Bochner). Further technical details about 2 is a measurement error variance or micro-
positive definite functions can be found in scale variance. The key to incorporating
Cressie (1993), Chils and Delfiner (1999) spatial association is by modeling w(s) as
and Banerjee et al. (2004). a Gaussian Process with spatial variance
Since it is common for spatial data to 2 and a valid correlation function (, )
consist of single observations from a site, with representing parameters that quantify
we often need to assume stationary or correlation decay and smoothness of the
isotropic processes for ensuring estimable resulting spatial surface.
models. Stationarity, in spatial modeling When we have observations, y =
contexts, refers to the setting when (Y (s1 ), . . . , Y (sn )), from n locations, we
C(s, s ) = C(s s ); that is, the covariance treat the data as a partial realization of
function depends upon the separation of a spatial process, modeled through w(s).
the sites. Isotropy goes further and specifies Hence, w(s) GP(0, 2 (, )), is a
C(s, s ) = C(s s ), where s s  is zero-centered Gaussian Process with
the distance between the sites. Furthermore, variance 2 and a valid correlation function
we will parametrize the covariance function (d, ), which depends upon inter-site
as C(s s ) = 2 (s s ), where (s s ) distances (dij = si sj ) and a parameter
is called a correlation function and 2 is quantifying correlation decay. Also, we
a spatial variance parameter. In particular, assume (s) are i.i.d. N(0, 2 ). Inferential
we will use the the isotropic exponential goals include estimation of regression
correlation function (d, ) = exp (d), coefficients, spatial and nugget variances,
with d = s s . and the strength of spatial association thro-
ugh distances. Likelihood-based inference
proceeds from the distribution of the data,
17.8.2. Bayesian spatial regression y N(X, ), with  = 2 R() + 2 I,
and kriging where X is the covariance matrix and R()
is the correlation matrix with Rij = (dij , ).
There is an expanding literature on modeling See Cressie (1993) for details, including
point-referenced spatial data. The most com- maximum-likelihood and restricted maximum-
mon setting assumes a response or dependent likelihood methods, and Banerjee et al.
variable Y (s) observed at a generic location s, (2004) for Bayesian estimation.
referenced by a coordinate system (e.g., Statistical prediction (kriging) at a new
UTM or latlong), along with a vector of location s0 proceeds from the conditional
covariates x(s). One seeks to model the distribution of Y (s0 ) given the data y.
dependent variable in a spatial regression Collecting all the model parameters into
setting such as: = (, 2 , 2 , , ), we note that

Y (s) = xT (s) + w(s) + (s). (17.5)


E[Y (s0 ) | y] = x(s0 )T + T  1 (y X)
The residual is partitioned into a spatial
process, w(s), capturing residual spatial (17.6)
association, and an independent process,
(s), also known as the nugget effect, Var [Y (s0 ) | y] = 2 + 2 T  1
modeling pure errors that are independently (17.7)
332 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

where = ( 2 (; d01 ), . . . , 2 (; d0n )) When we want to capture spatial and


and d0j = s0 sj . Classical prediction temporal associations, modeling is accom-
computes the BLUP (Best Linear plished by envisioning a spatial process
Unbiased Predictor) by substituting evolving through time. The literature in
maximum-likelihood estimates for the spatiotemporal models is quite rich (see, e.g.,
above parameters. A Bayesian solution Cressie, 1993; Banerjee et al., 2004, and
first computes a posterior distribution the references therein). Essentially, modeling
P( | y), where = (, 2 , 2 , ) is the proceeds from a spatiotemporal process
collection of all model parameters and w(s, t) in the above context, where s denotes
then computes the posterior predictive the location, and t denotes time. Of course,
distribution P(Y (s0 ) | y) by marginalizing appropriate assumptions on the covariance
over (averaging
) over) the posterior function associated with w(s, t) have to be
distribution, P(Y (s0 ) | y, ) P( | y). made. A popular covariance specification for
A Bayesian framework is convenient here, spatiotemporal models is separability, which
driving inference assisted by proper and models spatiotemporal correlation functions
moderately informative priors on the weakly as a product of a purely spatial and a
identified correlation function parameters. purely temporal covariance function. These
For example, for the smoothness parameter and other more general specifications may be
in the Matrn covariance, , we can follow found in Banerjee and Johnson (2006).
Stein (1999) in assuming that the data
cannot distinguish = 2 and > 2, which
suggests placing a Unif (0, 2) prior on .
17.8.3. Illustration
Usually a MCMC algorithm is required to
obtain the joint posterior distribution of the Interest lies in predicting the relative den-
parameters, but again there are different sity of eastern hemlock across the Bartlett
strategies to opt for. For example, we may Experimental Forest. Basal area per hectare1
work with the marginalized likelihood as of all tree species was estimated at each of
above, y | N(X, 2 H() + 2 I), or we 438 forest inventory plots distributed across
may add a hierarchy with spatial random the domain of interest. The response variable
effects, w = (w(s1 ), . . . , w(sn )): is the fraction of estimated eastern hemlock
basal area per hectare. Covariates include
elevation and six spring and fall Tasseled Cap
y | , w N(X + w, 2 I) spectral components that were derived from
Landsat satellite images (Kauth and Thomas,
w N(0, 2 R()).
1976).
A spatial regression model (as in
equation (17.5)) was fitted to the data.
In either framework, a Gibbs sampler may We employed flat priors for the regression
be designed, with embedded Metropolis or estimates and, based on estimates
slice-sampling steps, to obtain the marginal from initial descriptive analyses including
posterior distribution (see, e.g., Banerjee variograms (see, e.g., Banerjee et al., 2004),
et al., 2004). Much more complex hierarchi- we used inverted-gamma IG(2, 0.01) for both
cal models have been discussed extensively the spatial variance 2 and the measurement
in the spatial literature but, irrespective of error variance 2 . The maximum distance
their complexity, they mostly fit into the between inventory plots is 4834.81 meters,
template we outlined above. so a uniform prior on was set so that the
BAYESIAN SPATIAL ANALYSIS 333

effective range was less than 3000 meters. the exponential correlation function this
Using these priors an MCMC algorithm is approximately 3/. Finally Figure 17.1
was devised to obtain posterior samples. displays an image plot of the estimated
Gibbs updates were used for the regression response surface overlaid with contours
parameters while Metropolis updates were of the estimated spatial random effects
employed for spatial variance components (the w(s)s). The random effects serve to offset
( 2 , 2 ) and the spatial range parameter . the spatially varying density of the response
The CODA package in R (www.r- surface.
project.org) was used to diagnose
convergence by monitoring mixing, Gelman
Rubin diagnostics, autocorrelations, and 17.9. BAYESIAN MODELS FOR
cross-correlations. Analysis was based on DISEASE MAPPING
three chains of 11,000 samples each. The
first 1,000 samples were discarded from In previous sections we have alluded to a
each chain as a part of burn-in. Subsequent simple Poisson model for disease counts. In
parameter estimation and analysis used the fact, this is the basic model often assumed
remaining 30,000 (10,000 3) samples. for small area counts of disease (in tracts, zip
Table 17.1 presents the 95% central codes, counties, etc.). We consider two data
credible intervals for the parameter estimates resolutions here. First we consider case event
based upon the posterior samples. All six data where, within a suitable study region
covariates are significant and perhaps explain (W ), realization of cases arises. The locations
some of the spatial variation in the data, of cases are usually residential addresses.
as is indicated by the spatial variance 2 These form a spatial point process. Often
being smaller than the measurement error data is not available at this level of spatial
variance 2 . The spatial range is calcu- resolution and aggregation to larger spatial
lated as the distance beyond which the units occurs. Aggregated counts of disease
correlation function drops below 0.05; for are often more readily available (e.g., from

Table 17.1 Parameter estimates for the model covariates


elevation and spring and fall Tasseled Cap spectral components.
Lower table provides parameter estimates for error terms 2
and 2 , spatial range , and associated effective range
Parameter Estimate: 50% (2.5%, 97.5%)
Intercept 0.262 (0.954, 0.387)
ELEV 0.002 (0.002, 0.001)
SPR-TC1 0.007 (0.001, 0.013)
SPR-TC2 0.007 (0.011, 0.003)
SPR-TC3 0.011 (0.006, 0.015)
FALL-TC1 0.007 (0.011, 0.003)
FALL-TC2 0.008 (0.004, 0.011)
FALL-TC3 0.004 (0.008, 0.001)
2 0.009 (0.005, 0.016)
2 0.014 (0.012, 0.018)
0.002546 (0.001325, 0.005099)
Effective range (meters) 1178.448 (588.301, 2264.629)
334 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

2597000
2595000 2596000
Lat. UTM
2594000

1949000 1950000 1951000 1952000 1953000


Lon. UTM

Figure 17.1 Contour lines of estimated spatial random effects overlayed on an image plot
of estimated relative density of eastern hemlock. Note, the random effects serve to offset
the spatially varying density of eastern hemlock.

official government sources). Hence, the of locations. Often the natural likelihood
second common data type is disease count model for such data is a heterogeneous
data within small areas. These small areas Poisson Process (PP). In this model, the
are arbitrary with respect to the disease distribution of the cases (points) is governed
process (such as census tracts, counties, by a first-order intensity function. This
postcodes) and form a sub-division of the function, l(s) say, describes the variation
study region. In what follows we will briefly across space of the intensity (density) of
consider case event data, but will concentrate cases. This function is the basis for modeling
discussion on the more commonly available the spatial distribution of cases. we denote
count data type. this model as:

17.9.1. Case event data s PP(l(s)).

Assume we observe within a study region


(W ), a set of m cases, with residen-
tial addresses given as {si }, i = 1, . . . , m. The likelihood associated with this model is
Figure 17.2 displays an example of such data: given by:
larynx cancer incident case addresses for a
fixed time period (see Lawson, 2006, Ch 1
for discussion). Here the random variable is (
m 
the spatial location, and so we must employ L= l(si ) exp { l(u) du}
i=1
models that can describe the distribution W
BAYESIAN SPATIAL ANALYSIS 335

+ +
++ +
+ + +
+
+ + +
+ + + ++
+
++ +
42500 + ++

+
+
+
+ +
+
+
+ + +
+
y
42000

+
+
+ + + +
+ +
+
+ +
+ + +
41500

+
++
+ +
+
+

34800 35000 35200 35400 35600 35800 36000


x

Figure 17.2 Larynx cancer incident case address locations in NW England (19741983).

where l(si ) is the first-order intensity is a vector of parameters. In modeling we


evaluated at the sample locations {si }. usually specify a parametric form for l1 (s | )
This likelihood involves an integral of l(u) and treat l0 (s) as a nuisance effect that
over the study region. must be included. Usually some external data
In disease mapping studies, usually is used to estimate l0 (s) nonparametrically
the variation in disease relates closely to (leading to profile likelihood). This data
the underlying population that is at risk for relates to the local population density.
the disease in question. This is known as the Alternatively, if the spatial distribution of
at risk background. Hence any definition of a control disease is available (see Lawson
the intensity of cases must make allowance and Cressie (2000) for more details), then
for this effect. Any areas where there are lots the problem can be reformulated as a binary
of at risk people are more likely to yield cases logistic regression where l0 (s) drops out of
and so we must adjust for this effect. Often the likelihood. Denote the control disease
the intensity is specified with a multiplicative locations as {sj }, j = m + 1, . . . , m + n, and
link between these components: with N = n + m, a binary indicator function
can be defined:

l(s) = l0 (s)l1 (s | ).

1 if i 1, . . . , m
yi =
Here the at risk background is represented 0 otherwise
by l0 (s) while the modeled excess risk of
the disease is defined to be l1 (s | ), where i, i = 1, . . . , N
336 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

and the resulting likelihood is just given by: (s) is that it is a random field defined to
be a spatial Gaussian process.
In the intensity (17.8), all the variables
(
N
[l1 (si )]yi can be estimated using maximum likelihood.
L(s | ) = .
1 + l1 (si ) However when a Bayesian approach is
i=1
assumed then all parameters have prior
probability distributions and so we would
By conditioning of the joint set of cases and need to consider sampling the posterior
controls the population effect is removed and distribution given by:
does not require estimation.

P1 (, , | s) L(s | , , ) P0 (, , )
17.9.2. Parametric forms
Often we can define a suitable model for where P0 (, , ) is the joint prior distribu-
excess risk within l1 (s). In the case where tion of the parameters. Assuming indepen-
we want to relate the excess risk to a known dent prior distributions for each parameter
location (e.g., a putative source of pollution) component, i.e., P0 (, , ) = g1 (1 )
then a distance-based definition might be g2 (2 ) g3 (3 ) . . . g () g ( ), this model
considered. For example: can be sampled via standard MCMC algo-
rithms. In intensity (17.9), the spatial com-
ponent (s) would have a spatially correlated
l1 (s) = exp{F(s) + ds } (17.8)
prior distribution and so a Bayesian approach
would be natural.
where is an overall rate parameter, ds is a
distance measured from s to a fixed location
(source) and is a regression parameter, F(s) 17.9.3. Count data
is a design vector with columns representing
spatially-varying covariates, and is a Often only count data is available within a
parameter vector. The variables in F(s) could set of small areas. Denote yi as the count
be site-specific or could be measures on the of disease within the ith small area where
individual (age, gender, etc.). In addition this i = 1, . . ., p. As in the case of case event data
definition could be extended to include other we need to allow for the at risk population
effects. For example we could have: in our models. This can usually be easily
achieved for count data since expected rates
or counts can be obtained or calculated
l1 (s) = exp{F(s) + (s) + ds } for small areas. For example, age sex
standardized rates for census tracts, postal
(17.9) zones, or zip codes are often available from
government sources. Denote these rates as
ei , i = 1, . . ., p. Also, in our model we
where (s) is a spatial process, and is a want to model the relative risk of disease
parameter. This process can be regarded as via the parameter i , i = 1, . . ., p. The
a random component and can include within relative risk will be the focus of modeling
its specification spatial correlation between and it is usually assumed that the {ei }
sites. One common assumption concerning are fixed.
BAYESIAN SPATIAL ANALYSIS 337

The simplest model for such data is a to a mean angle parameter (0 ), while the
Poisson log linear model where: distance component is assumed to be log-
linearly related to risk. The final term i is
meant to repesent unattributed extra variation
yi Poiss(ei i ).
in risk. This could include random effect
terms, such as:
In addtion the relative risk i is usually
modeled with a log link for positivity.
i = ui + i
A simple example could be:

where each term could represent different


log i = 0 ,
aspects of the extra variation. For example,
ui is often defined to have a correlated
a constant. This model represents constant prior distribution (and is called correlated or
area-wide risk and often the null hypothesis structured heterogeneity (CH)), whereas i
aasumed by many researchers is that 0 = 0, is often assumed to represent uncorrelated
so that i = 1. This represents the situation heterogeneity (UH). The prior distributions
where the underlying rate or count gener- assumed for these terms are commonly:
ates the risk directly (i.e., yi Poiss(ei )).
This would be applicable if there were no
i N(0, )
excess risk in the study area. Of course this
is seldom reality and it is the alternative
1
hypotheses where i have some spatilal (ui | ) exp wij (ui uj )2
structure that is of interest in modeling.
j i
Some examples of models currently
adopted for different applications can be
where wij = 1/2 i, j. The neighborhood
instructive:
i is assumed to be the areas with common
boundary with the ith area. The second of
Putative health hazard assessment these prior distributions assumes dependence
Usually in these applications some measure between neighboring areas. This distribution
of the association between small area counts is termed a conditional autoregressive (CAR)
and a fixed location or locations is to be prior distribution. It is an example of a
made. This association could be via distance Markov random field. Note that in this
or directional measures. For example, define definition the parameter controls the
the distance from the ith small area centroid spatial smoothness (or correlation) of the
to the source as di and the angle as i . A log component.
linear model for risk related to a source might The posterior distribution can be specified
be of the form: as follows:

log i = 0 + 1 di + 2 cos(i 0 ) P(u, v, , , | y) L(y | )

+ 3 sin(i 0 ) + i . f1 (u)f2 (v)f3 ()f ()f ( )

Here, the directional component is summa- where f1 (u) is the CAR prior distribution,
rized by the cosine and sine terms in relation f2 (v) is a zero mean normal distribution,
338 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

f3 () is the joint prior distribution for aggregate level. Often the main issue relates
the regression parameters, f () and f ( ) to making individual level inference from
are prior distributions for the remaining aggregate data. Aggregation or averaging
parameters. Note that and are hyper- induces biases in estimation of parameters
parameters and they have prior distributions for models (see, e.g., Wakefield, 2004). The
as could any hyperparameters within the modifiable areal unit problem (MAUP) is an
other prior distributions (f1 , f2 , f3 ). The prior example of an aggregation-related inference
distributions for regression parameters are problem. Another problem that can arise
often assumed to be independent and each is the misaligned data problem (MIDP).
parameter is often assumed to have a zero This arises when the spatial resolution of
mean normal prior distribution. covariates is different from the outcome
variable. The classic example of this would
be modeling cancer outcomes at zip code
Disease map reconstruction level and relating these to groundwater
Often the main aim of modeling disease inci- uranium measured at point locations (wells).
dence is simply to provide a good estimate A fuller discussion of these issues can be
of disease risk. This can be specified as the found in Banerjee et al. (2004). In general the
relative risk within each region (i ). Hence type of model assumed is often of the form:
the aim is to provide an accurate estimate of
the true underlying risk within the map. Much
recent work has been focussed on this area of log i = xiT + ziT
concern, and many models and approaches
have been developed (see, e.g., Banerjee
where xiT is a row vector of fixed covariate
et al., 2004, section 5.4; Lawson, 2006,
values for the ith small area and is a
Chapter 8.0, Lawson (2008)). Typically a log
corresponding parameter vector, and ziT is a
linear model with random effects is defined:
row vector of random effects and a unit
vector.
log i = 0 + i where i = ui + i .

Surveillance
Here the ui , i terms are CH and UH defined With recent concerns over bioterrorism
as above. This is often called the convolution (Fienberg and Shmueli, 2005; Sosin, 2003;
model and was originally proposed by Besag Lawson and Kleinman, 2005), the focus of
et al. (1991). This model has proved to disease surveillance has become important.
be very robust against mis-specification of Essentially this focus concerns the moni-
the risk, although it can also over-smooth toring of disease incidence with a view to
rates. Lawson et al. (2000), Best et al. detecting aberrations or unusual incidence
(2005) and Hossain and Lawson (2006) have events. This often requires the monitoring of
provided recent simulation-based evaluations large scale databases of health information.
of a range of methods in this area. In addition, the focus of the monitoring could
be a range of effects. There could be a need
to find clusters of disease on maps or change
Ecological analysis points in time series or some mixture of these
This area of focus arises when the risk within effects in spacetime. Detection of change
a small area is to be related to a covari- in multiple time and spatial series is the
ate or covariates usually measured at the focus. This is a challenging area that requires
BAYESIAN SPATIAL ANALYSIS 339

the use of fast computational algorithms The two effects have the following prior
and novel spatial-sequential inference. In distributions:
essence, a range of models found in equa-
tions (17.1)(17.3) above may need to be
examined simultaneously in this analysis. ui CAR(ui , /ni )

where i is the neighborhood of the ith area,


17.9.4. Example
ui is the mean of ui in the neighborhood,
Here we examine briefly an example of and ni is the number of neighbors, is the
relative risk estimation. The example consists variance, and
of the South Carolina incidence of congenital
anomalies deaths by county for 1990. This
has also been examined in Chapter 6 of i N(0, )
Lawson et al. (2003). Figure 17.3 diplays the
standardised mortality ratio for this disease
for 1990. We are concerned to estimate the where is the variance. Now 0 is assumed
true relative risk underlying these county to have a uniform prior distribution on a
rates. To achieve this we propose a log linear large range, while the and are variances
model for the risk in each area. Hence we and their inverses (precisions: 1/, 1/)
assume the likelihood: have gamma prior distributions with fixed
parameters (shape: 0.5, scale: 0.0005). There
is some debate currently about how infor-
yi Poiss(ei i )
mative such hyperprior distributions are
(see, e.g., Gelman, 2005). In fact it is
and then a log linear model of the form always recommended that sensitivity to prior
assumptions be examined in any application.
The Bayes estimate of the relative risk is the
log i = 0 + i where i = ui + i . posterior expected value of relative risk for

SMR
less than 0.5000
0.50010.7800
0.78011.0900
1.09011.5100
1.5101 and over

Figure 17.3 Congenital anomalies deaths, standardized mortality ratio, South


Carolina, 1990.
340 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

each region. This can be obtained from a 17.10. SOFTWARE FOR BAYESIAN
posterior sample by averaging the converged MODELING
sample output. The estimates of the relative
risk for the congential abnormalities data Posterior sampling is the commonest
are displayed in Figure 17.4. The posterior approach to Bayesian inference. There is
probability of i > 1 over the whole map is now a range of software that can peform
shown in Figure 17.5 Note that this quantity this task. The best known of these is the
can be used to assess whether ther are any free software WinBUGS (downloadable
areas of significant risk elevation on the from www.mrc-bsu.cam.ac.uk/bugs/). This
map. For more details of this example see package employs both Gibbs sampling and
Lawson et al. (2003: chapter 6). MetropolisHastings updating methods for a

RR
less than 0.3720
0.37210.8230
0.82311.4410
1.44112.2180
2.2181 and over

Figure 17.4 Posterior expected relative risk estimates for the congenital abnormalities data
for South Carolina, 1990.

PP
less than 0.0820
0.08210.2050
0.20510.4170
0.41710.6710
0.6711 and over

Figure 17.5 Posterior probability of exceedance (Pr (i > 1)) for the South Carolina
congenital abnormalities data.
BAYESIAN SPATIAL ANALYSIS 341

wide range of models. The package also has Curve Modelling with Applications to Weed Growth.
a wide range of online runnable examples Biometrics, 61, 617625.
and has a GIS tool called GeoBUGS that Berger, J.O. (1985). Bayesian Decision Theory.
allows mapping of small area data and New York: Springer Verlag.
parameter estimates, as well as spatial Besag, J. and Green, P.J. (1993). Spatial statistics
modeling of various kinds. Bayesian Kriging and Bayesian computation. Journal of the Royal
and both CAR and multivariate CAR models Statistical Society, Series B, 55: 2537.
can be fitted using this package. Facilities Besag, J., York, J. and Molli, A. (1991). Bayesian
also exist within R (e.g. packages such image restoration with two applications in spatial
as bayesm, geoR, geoRglm, MCMCpack, statistics. Annals of the Institute of Statistical
mCmC, spBayes etc.) and MATLAB Mathematics, 43: 159.
(spatial statistics toolbox) to perforn MCMC Best, N., Richardson, S. and Thomson, A. (2005).
computations for Bayesian spatial models. A comparison of Bayesian spatial models for disease
mapping. Statistical Methods in Medical Research,
14: 3559.
Carlin, B.P. and Louis, T. (2000). Bayes and Empirical
ACKNOWLEDGMENTS Bayes Methods for Data Analysis, 2nd edn. London:
Chapman and Hall/CRC Press.
Portions of this research were based upon Chen, M., Shao, Q. and Ibrahim, J. (2000). Monte
data generated in long-term research studies Carlo Methods in Bayesian Computation. New York:
on the Bartlett Experimental Forest, Bartlett, Springer Verlag.
NH, funded by the U.S. Department of Chils and Delner (1999). Geostatistics: Modelling
Agriculture, Forest Service, Northeastern Spatial Uncertainty, p. 43. New York: Wiley.
Research Station. The authors would espe-
Cressie, N.A.C. (1993). Statistics for Spatial Data,
cially like to thank Marie-Louise Smith in the revised edition. New York: Wiley.
USDA Forest Service Northeastern Research
Fienberg, S. and Shmueli, G. (2005). Statistical issues
Station for sharing a data set and Andrew
and challenges associated with rapid detection
Finley in the Department of Forest Resources of bio-terrorist attacks. Statistics in Medicine, 24:
at the University of Minnesota for help with 513529.
the statistical computations.
Gamerman, D. (2000). Markov Chain Monte Carlo:
Stochastic Simulation for Bayesian Inference.
New York: CRC Press.

NOTE Gelfand, A. and Ghosh, S. (1998). Model choice:


A minimum posterior predictive loss approach.
1 Basal area is the cross-sectional area of a tree at Biometrika, 85: 111.
1.37 meters from the ground. Basal area per hectare Gelman, A. (2005). Prior distributions for variance
is the sum of all the basal area per tree in the hectare.
parameters in hierarchical models. Bayesian Analysis,
1: 119.
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.
(2004). Bayesian Data Analysis. London: Chapman
REFERENCES and Hall/CRC Press.
Banerjee, S., Carlin, B.P. and Gelfand, A.E. (2004). Geman, S. and Geman, D. (1984). Stochastic relaxation,
Hierarchical Modeling and Analysis for Spatial Data. Gibbs distributions and the Bayesian restoration of
London: Chapman and Hall/CRC Press. images. IEEE Trans. PAMI, 6: 721741.
Banerjee, S. and Johnson, G.A. (2006). Coregionalized Gilks, W.R., Clayton, D.G., Spiegelhalter, D.J.,
Single- and Multi-resolution Spatially-varying Growth Best, N.G., McNeil, A.J., Sharples, L.D. and Kirby, A.J.
342 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

(1993). Modelling complexity: Applications of Gibbs Lee, P. (2005). Bayesian Statistics, 4th edn. London:
sampling in medicine. Journal of the Royal Statistical Arnold.
Society B, 55: 3952.
Mller, J. and Waagpetersen, R. (2004). Statistical
Hastings, W. (1970). Monte Carlo sampling methods Inference and Simulation for Spatial Point Processes.
using Markov chains and their applications. New York: CRC/Chapman and Hall.
Biometrika, 57: 97109. 44
Neal, R.M. (2003). Slice sampling. Annals of Statistics,
Hossain, M. and Lawson, A.B. (2006). Cluster detection 31: 134.
diagnostics for small area health data: with reference
Ripley, B.D. (1987). Stochastic Simulation. New York:
to evaluation of local likelihood models. Statistics in
Wiley.
Medicine, 25: 771786.
Robert, C. (2001). The Bayesian Choice: A Decision-
Kauth, R.J. and Thomas, G.S. (1976). The tasseled
theoretic Motivation. New York: Springer Verlag.
cap a graphic description of the spectral-temporal
development of agricultural crops as seen by landsat. Robert, C. and Casella, G. (2005). Monte Carlo
In: Proceedings of the Symposium on Machine Statistical Methods, 2nd edn. New York: Springer.
Processing of Remotely Sensed Data, pp. 4151.
Schabenberger, O. and Gotway, C. (2004). Statistical
West Lafayett: Purdue University.
Methods For Spatial Data Analysis. Boca Raton, FL:
Lawson, A.B. (2006). Statistical Methods in Spatial Chapman and Hall/CRC Press.
Epidemiology, 2nd edn. New York: Wiley.
Scheiner, S.M. and Gurevitch, J. (2001). Design and
Lawson, A. B. (2008) Bayesian Disease Mapping: Analysis of Ecological Experiments, 2nd edn. London:
Hierarchical Modeling in Spatial Epidemiology. Oxford University Press.
London: Chapman and Hall/CRC Press.
Sosin, D. (2003). Draft framework for evaluating
Lawson, A.B., Biggeri, A., Boehning, D., Lesaffre, E., syndromic surveillance systems. Journal of Urban
Viel, J.-F., Clark, A., Schlattmann, P. and Divino, F. Health, 80: i8i13. supplement.
(2000). Disease mapping models: an empirical
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der
evaluation. Statistics in Medicine, 19: 22172242.
Linde, A. (2002). Bayesian deviance, the effective
Special issue: Disease mapping with emphasis on
number of parameters and the comparison of
evaluation of methods.
arbitrarily complex models. Journal of the Royal
Lawson, A.B., Browne, W.J. and Vidal-Rodiero, C.L. Statistical Society, 64: 583640.
(2003). Disease Mapping with WinBUGS and
Stein, M. (1999). Statistical Interpolation of Spatial
MLwiN. New York: Wiley.
Data: Some Theory for Kriging, p. 46. New York:
Lawson, A.B. and Cressie, N. (2000). Spatial sta- Springer Verlag.
tistical methods for environmental epidemiology.
Wakeeld, J. (2004). A critique of statistical aspects
In: Rao, C.R. and Sen, P.K. (eds), Handbook
of ecological studies in spatial epidemiology.
of Statistics: Bio-Environmental and Public Health
Environmental and Ecological Statistics, 11: 3154.
Statistics, volume 18, pp. 357396. Amsterdam:
Elsevier. Waller, L. and Gotway, C. (2004). Applied Spatial
Statistics for Public Health Data. New York: Wiley.
Lawson, A.B. and Kleinman, K. (eds) (2005). Spatial
and Syndromic Surveillance for Public Health, p. 45. Webster, R. and Oliver, M. (2001). Geostatistics for
New York: Wiley. Environmental Scientists. New York: Wiley.
18
Monitoring Changes in
Spatial Patterns
Peter A. Rogerson

18.1. INTRODUCTION owing to its widespread use and popularity.


The use of K-functions to assess the nature of
The tools of spatial analysis have long point patterns over a range of spatial scales
been used to study the characteristics of (Ripley, 1976) and kernel density methods
geographic patterns. Central to this effort has to visualize the spatially-varying intensity of
been the application of statistical tools to test variables are now in common use (see Bailey
the null hypothesis of spatial randomness. and Gatrell (1995) and Waller and Gotway
Interest in the spatial distribution of species (2004) for reviews).
within the field of ecology gave rise to While the majority of early approaches
some of the earliest approaches, including relied upon a single, global statistic to
the nearest neighbor statistic for use with evaluate the null hypothesis of spatial ran-
point data (Skellam, 1952; Clark and Evans, domness, there has been more recent interest
1954) and the quadrat approach for use with in local statistics; these are the location-
counts of events lying within predefined specific components of global statistics that
subregions (see Blackman (1935) for an early allow one to test whether spatial association
application). exists in the vicinity of a particular location
As the field of spatial analysis has (see, e.g., Anselin, 1995; Getis and Ord,
developed, other statistical measures and 1992; Ord and Getis, 1995).
tests for geographic pattern have become Many of the more recent developments in
popular. Morans I (1950) is of special note, the statistical analysis of spatial patterns have
344 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

taken place within the field of epidemiology, new map of burglaries, or the epidemiologist
where there is interest in the detection who maps the locations of new cancer cases
of geographic clusters. Besag and Newell each year. A market researcher may wish to
(1991) suggest three categories for these assess the degree to which customers cluster
statistical approaches. In addition to the around a store, and it may be of particular
global and local statistics outlined above interest to monitor this each month, based
(referred to by Besag and Newell as general upon new sales data. If statistical tests are
and focused tests, respectively), they note simply carried out each time a new map
that there is a separate category for tests for is available, the multiplicity of tests will
the detection of clustering. While global tests increase the likelihood that a false declaration
lead to acceptance or rejection of a specified of significance is made. For instance, if
null hypothesis (perhaps one of spatial 20 tests are carried out using a Type I error
randomness, but more realistically, one where probability of 0.05, we can expect to find
the observed spatial distribution of cases is on average one false rejection of the null
compared with an expected distribution based hypothesis among the 20 tests.
upon population distribution and possibly In this chapter we describe and review
other covariates), they do not indicate the the use of statistical approaches designed for
size and/or location of geographic clusters. carrying out repeated tests concerned with the
Similarly, local tests are limited in the sense evaluation of spatial patterns. The common
that they evaluate only one location. A test objective of such repeated tests is the quick
for the detection of clustering may essentially detection of geographic change (where most
be viewed as a set of local tests (where one commonly the goal is to find new, emergent
or more specifications of potential cluster clusters as quickly as possible). It can be
size are made for many locations within the noted that this objective of prospective,
study area). Scan statistics (Kulldorff and quick detection of temporal change in spatial
Nagarwalla, 1994), and the maximum of pattern differs from that of retrospectively
smoothed Gaussian random fields (Rogerson, finding spacetime interaction in a set of
2001) fall into this category, where the data using a single test such as those
extreme local statistic is assessed, and the outlined by say Knox (1964), Mantel (1967),
multiple hypothesis testing associated with or Raubertas (1988).
carrying out many local tests is accounted for. The development of methods for the
Like other subfields of spatial analysis, surveillance or monitoring of spatial patterns
interest in the statistical analysis of spatial has received much of its impetus during the
patterns and the development of statistical last few years from intense interest in surveil-
methods for cluster detection has grown lance for bioterrorism, and following from
rapidly in the last decade. Waller (Chapter 16 that, interest in public health surveillance.
in this volume) provides a review of many of The recent reviews of outbreak detection
these developments and related issues. algorithms (Buckeridge et al., 2005) and
Spatial statistical tests of null hypotheses control charts for public health surveillance
are almost always carried out on a single set (Woodall, 2006) include discussions of spa-
of data; the hypothesis is accepted or rejected, tial considerations in surveillance and sum-
and ideally the size and location of significant marize the many recent advances in this area.
geographic clustering is revealed. However, In addition, Chapter 9 of Lawson (2001) and
there are many situations where repeated the more recent collection of contributions
tests of this type are required. Imagine the edited by Kleinman and Lawson (2005) also
crime analyst, who, each month, receives a attest to the growing importance of this field.
MONITORING CHANGES IN SPATIAL PATTERNS 345

The remainder of the chapter is structured application of Shewhart charts, sequential


as follows: Section 18.2 describes and observations are plotted on a chart that
reviews the use of the methods of statistical has both a centerline corresponding to the
process control; these methods have been assumed process mean, and upper and
developed primarily within an industrial lower control limits, usually corresponding
context for the quick detection of change to plus and minus three standard deviations
in industrial processes and product quality. from the mean, respectively. If data come
They are appropriate for monitoring an from a standard normal distribution, an
outcome variable for a single region, and observation outside of the control limits
they lie at the core of many approaches of plus or minus three would be observed
to spatial surveillance. The focus is upon once every 370 observations on average,
cumulative sum methods in particular, due since the tail area of a normal distribu-
to both their optimality properties and their tion lying outside three standard devia-
widespread use both in temporal public tions is approximately equal to 1/370. One
health surveillance and in many early possible rule for declaring a process to be
attempts at spatial surveillance. Section 18.3 out-of-control, therefore, could be to do so
gives a very brief history of the recent when an observation is observed outside of
development of interest in the methods the control limits; the average run length
of spatial surveillance. The intent here is (i.e., number of observations) until an alarm
to indicate some of the early approaches is declared, when the process is in control
and some of the general perspectives taken (designated ARL0 ), would be 370. Since this
in various attempts to monitor geographic procedure would declare alarms for single,
patterns; no attempt is made to be compre- outlying observations, various alternative
hensive. Section 18.4 describes how these rules are also commonly implemented
statistical process control methods have been for example, some users advocate declaring
adopted for use with spatial statistics to an out-of-control alarm if there are nine
carry out surveillance for potential changes consecutive observations on one side of
in geographic patterns. the mean (see, e.g., Nelson (1984) for this
and other suggested rules). For normally
distributed observations, the control limits for
a Shewhart chart can easily be redefined to
18.2. STATISTICAL PROCESS be consistent with a desired value of ARL0 .
CONTROL FOR TEMPORAL For example, suppose that false alarms were
SEQUENCES OF desired only once every 700 observations.
OBSERVATIONS The standard normal score associated with a
two-tail area of 1/700 (i.e., an area of 1/1,400
18.2.1. Shewhart charts
in each tail) is found to be 3.19, and so
The majority of statistical approaches for upper and lower control limits would be set
spatial surveillance have their methodologi- at 3.19 standard deviations.
cal roots in the field of statistical process con-
trol. Industrial processes are often monitored
so that various process parameters stay within
18.2.2. Cumulative sum
tolerable limits, and so that manufactured
(CUSUM) charts
products maintain acceptable quality. Control
charts for such purposes were developed Although Shewhart charts are straightforward
by Shewhart in 1924. In a straightforward to employ, and they are good at detecting
346 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

large changes from the process mean, they are For other choices of k in the range

not as sensitive as other methods at detecting 1/ ARL0 k 1 one can use the more
smaller and therefore more subtle deviations general
from the baseline process. The cumulative
sum (CUSUM) chart was introduced by Page  2 
2k 2 ARL0 +2 2k ARL0
(1954); the approach consists of maintaining h ln +1 1.166.
2k 2 ARL0 +1 2k
the cumulative sum of deviations between
(18.2)
observed and expected values. Cumulative
sum methods are covered in detail by
Hawkins and Olwell (1998). For the partic- The Shewhart chart is a special case of the
ular example of standardized, independent, cusum chart, where k is equal to the Shewhart
normally distributed observations (zt ), the control limit and h = 0.
one-sided cumulative sum at time t, St , is: There is a tradeoff between the rate of
false alarms and the ability to detect change
when it actually occurs; the higher the value
St = max(0, St1 + zt k) of ARL0 (and hence the lower the false
alarm rate), the greater will be the time
until true change is detected (as signified by
ARL1 , the average number of observations
where k is a parameter chosen to be equal until an alarm is signaled, once change has
to one-half the size of the deviation that occurred). Moustakides (1986) shows, and
is expected when the process goes out of Frisen and Sonesson (2005) note, that the
control. In this example, the expected value cusum approach minimizes the maximum
of each observation is equal to zero (since expected delay until an alarm is sounded, for
observations have been standardized), and a particular changepoint.
it is easy to see that the cusum is, more
precisely, the cumulative sum of deviations
for observations that exceed their expectation Cusums for Poisson data
by more than k standard deviations. The Regional data to be used for monitoring are
parameter k is almost always chosen in often not normally distributed. For example,
this case to be equal to ; this choice counts of disease or crime incidents are often
minimizes the time it will take to detect a one taken to have a Poisson distribution. Lucas
standard deviation increase in the mean of (1985) gives the Poisson cusum as:
the process. An alarm indicating an increase
in the underlying mean of the process is
St = max(0, St1 + yt k)
declared when the cumulative sum exceeds
some predefined threshold, h (i.e., St > h).
The threshold is chosen in conjunction with where yt is the count at time t. If the expected
a desired value of ARL0 ; for the case count is constant and equal to l0 , the value
of k = , Rogerson (2006) provides the of k is:
following formula:
l1 l0
k=
ln l1 ln l0
 
ARL0 + 4 ARL0
h ln + 1 1.166.
ARL0 + 2 2 where it is desired to detect an increase
(18.1) in the Poisson parameter from l0 to l1 as
MONITORING CHANGES IN SPATIAL PATTERNS 347

quickly as possible. Lucas gives tables for the where 1/ is the mean time between
threshold h, which is determined from both events.
k and the analysts choice of the in-control To detect a decrease in the mean time
average run length, ARL0 . between events (and hence an increase in
An alternative approach is to attempt to from, say, 0 to 1 ), one can use the
transform the Poisson counts to normality. exponential cusum:
Rossi et al. (1999) find that the following
transformation converts the data, approxi-
St = max(0, St1 xt + k)
mately, to a standard normal distribution:

 where xt is the time between events, and:


yt 3l0 + 2 l0 yt
zt = .
2 l0 1 0
k= .
ln(1 0 )

Rogerson and Yamada (2004) give examples


however showing that this transformation Rogerson (2005) derives the approximate
may be unreliable when l0 < 2. In addi- threshold associated with a desired ARL0
tion, Hawkins and Olwell (1998) note that by first transforming the problem into one
detection times are shorter when cusums having an in-control parameter of =1; this is
designed for the distribution are employed, achieved by dividing each observation by 0 .
in comparison with cusums based upon The normalized out-of-control parameter is
transformations to normality. then 1 = 1 /0 . The threshold is then
Situations where the expected count given by:
remains constant over time are unusual; dis-
ease counts might be expected to vary season- (q + 2) ln(q + 1)
ally, or exhibit other temporal trends. The use h 1.33
(q + 1) ln(1 )
of transformations to normality allows such
temporally-varying expectations to be easily
handled. Alternatively, the Poisson cusum where:
can itself be generalized to handle changing
expectations (see Hawkins and Olwell, 1998;
q = ARL0 ln(1 )|1 k|.
Rogerson and Yamada, 2004).

Cusums for exponential data 18.2.3. Other methods for


Quicker detection of increases in the rate temporal surveillance
of rare events can often be achieved by
The exponentially weighted moving average
monitoring the times between events (Wolter,
(EWMA) chart was introduced by Roberts
1987; Gan, 1994). For a random process,
(1959) and is discussed further by Hunter
the times between events are exponentially
(1986) and by Lucas and Saccucci (1990);
distributed:
it is based upon the quantities:

f (x) = exp ( x) zt = (1 l)zt1 + lxt


348 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

where xt is the observation at time t and l interest and in the surrounding regions.
is a parameter that dictates the importance The weights define the spatial structure of
of dated information. An alarm is signaled at the alternative, and should be matched as
the first time when the value of zt exceeds a closely as possible with the definition of any
time-varying threshold that over time reaches presumed cluster. The weights for example
an asymptotic limit. In the special case of might decline as the distance from the
l = 1, only current information is used, and region of interest increases. For each time
the method is identical to the Shewhart chart. period, the weighted sum of observations is
The ShiryaevRoberts method, based upon compared with expectations, and deviations
contributions from Shiryaev (1963) and are cumulated; if these deviations exceed a
Roberts (1966), can be derived as a special pre-specified threshold, an alarm signaling a
case of a likelihood ratio method with a possible increase in disease in the vicinity of
noninformative prior distribution on the time the region of interest is sounded. Raubertas
of the changepoint (Frisen and Sonesson, notes some of the complications that
2005). This approach minimizes the expected arise when one wishes to monitor several
time until an alarm following a change. regions simultaneously, since there will
Many other approaches to temporal be correlation in the monitoring statistics
surveillance exist; these range from simple obtained for regions that are close to
calculations of historical limits that are one another (since they will have shared
empirically based upon recent data, to sophis- neighborhoods).
ticated use of time series analysis these are Statistical process control approaches to
reviewed by Farrington and Beale (1998), spatial surveillance may be categorized into
and more recently by Le Strat (2005). those that maintain separate, local charts
for each region (where, like Raubertas,
the regional chart may possibly include
information from a defined neighborhood
around the region), and those that monitor
18.3. SPATIAL SURVEILLANCE
a single, global spatial statistic.
As an example of the latter category,
18.3.1. Brief overview of the
Rogerson (1997) also uses cumulative sum
development of methods
methods to monitor temporal changes in a
for spatial surveillance
global spatial statistic (specifically, Tangos
Like recent developments in spatial cluster 1995 statistic). Each time a new case is
detection, many of the recent developments observed, Tangos statistic is updated and the
in the monitoring of spatial patterns have resulting statistic is then compared with the
occurred within the field of public health. expectation of the statistic (conditional upon
Raubertas (1989) was one of the first to the previous value of the statistic, before
outline how statistical approaches to spatial the new case was observed) under the null
surveillance could be developed, and he did hypothesis of no raised incidence in any
so in the context of disease surveillance. subregion. An alarm is sounded, indicating
Raubertas employed cumulative sum a significant change in the global statistic,
methods to suggest how disease monitoring if deviations between observed and expected
for a particular region within a study area statistics cumulate sufficiently.
could be carried out. Monitoring is based Kulldorff (2001) has extended his spatial
upon forming a weighted sum of the number scan statistic to the case of prospective
of cases occurring both in the region of disease surveillance, by considering the
MONITORING CHANGES IN SPATIAL PATTERNS 349

likelihood of the observed number of events and Clayton, 1993) for an historical period.
in space-time cylinders (where the vertical In particular, they use a logistic equation
axis represents time, and the horizontal to model the probability that a particular
plane represents a region and its surrounding individual is a case. Next, they use the
neighborhood), under the null hypothesis. coefficient estimates to derive the expected
The spatial scan statistic (Kulldorff and probability that an individual becomes a
Nagarwalla 1994) is based upon the like- case during the next time period. Statistical
lihood ratio associated with the number significance is achieved if the observed count
of events inside and outside of a circular of cases is unlikely to have occurred using a
scanning window. The numerator of the ratio binomial distribution based upon the number
is associated with the hypothesis that the of individuals and the predicted probability
rates inside and outside of the rate are resulting from the model.
different, and the denominator of the ratio Other approaches to spatial surveillance
is associated with the null hypothesis that include distance-based methods (see, e.g.,
the rates inside and outside of the window Forsberg et al., 2005), and perspectives
are the same. Likelihood ratios are found that adopt more of a model-based than
using circular scanning windows of various a statistical hypothesis testing perspective
sizes, and the window moves, to scan over (Lawson, 2005).
space. The most unusual window under the
null hypothesis is the one displaying the
maximum likelihood ratio. This maximum
observed ratio is compared with ratios 18.3.2. Spatial issues in spatial
that are simulated by assuming the null surveillance
hypothesis to be true; if for example the
maximum observed ratio is greater than One way to monitor variables for a set of
95% of the simulated ratios, the cluster is said regional subunits is to maintain a separate
to be significant using = 0.05. cusum chart for each subunit. An immediate
For disease surveillance, the circular scan- issue that arises in the context of monitoring
ning windows become cylinders with time on across a set of regional subunits is how
the vertical axis, where the top of the cylinder to properly account for the multiple testing
represents the most recent time period. To across spatial units. If cusum control charts
find space-time clusters as the cylinders are kept for each region, the average run
grow vertically with the progression of time, length between false alarms will be less than
the maximum likelihood ratio concept is that implied by the threshold derived for each
simply generalized. At each time period, the chart (which is based upon the desired ARL).
likelihood of the most interesting cylinder Thus if thresholds for each chart are chosen
(i.e., the one with the highest likelihood using a desired ARL0 of 100, the mean time
ratio) is compared with the likelihood of until the first alarm on at least one of the
the most interesting cylinder generated from charts will be less than 100. More precisely,
many simulations of the null hypothesis. The the average run length between false alarms
popularity of the method has been aided by for a set of m charts (one for each region),
freely available software (SatScan), available ARL0 , will be
at www.satscan.org.
Kleinman et al. (2004) model the count of
1
cases in a small region using covariates in ARL0 = .
a generalized linear mixed model (Breslow 1 (1 1/ARL0 )m
350 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

This is based upon the fact that the time control. This suggests that the adjustments
between false alarms has an exponential for multiple testing may be too severe, and
distribution (Page, 1954), and hence the prob- recent developments in the area of multiple
ability that any single observation leads to a testing can be used to lower the thresholds
false alarm is 1/ARL0 . Alternatively stated, (for a review, see Castro and Singer, 2006).
the ARL to use on each chart is given by: A second reason that equations (18.3) and
(18.4) can be conservative is that they assume
0 that the m regional charts are independent.
 1/m 11
1 More commonly, regional charts may exhibit
ARL0 = 1 1 (18.3)
ARL0 spatial dependence; a cusum chart for one
region may look a lot like a chart for a nearby
region. Finally, if emergent clusters might
where, again, ARL0 is the desired time exceed the size of regional subunits, this
between alarm investigations. A computa- will provide a rationale for monitoring local
tionally simpler way to account for the statistics for neighborhoods around regions.
simultaneous monitoring of the m charts is Maintaining separate charts for each region
to use a Bonferroni-type adjustment; instead is a directional scheme; the approach will
of using Equation (18.3) to determine the work very well when the actual change
threshold for each chart, the quantity: occurs in one of the regions (and not,
for example, combinations of regions), but
can lose considerable power in detecting
ARL0 = m ARL0 (18.4) change quickly when changes in other
directions occur. If, for example, an increase
occurs in a neighborhood containing several
is used. Thus if there are m = 10 regional regions (corresponding to several charts), this
units and a desired time between false alarms approach will not be as effective and can
of ARL0 = 100, the threshold for each chart yield longer times to detection than other
is found using ARL0 = 10 (100) = 1000, methods.
together with equation (18.1) or (18.2). In the next section, we examine some alter-
This type of adjustment is appropriate native approaches to multiplicity adjustment.
and will yield the desired ARL when
(a) no spatial autocorrelation in the regional
variables exists, (b) when all regions are in
Monitoring a single local statistic
control, and (c) there is a desire to monitor
Suppose that there is no spatial autocorrela-
individual regions, and not neighborhoods
tion in the regional values being monitored,
around regions. However, Equations (18.3)
and that we suspect that when change occurs,
and (18.4) will often lead to thresholds that
it will occur in the form of increases in
are too conservative (i.e., thresholds that are
a subset of regions comprising a neighbor-
too high). One reason for this is that not
hood. There are at least two ways forward
all m regions may be in-control; we only
if our objective is to detect this increase
require a threshold and false alarm rate that
quickly:
have been adjusted for the number of in-
control regions (which is unknown, but is
less than or equal to m). When a region 1 Keep a single chart for the variable consisting of
goes out of control, other (e.g., surrounding a weighted sum of the regional values (similar to
regions) may simultaneously go out of the suggestion of Raubertas).
MONITORING CHANGES IN SPATIAL PATTERNS 351

2 Use the approach of Healy (1987), which is where  = I; the Healy and Raubertas charts
optimal for quick detection of change in a single, will be identical. An important issue is the
hypothesized direction. adjustment for multiplicity; using individual
thresholds for each chart based upon mARL
would be too conservative, since the charts
While these approaches should give iden-
will be correlated (nearby local statistics will
tical results under the conditions specified,
be similar, since they use shared regional
Healys approach is more general, since
values). On the other hand, thresholds based
it can also handle the situation where
on ARL alone would be too liberal, unless the
the underlying variables are correlated.
charts for all local statistics were identical. It
Specifically, when the variancecovariance
is of interest to find the number of effectively
matrix associated with the regional values is
independent charts (say, e); in that case each
designated , the following cumulative sum
individual threshold could then be based
based on vectors of regional observations (xt )
upon e(ARL).
is optimal for detecting a change in mean
Let the regional variables be denoted by
from G to B , where these latter quantities
{yi } and the local statistic to be monitored
are vectors of regional values for the good,
by {zi }. Rogerson (2005) suggests that
in-control, and bad, out-of-control means,
a Gaussian kernel be used to define the
respectively:
neighborhood weights:

St = max(0, St1 + a (xt G ) 0.5D)


zi = wij yj
j

where: wij
wij = 3
wij2
j
(B G )  1
a = 4 5
{(B G )  1 (B G )}1/2 1 dij2
wij = exp
2 2 (A/m)
and:
where A is the size of the study area, and
2 where is the width of the Gaussian kernel,
D = (B G )  1 (B G ). expressed in terms of multiples of the square
root of the average regional area. Then one
possibility is to use the following for an
Monitoring many local statistics estimate of e:
simultaneously
ow suppose that we wish to carry out surveil-
m
lance of several such local statistics simulta- e= .
1 + 0.81 2
neously. We could either keep a Raubertas-
type chart for each local statistic, or, more
generally (since it is possible to account This is based upon results reported in
for underlying spatial autocorrelation in the Rogerson (2001), who modified the work
regional values), keep a Healy-type chart for of Worsley (1996) on the use of Gaussian
each region. Consider first the special case random fields to find the probability that
352 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

specified thresholds were exceeded anywhere ACKNOWLEDGMENTS


in the study area by at least one local
statistic. The support of Grant 1R01 ES098101 from
Although this idea gives results that are the National Institutes of Health and National
similar to those found through Monte Carlo Cancer Institute Grant R01 CA9269301 is
simulation, the adjustment is based upon the gratefully acknowledged.
(static) correlation between regional local
statistics observed at a single point in
time. In practice, the cusum charts being
maintained for each regional local statistic REFERENCES
will have correlations that are not necessarily
Anselin, L. (1995). Local indicators of spatial associa-
the same as this static correlation. Any
tion LISA. Geographical Analysis, 27: 93115.
adjustments to chart thresholds should ideally
be based upon the probabilities of charts Bailey, A. and Gatrell, A. (1995). Interactive Spatial
Data Analysis. Essex: Longman (published in the U.S.
jointly signaling. Additional approaches to
by Wiley).
monitoring data from multiregional systems
include methods designed for multivari- Besag, J. and Newell, J. (1991). The detection of
clusters in rare diseases. Journal of the Royal
ate surveillance (Rogerson and Yamada,
Statistical Society Series A, 154: 143155.
2004) and monitoring regional maxima
(Rogerson, 2005). Blackman, G.E. (1935). A study by statistical methods of
the distribution of species in grassland associations.
Annals of Botany, 49: 749777.
Breslow, N. and Clayton, D.G. 1993. Approximate
inference in generalized linear mixed models. Journal
of the American Statistical Association, 88: 925.
18.4. SUMMARY
Buckeridge, D.L., Burkom, H., Campbell, M.,
Hogan, W.R., and Moore, A.W. (2005). Journal of
The prospective surveillance of geographic
Biomedical Informatics, 38: 99113.
patterns, based upon incoming streams of
spatial data, is a field that has grown Castro, M.C. and Singer, B.H. (2006). Controlling the
false discovery rate: a new application to account
rapidly in the last decade. This growth has for multiple and independent tests in local stastistics
been motivated largely through interest in of spatial association. Geographical Analysis, 38:
public health surveillance. There are also 180208.
many potential applications in other areas, Clark, P.J. and Evans, F.C. (1954). Distance to nearest
including applications to crime analysis neighbor as a measure of spatial relationships in
(where there is interest in emerging areas populations. Ecology, 35: 445453.
of criminal activity), and in marketing, Farrington, C.P. and Beale, A.D. 1998. The detection
where the spatial pattern of customers in a of outbreaks of infectious disease. In: GEOMED
competitive retailing environment could be 97, International Workshop on Geomedical Systems.
monitored. Gierl, L., Cliff, A.D., Valleron, A., Farrington, P.
This chapter has only touched upon and Bull, M. (eds.), pp. 97117. Stuttgart:
B.G. Teubner.
some of the major approaches and issues.
The reader interested in investigating the Forsberg, L., Bonetti, M., Jeffery, C., Ozonoff, A.
and Pagano, M. (2005). Distance-based methods
topic further may find the edited collec-
for spatial and spatio-temporal surveillance. In:
tion of Kleinman and Lawson (2005), and Kleinman, K. and Lawson, A.B. (eds), (2005). Spatial
the software GeoSurveillance (available at and Syndromic Surveillance, pp. 3152. New York:
wings.buffalo.edu/rogerson) of interest. Wiley.
MONITORING CHANGES IN SPATIAL PATTERNS 353

Frisen, M. and Sonesson, C. (2005). Optimal Lucas, J. M. (1985). Counted data cusums. Technomet-
surveillance. In: Kleinman, K. and Lawson, A.B. rics, 27, 129144.
(eds), (2005). Spatial and Syndromic Surveillance,.
Mantel, N. (1967). The detection of disease clustering
pp. 3152. New York: Wiley.
and a generalized regression approach. Cancer
Gan, F.F. (1994). Design of optimal exponential CUSUM Research, 27: 209220.
control charts. Journal of Quality Technology, 26:
Moran, P.A.P. (1950). Notes on continuous stochastic
109124.
phenomena. Biometrika, 37: 1723.
Getis, A. and Ord, J. (1992). The analysis of
Moustakides, G.V. (1986). Optimal stopping-times
spatial association by use of distance statistics.
for detecting changes in distributions. Annals of
Geographical Analysis, 24: 189206.
Statistics, 14: 13791387.
Hawkins, D.M. and Olwell D.H. 1998.Cumulative
Nelson, L.S. (1984). The Shewhart control chart: tests
Sum Charts and Charting for Quality Improvement.
for special causes. Journal of Quality Technology, 16:
New York: Springer.
237239.
Healy, J.D. (1987). A note on multivariate CUSUM
Ord, J. and Getis, A. (1995). Local spatial auto-
procedures. Technometrics, 29: 409412.
correlation statistics: distributional issues and an
Hunter, J.S. (1986). The exponentially weighted application. Geographical Analysis, 27: 286306.
moving average. Journal of Quality Technology, 18:
Page, E.S. (1954). Continuous inspection schemes.
203210.
Biometrika, 41: 100115.
Kleinman, K. and Lawson, A.B. (eds), (2005). Spatial
and Syndromic Surveillance. New York: Wiley. Raubertas, R.F. (1988). Spatial and temporal analysis
of disease occurrence for detection of clustering.
Kleinman, K., Lazarus, R. and Platt, R. (2004). Biometrics, 44: 11211129.
A generalized linear mixed models approach for
detecting incident clusters of disease: biological Raubertas, R.F. (1989). An analysis of disease
terrorism and other surveillance. American Journal surveillance data that uses the geographic locations
of. Epidemiology, 156: 217224. of the reporting units. Statistics in Medicine, 8:
267271.
Knox, G. (1964). The detection of space-time
interactions. Applied Statistics, 13: 2529. Ripley, B.D. (1976). The second-order analysis of
stationary point processes. Journal of Applied
Kulldorff, M. and Nagarwalla, N. (1994). Spatial Probability, 13: 255266.
disease clusters: detection and inference. Statistics
in Medicine, 14: 799810. Roberts, S.W. (1959). Control chart tests based
on geometric moving averages. Technometrics, 1:
Kulldorff, M. (2001). Prospective time-periodic geo- 239250.
graphical disease surveillance using a scan statistic.
Journal of the Royal Statistical Society Series A, Roberts, S.W. (1966). A comparison of some control
164: 6172. chart procedures. Technometrics, 8: 411430.

Lawson, A. 2001. Statistical methods in spatial Rogerson, P. (1997). Surveillance methods for monitor-
epidemiology. New York: Wiley. ing the development of spatial patterns. Statistics in
Medicine, 16: 20812093.
Lawson, A.B. (2005). Advanced modeling for
surveillance: clustering of relative risk changes. Rogerson, P. (2001). A statistical method for the
In: Kleinman, K. and Lawson, A.B. (eds), (2005). detection of geographic clustering. Geographical
Spatial and Syndromic Surveillance, pp. 3152. Analysis, 33: 215227.
New York: Wiley.
Rogerson, P. (2005). Spatial surveillance and cumula-
Le Strat, Y. (2005). Overview of temporal surveillance. tive sum methods. In: Kleinman, K. and Lawson, A.
In: Kleinman, K. and Lawson, A.B. (eds), Spatial and (eds), Spatial and Syndromic Surveillance for Public
Syndromic Surveillance, pp. 1329. New York: Wiley. Health, pp. 95114. New York: Wiley.
Lucas, J.M. and Saccucci, M.S. (1990). Exponentially Rogerson, P. (2006). Formulas for the design of CUSUM
weighted moving average control schemes: proper- quality control charts. Communications in Statistics
ties and enhancements. Technometrics, 32: 112. Theory and Methods, 35: 373383.
354 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Rogerson, P. and Yamada, I. (2004). Approaches to Tango, T. (1995). A class of tests for detecting general
syndromic surveillance when data consist of small and focused clustering of rare diseases. Statistics in
regional counts. Morbidity and Mortality Weekly Medicine, 7: 649660.
Report, 53 (Supplement): 7985.
Waller, L. (2006). Detection of clustering in spatial
Rogerson, P. and Yamada, I. (2004). Monitoring change data. Handbook of Spatial Analysis. London: Sage
in spatial patterns of disease: comparing univari- Publications.
ate and multivariate cumulative sum approaches. Waller, L. and Gotway, C. (2004). Applied Spatial
Statistics in Medicine, 23: 21952214. Statistics for Public Health Data. New York: Wiley.
Rossi, G., Lampugnani, L., and Marchi, M. (1999). An Wolter, C. (1987). Monitoring intervals between rare
approximate CUSUM procedure for surveillance of events: a cumulative score procedure compared with
health events. Statistics in Medicine, 18: 21112122. Rina Chens sets technique. Methods of Information
Shiryaev, A.N. (1963). On optimum methods in quickest in Medicine, 26: 215219.
detection problems. Theory of Probability and its Woodall, W.H. (2006). The use of control charts in
Applications, 8: 2246. health-care and public health.
Skellam, J.G. (1952). Studies in statistical ecology. Worsley, K.J. (1996). The geometry of random images.
I. Spatial pattern. Biometrika, 39: 346362. Chance, 9 (1): 2740.
19
Case-Control Clustering for
Mobile Populations
Geoffrey M. Jacquez and Jaymie R. Meliker

The effect of [human] mobility could be a when assessing case-clustering often do not
timespace lag between causes and effects that adequately account for known risk factors
makes conventional mapping spurious.
A. Shaerstrom (2003) (e.g., smoking), covariates (e.g., age, gender,
race, education, etc.) and the spacetime
lag between exposure and disease. This
chapter is based closely on two previous
19.1. INTRODUCTION papers published by our research group
(Jacquez et al., 2005, 2006). It provides
Traditionally, geographic clustering tech- background on human mobility and its
niques have concerned themselves with implications in disease clustering, and then
static spatial distributions in which human offers an approach for analyzing case-
mobility is ignored. For example, within the control data for mobile individuals that
case-control framework, place-of-residence addresses latency and incorporates covariates
at time of diagnosis or death is often and other risk factors in the analysis.
analyzed even though there may be a Called Q-statistics, this approach is used for
substantial space time lag or latency between analyzing clustering in case-control data for
timing of causative exposures and disease mobile individuals. An example analysis of
diagnosis. The few techniques currently bladder cancer in southeastern Michigan is
available for accounting for human mobility presented within an inductive framework in
356 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

which the plausible explanatory hypotheses


19.1.1. A historical perspective on
are first enumerated and then systematically
human mobility
evaluated. We demonstrate that clustering
of residential histories of bladder cancer Recent generations have seen an expo-
cases is only partially explained by smok- nential increase in human mobility (Cliff
ing, age, gender, race, and education. We and Haggett, 2003) and a global shift in
also identify clusters of unexplained risk the population distribution such that cities
(focused clusters) surrounding the business and developing countries are growing the
address histories of 22 industries whose fastest. Geographical space has collapsed,
reported emissions and/or business processes and travel times have fallen exponentially
release known or suspected bladder cancer from the 1800s to the present (Davies,
carcinogens. The methods developed and 1964). Improved transport and population
demonstrated in this chapter provide a sys- growth have contributed to changing travel
tematic approach for evaluating increasingly patterns, as exemplified by Figure 19.1
realistic alternative hypotheses regarding the which illustrates the increasing size and
identification and explanation of clusters in complexity of travel networks over four
case-control data. male generations of the same family

Market Corby 5
Harborough 1 Beds
2 Bucks 3
3 Cambs 6 1
4 Herts
5 Leics 2 4
6 Northants
Kettering 7 Greater London
7
Life time tracks
10 km
Area shown in A
(a)
100 km

(b)

10000 km
1000 km
(c) (d)

Figure 19.1 Exponential increase in lifetime distances traveled over generations of males
from great grandfather (A), grandfather (B), father (C), and son (D). From Bradley (1988),
with kind permission of Dr. Bradley and Springer Science and Business Media.
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 357

(Bradley, 1988). The lifetime travel-track of 19.1.2. Background on residential


the great-grandfather remained within 40 km mobility in environmental
of a village in Northamptonshire, whereas health studies
the grandfather ranged throughout southern
In recent years residential mobility has
England as far as 400 km. The father traveled
increasingly been incorporated in exposure
throughout Europe to a scale of 4000 km and
assessment. Exposure reconstruction often
the son was a global traveler, reaching a
involves assessment of proximity of indi-
scale of 40,000 km. Although this is only one
vidual place-of-residence to environmental
illustrative example, it demonstrates what
hazards such as super-fund sites, incinerators,
is commonly accepted, that people traveled
and hazardous waste sites. In these instances
much greater distances at the turn of the
Geographic Information Systems (GIS) have
21st century, on average, than they did at the
been used to reconstruct individual-level
turn of the 20th century.
exposures to environmental contaminants
In addition to travel mobility, other
(Beyea and Hatch, 1999; Nuckols et al.,
aspects of human mobility include short-
2004; Ward et al., 2000). Examples include
term daily mobility (e.g., commuting to
assessments of proximity of individuals to
work and running errands) and long-term
landfills (OLeary et al., 2004), hazardous
mobility (e.g., housing mobility and choice of
waste sites (Elliott et al., 2001; McNamee
location) (Scheiner and Kaspar, 2003). U.S.
and Dolk, 2001), and farms for assessing
population-based surveys estimate that adults
exposures to pesticide application (Reynolds
spend 87% of their day indoors, 69% in their
et al., 2005). Perhaps because of the emphasis
place of residence, and 6% in a vehicle in
on the individual, exposure reconstruction
transit (Collia et al., 2003; Klepeis et al.,
has concerned itself both with human mobil-
2001; Reuscher et al., 2002). Residential
ity as well as with temporally dynamic
mobility histories compared across several
environmental contaminants for which con-
countries in the early 1980s found nearly 13
centrations may change through time. Res-
moves per person over a lifetime in New
idential histories and changes through time
Zealand, 11 in the U.S., 7 in Great Britain,
in the concentrations of environmental con-
6 in Japan, 5 in Belgium, and 4 in Ireland
taminants have been addressed in studies
(Long, 1992). Individuals in their early 20s
of air pollutants (Bellander et al., 2001;
in New Zealand will, on average, have
Bonner et al., 2005; Nyberg et al., 2000),
experienced as many moves as a resident
drinking water contaminants (Swartz et al.,
of Ireland over a lifetime. Approximately
2003), pesticides (Aschengrau et al., 1996;
5070% of the moves occur within localities
Brody et al., 2002) and herbicides (Stellman
(e.g., counties), 2035% between localities,
et al., 2003).
1015% between regions (e.g., states or
provinces), and 010% between countries
(Long, 1992). While the median distance
moved was just 3 km in Great Britain and
19.1.3. Unrealistic assumptions of
10 km in the US, 1720% of the moves
disease clustering
were between regions or between coun-
tries, demonstrating considerable mobility Only recently has the role of human mobility
for a large segment of the population. The and temporally varying exposures been
challenge thus is to incorporate residential addressed within the context of disease
and other forms of human mobility into clustering. That risk of disease may vary from
environmental health investigations. one geographic sub-population to another,
358 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

and that this risk is time-dependent, is a as individuals move throughout their days
fact for almost all human diseases, including and lives. In the context of human health
infectious as well as chronic diseases such studies these have been called geospatial
as cancer. Goodchild (2000) referred to lifelines, and their mathematical representa-
the failure to appropriately represent the tion, properties, and means of analysis have
time dimension as a static world-view. become important research topics. Sinha
To date, many disease clustering methods and Mark (2005) employed a Minkowski
have been based on a static world-view in metric to quantify the dissimilarity between
which individuals are considered immobile, the geospatial lifelines of cases and con-
migration between populations does not trols, and suggested that their technique
occur, and in which background disease could be used to evaluate differences in
risks under the null hypothesis are assumed exposure histories between the case and
to be time-invariant and uniform through control populations. The Minkowski metric
geographic space. As a result, many of the provides a global measure of dissimilarity
applications in the published literature suffer between cases and controls, but does not
from violations of fundamental assumptions identify where or when these dissimilari-
that are inherently unrealistic (Jacquez, ties occur. Using k-function analysis, Han
2004). et al. (2004) evaluated clustering of breast
cancer in two New York state counties and
detected significant spatial clustering at the
global level. Their approach incorporated
19.2. CONSEQUENCES OF THE knowledge of residential locations of both
STATIC WORLD VIEW IN cases and controls at biologically relevant
DISEASE CLUSTERING ages in a womans life, namely at birth,
menarche, and at womans first birth. The
When analyzing chronic diseases such as k-function was applied to the spatial pattern
cancer, causative exposures may occur over described by place of residence at specific
a long time period, and the disease may time slices in the participants lives. Sabel
be manifested only after a lengthy latency and colleagues (Sabel et al., 2000, 2003) used
period. During this latency period individuals residential histories to analyze clustering of
may move from one place of residence to cases of motor neurone disease in Finland.
another. This can make it difficult to detect They calculated risk surfaces using kernel
clustering of cases in relation to the spatial functions that were weighted by duration at
distribution of their causative exposures. Yet specific locations of residence. This approach
the static spatial point distribution is the point thus used the residential history information
of departure for many clustering approaches, more fully, but ignored the temporal ordering
including Turnbulls test (Turnbull et al., of place of residence.
1990), and tests suggested by Cuzick and Jacquez and colleagues (2005) developed
Edwards (1990), Besag and Newell (1991), global, local and focused versions of so-
the Bernoulli form of the scan test (Kulldorff called Q-statistics that evaluate clustering
and Nagarwalla, 1995), Tango (1995), and a in residential histories using case-control
host of others. Especially for chronic diseases data. Their approach is based on a space
with long latencies, human mobility must be time representation that is consistent with
accounted for. Hagerstrands model of spacetime paths,
Hagerstand (1970) developed conceptual and evaluates local, global, and focused
models of the spacetime paths formed clustering of the residential histories of the
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 359

cases relative to the residential histories and biologically active compounds are of
of the controls. One of the benefits of anthropogenic origin (e.g., PCBs) and were
the different versions of the Q-statistics not present in the environment in prior
is their ability to quantify what is hap- generations.
pening at the local, spatial, and temporal As noted earlier, the majority of cluster
scales that are of relevance to individuals, methods assume a static geography and work
while also providing global statistics for with static spatial point patterns (instead of
evaluating aggregations of cases. But their location histories) to represent cases and
approach did not incorporate explicit models controls. The spatial coordinate employed
of disease latency, nor did it account may be the place of residence of cases
for those times in a persons life when at time of diagnosis, death, hospitalization,
they might be most susceptible to specific or whatever health-related event is being
exposures. studied. But clustering of cases at time of
diagnosis or death is often of little scientific
or practical interest in terms of enhancing
our understanding of healthenvironment
19.3. A HISTORICAL PERSPECTIVE relationships. Of greater import is whether
ON LATENCY MODELS there is clustering in the locations where
the causative exposures occurred, but this
It seems a truism to observe that people question cannot be adequately addressed
are mobile, the environment varies through by techniques that employ a static world-
time, and that populations grow and their view because those approaches implicitly
composition changes, thereby complicating assume the duration between exposure and
the adjustment for covariates. We therefore the date of the health related event (e.g.,
need to understand the contributions to diagnosis, death) to be negligible. When
individual exposure that transpire at home, exploring spacetime interaction whether
at work, and while commuting. Substantial nearby cases tend to occur at about the
disease latencies may need to be accounted same time the Knox test (Knox, 1964)
for, and an individuals susceptibility to employs critical time and space distances
disease and to environmental insults may that may be specified to reflect a latency
vary with age. Metabolic responses may period and the average distance individuals
be non-linear and synergistic, and observed might move during this period. But to
impacts of current exposures may be medi- date and to our knowledge none of the
ated by past exposures. Enzymes involved available tests for geographic clustering take
in metabolism may be inducible, such as into account disease latency for location
the example of alcohol dehydrogenase and histories. Methods for addressing this need
alcohol metabolism. In addition, exposures are proposed later in this chapter.
are temporally dynamic, may be episodic For purposes of this chapter we make
or cyclic, and can occur on time scales a distinction between the evolution of risk
including days, weeks, years, decades, and through time of a known exposure (e.g., when
potentially over the entire life-course. For the exposure began, ended, the mid-point, as
example, in summer, air pollutants may vary well as changes in the exposure level through
over the course of day; while concentrations time) and the definition of a time window
of naturally-occurring metals in groundwa- within which an unknown exposure might
ter may be relatively static over months have occurred that plausibly could explain
and even years. And certain carcinogens a known disease outcome (what we refer to
360 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

in this article as the exposure window). Let how can exposures during the life course be
us now consider approaches that have been accounted for when modeling latency and
used for modeling latency. exposure windows? Robins and Greenland
Langholz et al. (1999) observed that (1991) showed that in cohort analyses, years
effects of latency as described in the epi- of life lost (YLL) due to early exposures
demiological literature are largely insufficient cannot be estimated without bias in the
for addressing questions related to public absence of causal models for how exposure
health. They proposed latency models using causes death. Morfeld (2004) demonstrated
bilinear and exponential decay functions, this result analytically, resulting in a pro-
and fitted these models to case-control posed framework for formulating such causal
data within a likelihood framework. Their models (e.g., Robins G-estimation procedure
working definition of latency is the function (Robins, 1997; Rothmann and Greenland,
describing how the relative risk associ- 1998) that can be used to estimate the latency
ated with a known exposure changes through between exposure and death). Of course any
time. So, for example, in their analysis of results from an exploratory analysis with
lung cancer in a cohort of uranium miners no a priori hypothesis would need to be
they found that . . . relative risk associated verified with another study. A model that
with exposure increases for about 8.5 years links exposures and latency periods to the
and thereafter decreases until it reaches health outcomes thus appears to be required
background levels after about 34 years. in order to evaluate alternative specifications
As for most latency models of occupational of exposure windows, an important result that
studies, Langholzs metric was calculated we will refer to later in this chapter.
for a known exposure for example, the For purposes of clustering, the putative
period of employment. For purposes of exposure is often unknown, and we therefore
clustering we are interested in determining must be able to handle uncertainty in
whether the residential histories of cases exposure windows. Later in this chapter
clustered during those times when causative we define approaches for explicitly model-
exposures plausibly might have occurred, ing exposure windows, and for specifying
but we do not necessarily know what those sampling distributions for exposure win-
exposures might be. We thus wish to use our dows. These can then be used to evaluate
admittedly inadequate knowledge of cancer the sensitivity of the cluster statistics to
latency to define exposure windows that alternative specifications of and uncertainty
bracket those time periods within which an in the exposure windows. But in general,
environmental exposure might be associated the latency model employed should be
with an observed cancer. This could indicate, specified to correspond to some a priori
for example, those times in a persons life hypothesis regarding disease causation a
when exposures (should they occur) are most causal model.
likely to result in a cancer at some later date.
This is an important distinction that, as noted
above, must be kept in mind for the remainder
of this chapter. 19.4. AGE-DEPENDENT MODEL OF
Exposures early in life and over an DISEASE LATENCY AND
individuals life course may be important EXPOSURE WINDOWS
risk factors for the onset of chronic diseases
such as cancer (Barker, 1992; Han et al., Detailed specification of a latency model
2004; Kuh and Ben-Shlomo, 1997). But requires a causal model of how disease results
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 361

in death. At this writing our knowledge of exposure might have occurred and given rise
the causes of most cancers is incomplete to the observed cancer that time interval
and in almost all instances is insufficient from E0 (Ad ) and E1 (Ad ).
to fully specify such a model. But in For the purposes of this chapter we will
order to tackle this problem it first is assume the age at which latency begins
necessary to develop an understanding of is the age at which the exposure window
the information the construction of such ends (E1 (Ad )) although this does not have
a model might require. We therefore now to be the case and the modeling approach
consider how one might construct and then (below) is readily adapted to instances in
employ a model of disease latency within which the end of the exposure window
the framework of Q-statistics, using a simple is not the same as the beginning of the
and necessarily unrealistic age-dependent latency period. We would like to model the
function as our point of departure. As more exposure window and latency as functions
realistic models of disease causation are of the age at diagnosis, Ad . The duration
developed they may be radically different in of the exposure window is therefore age
form and will replace what we acknowledge dependent and we now write E(Ad ) =
is a simplistic first step. But for now and E1 (Ad ) E0 (Ad ), and the duration of the
for convenience define the latency L(Ad ) latency period is L(Ad ) = Ad E1 (Ad ).
as the duration between the age of the For our purposes we wish to construct
participant at the time of onset of the a model of E(Ad ) + L(Ad ) so that
condition, E1 (Ad ) (age of the participant the duration of the latency period and
at that date when the participant has the exposure window becomes shorter as the
beginnings of a cancer, yet to be diagnosed) age at diagnosis decreases, since we wish
and the age at diagnosis, Ad (Figure 19.2). to avoid implausible situations such as
Further, suppose the exposure window the causative exposures occurring after the age
time in an individuals life course when at diagnosis. Notice, however, that the
he or she is biologically vulnerable should model can be specified in a manner that
an exposure occur commences at age would allow maternal exposures prior to
E0 (Ad ) and ends at age E1 (Ad ). Recall conception. We would also like the model
the distinction made in the Introduction to allow in utero exposures occurring after
regarding exposure windows and an actual conception. To accomplish these objectives
exposure. The exposure window is simply we employ a modified form of the logistic
that time in a persons life when a causative equation initially attributed to Verhulst (1838,
1845). Define the variable g at a given age of
diagnosis to be:

E(Ad ) L(Ad )
E(Ad ) + L(Ad )
g(Ad ) = . (19.1)
t max(E(Ad ) + L(Ad ))
E 0(Ad ) E 1(Ad ) Ad

Figure 19.2 Schematic of a model of


age-dependent exposure windows This is the duration between the beginning of
beginning at E0 (Ad ), ending at E1 (Ad ), and the exposure window to the age at diagnosis,
followed by latency L(Ad ), with latency scaled to the range 0 . . . 1, by dividing
ending at diagnosis at age Ad . by the maximum of that duration over all
362 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

ages considered. Now define the parameter for which the exposure and its timing
g0 to be: are known (or at least presumed known,
being related for example to employment
dates), as well as the date of diagnosis or
min(E(Ad ) + L(Ad ))
g0 = . (19.2) death. Since in our case the exposures are not
max(E(Ad ) + L(Ad )) observable we require a sampling distribution
for exposure windows that will allow us to
This is the smallest possible value of g(Ad ). assess the sensitivity of any observed case
The model of the latency and exposure clustering to uncertainty in that exposure
window as a function of age is then: window.
We will accomplish this by modeling
exposure windows for an individual with
1 a given age at diagnosis as the waiting
g(Ad ) =   . (19.3)
1 time from the beginning of the exposure
1+ 1 erAd
g0 window (E0 (Ad )) to the end of that exposure
window (E1 (Ad )). Our approach will be to
find the duration of the exposure window for
Here r is a parameter describing the rate
individuals of a given age using the function
of increase of g(Ad ) as a function of age
in equation (19.3) and solving for E(Ad )
at diagnosis, with positive values indicating
in equation (19.1). We then obtain individual
that the time period between the onset
realizations of that exposure window by
of the causative exposure and the end of
sampling from a distribution of waiting times.
the latency period increases as the age at
Suppose we define events as being the
diagnosis increases (Figure 19.3). Hence
beginning and end of an exposure window,
equation (19.3) is how we model g(Ad ) and
and that these events are separated by a
equation (19.1) is the relationship between
waiting time E(Ad ). Assume E0 (Ad ) and
g(Ad ) and the latency and exposure windows
E1 (Ad ) are Poisson distributed and that the
at a given age of diagnosis.
Poisson process has intensity l. For a given
waiting time we can estimate the intensity
of the Poisson process adjusting for edge
19.5. SAMPLING DISTRIBUTIONS effects as:
FOR EXPOSURE WINDOWS
2
l = . (19.4)
With an age-dependent model of the latency E(Ad ) + 1
and exposure windows defined we now
concern ourselves with models of their
Or when ignoring edge effects as:
uncertainty. Recall that exposure windows
represent that time interval within which a
causative environmental exposure plausibly 1
l = . (19.5)
could have occurred. Notice that we observe E(Ad )
the cancer outcome (e.g., date of diagnosis)
but do not know whether the cancer was in The cumulative distribution of E(Ad ) is
fact caused by an environmental exposure, then estimated by:
nor what the exposure might actually be.
This is in contrast to models of latency
that were summarized in the Introduction, D(E(Ad )) = 1 elE(Ad ) (19.6)
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 363

25 95

20 75

Age at event
15
E + L

55

10 35

5 15

0 5
0 50 100 0 50 100
Age at diagnosis Age at diagnosis
35 95

30
75
25
Age at event
E + L

20 55

15
35
10

5 15

0 5
0 50 100 0 50 100
Age at diagnosis Age at diagnosis

Figure 19.3 Age dependent model of exposure window and latency. The sum of the
exposure window plus the latency as a function of age at diagnosis is shown in the rst
column. The second column shows the age at diagnosis (top solid line), the age at the end of
the exposure window (dashed line) and the age at the beginning of the exposure window
(bottom solid line). Top row: r = 0.05; bottom row r = 0.125. Minimum latency is 0.375
years, maximum latency is 15 years. Minimum exposure window is 0.375 years, maximum
exposure window is 15 years.

And the probability of E(Ad ) is given by: 19.6. THE DETECTION OF


CLUSTERING IN RESIDENTIAL
HISTORIES
P(E(Ad )) = lelE(Ad ) . (19.7)
In this section we first review Q-statistics.
We then define exposure traces that are
Having defined exposure windows and their the geographic projection of exposure win-
uncertainty we now turn to cluster statistics dows and extend the Q-statistics to provide
that account for human mobility. global, local, and focused tests that account
364 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

for risk factors, covariates, and disease relationships, and define a nearest neighbor
latency. We then describe an experimental indicator to be:
data set for bladder cancer in southeastern
Michigan, and apply some of these new


methods to this dataset to illustrate the 1 if and only if j is a k nearest
approach. i,j,k,t = neighbor of i at time t


Jacquez et al. (2005, 2006) developed 0 otherwise
global, local, and focused tests for case- (19.10)
control clustering of residential histories
for use with chronic diseases such as
cancer and that account for covariates We then can define a binary matrix of kth
and other risk factors such as smoking. nearest neighbor relationships at a given
Readers unfamiliar with Q-statistics may time t as:
wish to refer to the original works. We
now briefly present these techniques and
k,t =
then extend them to account for exposure
windows.
Define the coordinate ui,t = {xi,t , yi,t } 0 1,2,k,t 1,N,k,t

to indicate the geographic location of the 2,1,k,t 0

.
ith case or control at time t. Residential
histories can then be represented as the set N1,N,k,t
of spacetime locations: N,1,k,t N,N1,k,t 0

(19.11)
Ri = {ui0 , ui1 , . . ., uiT }. (19.8)
This matrix enumerates the k nearest neigh-
bors (indicated by a 1) for each of the N
This defines individual i at location ui0 at the
individuals. The entries of this matrix are 1
beginning of the study (time 0), and moving
(indicating that j is a k nearest neighbor of
to location ui1 at time t = 1. At the end
i at time t) or 0 (indicating j is not a k nearest
of the study individual i may be found at
neighbor of i at time t). It may be asymmetric
uiT . T is defined to be the number of unique
about the 0 diagonal since nearest neighbor
location observations on all individuals in
relationships are not necessarily reflexive.
the study. We now define a case-control
Since two individuals cannot occupy the
identifier, ci , to be:
same location, we assume at any time t
, that any individual has k unique k-nearest
1 if and only if i is a case neighbors. The row sums thus are equal to
ci = (19.9) k(i,,k,t = k) although the column sums vary
0 otherwise.
depending on the spatial distribution of case
control locations at time t. The sum of all the
Define na to be the number of cases elements in the matrix is Nk.
and nb to be the number of controls. Alternative specifications of the proximity
The total number of individuals in the metric may be used the metrics do not
study is then N = na + nb . Let k indicate have to be nearest neighbor relationships in
the number of nearest neighbors to con- order for the Q-statistics to work. We prefer
sider when evaluating nearest neighbor to use nearest neighbor relationships because
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 365

they are invariant under changing population significance of the above statistics. This is
densities, unlike geographic distance and accomplished by holding the location histo-
adjacency measures. There is also some ries for the cases and controls constant, and
evidence that nearest neighbor metrics are by then sprinkling the case-control identifiers
more powerful than distance- and adjacency- at random over the residential histories. This
based measures (Jacquez and Waller, 1997). corresponds to a null hypothesis where the
Still, one then may be faced with the probability of an individual being declared a
question of how many nearest neighbors case (ci = 1) is proportional to the number
(k) should I consider? In certain instances of cases in the data set, or:
one may have prior information that suggests
that clusters of a certain size should be
n1
expected, and this can serve as a guide to p(ci = 1|H0,I ) = . (19.13)
n0 + n 1
specification of k. When prior information
is lacking one may wish to explore several
levels of k. In these instances Tango (2000, Here n1 is the number of cases and n0 is
2006) advocates using the minimum p-value the number of controls, and H0,I indicates a
obtained under each level of k considered null hypothesis corresponding to Goovaerts
as the test statistic. Jacquez et al. (2006) and Jacquezs (2004) type I neutral model
evaluated different levels of k to determine of spatial independence. This null hypothesis
sensitivity of the results to specification of k. assumes the risk of being declared a case
Each of these approaches has advantages and is the same over all of the N case and
may be preferred in different situations. controls. When covariates and risk factors
There exists a 1 T + 1 vector denoting are quantified we may wish to incorporate
those instants in time when the system is that information into the null hypothesis. Any
observed and the locations of the individuals case-clustering that is found then will be
are recorded. We can then consider the above and beyond the modeled risk factors
sequence of T nearest neighbor matrices and covariates, and will thus indicate the
defined by: possible presence of risk sources beyond
those specified under this null hypothesis.
Tk = {k,t t = 0. . .T }. (19.12)

19.7.1. Logistic model of the


This defines the sequence of k nearest probability of being a case
neighbor matrices for each unique temporal
observation recorded in the data set, and thus In order to provide a more realistic model
quantifies how spatial proximity among the of the risk of being a case, we must make
N individuals changes through time. the probability of being declared a case a
function of the covariates and risk factors
one wishes to incorporate under the null
hypothesis. We will accomplish this task
19.7. ADJUSTING FOR COVARIATES using logistic regression. Logistic models
AND OTHER RISK FACTORS are used for binary response variables. Let
x denote the vector of covariates and risk
In the absence of knowledge of covariates factors. Further, let p = Pr(c = 1|x) denote
and other risk factors simple randomization the response probability to be modeled,
may be used when evaluating the statistical which is the probability of person i being a
366 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

case given that persons vector of covariates Step 2. Sprinkle the case-control identier ci over
and risk factors. The linear logistic model the residential histories of the participants
may then be written as: in a manner consistent with the desired
null hypothesis, and conditioned on the
observed number of cases. Assume we
have n1 cases, N participants and that Pi is
logit (p) = log (p/1 p) = +  x the probability of the i th participant being
(19.14) a case. Notice the Pi are provided by the
logistic equation.

and the equation for predicting the prob-


Step 2.1. Rescale the Pi as follows:
ability of being a case given the vector
N
of covariates and risk factors for the ith Pi = Pi / Pi .
i =1
individual is: Step 2.2. Map the Pi to the interval 0 . . .
1. For example, assume we have
 N = 2 participants, n1 = 1
e+ xi case and that P1 = 0.7 and
p(ci = 1|xi ) = . (19.15)
1 + e+ xi

P2 = 0.8. P1 then maps to the
interval [0 . . . 0.7/1.5) and P2
maps to the interval [0.7/1.5 . . .
Here the logit function is the natural log 1.5/1.5).
of the odds, is the intercept parameter, Step 2.3. Allocate a case by drawing a
and is the vector of regression (slope) uniform random number from
the range [0 . . . 1). Set
coefficients. One then fits the regression
the case identier equal to 1
model to the vector of covariates and risk (ci = 1) where i is the identi-
factors to calculate the intercept and slope er corresponding to the study
parameters. These are then used to calculate, participant whose interval for Pi
for each individual, the probability of being a contains the random number.
Step 2.4. Rescale as shown in Step 2.1
case given that individuals known covariates
but not including the probabil-
and risk factors. ity for the participant whose
case identier was assigned in
Step 2.3.
Step 2.5. Repeat Steps 2.22.4 until all
19.7.2. Randomization accounting
of the n1 case identiers are
for risk factors and assigned.
covariates Step 2.6. Set the remaining N n1 case
identiers to 0, these are the
We use approximate randomization to eval-
controls.
uate the probability of a given Q-statistic
under the null hypothesis that the likelihood
Notice steps 2.12.6 result in one real-
of being a case is a function of the ization of the distribution of case-control
covariates and risk factors specified under the identiers.
logistic model in equation (9.14). This null Step 3. Calculate Q for the realization from Step 2.
hypothesis thus effectively accounts for the Step 4. Repeat Steps 2 and 3 a specied number
of times (e.g., 999) accumulating the
risk factors and covariates in the vector x. To
reference distribution of Q.
evaluate the reference distribution for a given Step 5. Compare Q to this reference distribution
Q-statistic we follow these steps. to evaluate the statistical probability of
observing Q under the null hypothesis
Step 1. Calculate statistic (Q ) for the observed that accounts for the known risk factors
data. and covariates.
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 367

19.8. LOCAL AND FOCUSED are measured relative to age at diagnosis


CLUSTERING OF EXPOSURE to an absolute time representation using,
TRACES for example, the Gregorian calendar. Hence
equation (9.16) denotes that portion of case
Exposure traces are defined as the residential is residential history in which s/he was in
mobility that transpires for an individual dur- the exposure window. Call this the exposure
ing the exposure window, E(Adi ). Notice trace. As noted earlier in the Introduction,
we are now subscripting the age of diagnosis effective specification of exposure windows,
with the letter i to indicate the age of and hence of exposure traces, requires a
diagnosis for the ith individual. Therefore, causal model of how the exposure(s) causes
exposure traces are those portions of a cases cancer. The exposure trace for case i(REi )
residential history that were traversed while records those places where that individual
that individual was thought to be at risk resided while s/he might have experienced
to a cancer-causing exposure where they causative exposures. Now define an indicator,
were when they were in that portion of ei,t as:
their lifespan corresponding to E(Ad,i ).
This concept of an exposure trace assumes


a natural history of carcinogenesis in which 1 if and only if t is within the
the causative exposures occur, followed by ei,t = exposure trace is for individual i,


a latency period which concludes when the 0 otherwise.
cancer is diagnosed. This is easily modified
(19.17)
to fit other models of the natural history
of carcinogenesis, including other relevant
windows such as the lag between the onset When ei,t is 1, let us say the exposure trace is
of a fully developed cancer and diagnosis. active. A local case-control test for spatial
Given the residential history for case i, Ri , clustering of exposure traces at time t is then:
denote the spacetime coordinate at time of
diagnosis as ui,tD , noting that ui,tD Ri . We
can then define that subset of the residential
N
E
Qi,k,t = ci ei,t i,j,k,t cj ej,t . (19.18)
history Ri during which causative exposures j=1
might have occurred as:

This is the count, at time t, of the number


REi ={ui,t ; (ti,D L(Ad,i ) > t of k nearest neighbors of case is exposure
> (ti,D L(Ad,i ) E(Ad,i )). trace that are also cases and whose exposure
traces are also active. This statistic will be
(19.16)
large when the active exposure traces of a
group of cases cluster about case i at time t.
Here ti,D is the time of diagnosis for We can explore whether exposure traces of
individual i. The term (ti,D L(Ad,i )) is cases tend to cluster spatially about certain
the time when the exposure window ended individuals through time. A statistic sensitive
and the latency period began. The term to this pattern is:
(ti,D L(Ad,i ) E(Ad,i )) indicates the
time prior to diagnosis when the exposure
T
window began. This allows us to move E
Qi,k = E
Qi,k,t . (19.19)
between the age representation where things t=0
368 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

E will tend to be large when active


Qi,k diagnosis assigned in (1) and the model of
exposure traces for the other cases tend latency as a function of age dened earlier.
to persistently cluster around the active Notice this function could also be based on the
exposure trace of the ith case. covariates for each participant, or on the times
We can also ask whether the exposure of occurrence of a putative exposure source of
interest. Completion of (1) and (2) will result in
traces of cases cluster about specific locations
dates of diagnosis, and denition of exposure
(e.g., point source releases of carcinogens)
windows, latency periods, and exposure traces
that we refer to as a focus: for both cases and controls.


N 3 Calculate the desired test statistic for exposure
E
QF,k,t = F ,j,k,t cj ej,t . (19.20) traces, for the original (not randomized
j=1 data), Q (e.g., equation (19.20) for focused
clustering, equation (19.19) for local clustering,
etc.).
Here F, j,k,t is 1 if individual j is a k
nearest neighbor of the focus at time t, 4 Assign case-control identiers across the resi-
E
and 0 otherwise. The statistic QF,k,t is the dential histories employing the logistic model
count of the number of cases whose exposure described earlier in order to account for
traces are k nearest neighbors of the focus known risk factors and covariates. This will
at time t. Notice these statistics can also be result in a possible arrangement of cases and
controls (a realization) that accounts for the risk
duration weighted as described by Jacquez
factors and covariates. Hence any statistically
et al. (2005).
signicant clustering observed in the exposure
traces may be attributable to causes other than
the risk factors and covariates included in the
19.8.1. Statistical probability of logistic model.
exposure traces
5 For the realization from (4), calculate the
In order to evaluate whether exposure traces desired test statistic for clustering of exposure
of the cases cluster we first must derive a traces (Q).
procedure for generating representative times
of diagnosis, latency periods, and exposure 6 Repeat (4) and (5) a desired number of times
windows for the controls. Once this is to construct the reference distribution of the
accomplished we will be able to determine statistic under the null hypothesis (the null
whether the exposure traces for the cases distribution of Q).
cluster relative to those so constructed for
7 Evaluate the probability of the observed
the controls. Given the residential history of a
clustering of exposure traces under the null
control, steps involved to accomplish this are: hypothesis by comparing the value of the test
statistic for the observed data (Q ) to the
1 Set the age at diagnosis for each control to be reference distribution for Q from (6).
their age at their time of interview for the study
(notice researchers often may subtract one year
from age at time of interview, to account
for time between diagnosis and interview for 19.9. EXAMPLE: BLADDER CANCER
cases). IN SOUTHEASTERN MICHIGAN

2 Dene the exposure window and latency period A population-based bladder cancer case-control
for each case and control using the time of study is underway in southeastern Michigan.
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 369

Cases diagnosed in the years 20002004 Characteristics of 268 industries, including,


are being recruited from the Michigan but not limited to, fabric finishing, wood
State Cancer Registry. Controls are being preserving, pulp mills, industrial organic
frequency matched to cases by age (5 chemical manufacturing, and paint, rubber,
years), race, and gender, and are being and leather manufacturing, were compiled
recruited using a random digit dialing into a database. Each industry was assigned
procedure from an age-weighted list. At a start year and end year, based on best
this stage of recruitment, controls are not available data. Industries were geocoded
adequately matched; therefore, age, race, and following the same matching procedure as for
gender are included in the logistic regression residences: 89% matched to the address, 5%
model that accounts for covariates. To be were placed on the road using best informed
eligible for inclusion in the study, participants guess, and as a last resort, 6% were matched
must have lived in the eleven county study to town centroid.
area for at least the past five years and had
no prior history of cancer (with the exception
of non-melanoma skin cancer). Participants
are offered a modest financial incentive and 19.10. BLADDER CANCER:
research is approved by the University of ANALYSIS
Michigan IRB-Health Committee. The data
analyzed here are from 219 cases and 437 Jacquez et al. (2006) addressed four hypothe-
controls. As part of the study, participants ses regarding clusters of bladder cancer in
complete a written questionnaire describing southeastern Michigan:
their residential mobility. The duration of
residence and exact street address were A0: Bladder cancer cases in southeastern Michigan
obtained, otherwise the closest cross streets are not clustered.
were provided. Approximately 66% of
cases person-years and 63% of controls A1: There is global and local spacetime clustering
person-years were spent in the study area. of bladder cancer cases.
Of the residences within the study area, 88%
were automatically geocoded or interactively A2: The clusters may be explained entirely by
known risk factors (e.g., smoking) and
geocoded with minor operator assistance.
covariates.
The unmatched addresses were manually
geocoded using self-reports of cross streets
with the assistance of internet mapping This probability was then incorporated in
services (6%); if cross streets were not the randomization procedure as described
provided or could not be identified, residence earlier, resulting in a null hypothesis that
was matched to town centroid (6%). accounts for smoking, age, gender, education,
Address histories were collected for and race. Any clustering that is observed thus
those industries believed to emit con- is above and beyond any case clustering due
taminants associated with bladder cancer. to these risk factors and covariates. Increased
These were identified using the Toxics smoking is associated with higher probability
Release Inventory (EPA, 2000) and the of being a case; this risk increases with
Directory of Michigan Manufacturers. Stan- age, and is elevated for whites and females.
dard Industrial Classification (SIC) codes Bladder cancer typically afflicts older white
were adopted, but prior to SIC coding, males to a greater extent than the remainder
industrial classification titles were selected. of the population (Silverman et al., 1996).
370 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

A3: There is clustering of bladder cancer cases to address hypotheses A0 and A1. They
about industries known to emit bladder cancer then used the logistic model to adjust
carcinogens that is not explained by known risk for smoking, age, gender, education, and
factors and covariates. race in order to evaluate hypotheses
A2A3, employing the following func-
They used the global and local Q-statistics tion to evaluate the probability of being
not adjusting for covariates and risk factors a case:



2.0359 0.0125 Agei 0.9396 Genderi + 0.1900 Educatei +
0.0557 Racei 0.2438 Cignumi
e
p (ci = 1|xi ) =

2.0359 0.0125 Agei 0.9396 Genderi + 0.1900 Educatei +
0.0557 Racei 0.2438 Cignumi
1+e
(19.21)

Here females experience a higher risk three counties. However, whether these
because controls are in the process of being clusters may be explained by smoking and the
frequency matched to cases in the ongoing covariates age, gender, race, and education
study, and in this dataset, a greater proportion remained to be evaluated.
of cases are females than controls. In this Next, the researchers evaluated hypothesis
chapter, results are presented for k = 7 A2: The clusters may be explained by
nearest neighbors. Results for additional known risk factors and covariates. To
nearest neighbors are discussed in Jacquez accomplish this they incorporated the prob-
et al. (2006). abilities calculated from the logistic model
The first hypothesis A0: Bladder cancer in equation (19.21) into the randomization
cases in southeastern Michigan are not procedure as described in section 19.7.2.
clustered was evaluated without correcting They then recalculated the probabilities of
for the known risk factors and covariates. the global Q statistic used to evaluate A0.
The Global Q statistic was 1.198437 and Because the geometry of the residential
was significant (p = 0.001), and hypothesis histories doesnt change, the values of the
A0 was rejected. Next, hypothesis A1: statistic were unchanged. After adjustment
There is spacetime clustering of bladder for smoking and covariates the P value
cancer cases in southeastern Michigan was slightly increased to 0.003 from 0.001
evaluated using the spatial and temporally before adjustment. Hypothesis A2 was not
local Q-statistics of equations (19.10) and accepted, and the authors concluded the
(19.12) in Jacquez et al. (2005). This global case clustering of residential histories
effectively decomposed the observed global was not sufficiently explained by smoking
clustering into local contributions. Persistent and the covariates. Significant local clus-
case clusters were found in Oakland, Ingham, tering also remained, and was persistent
and Jackson counties. Hypothesis A1 was through time. In all, 26 local clusters were
accepted and Jacquez et al. (2006) concluded significant after covariate adjustment. They
there is persistent case clustering in these were found in Lapeer, Ingham, Oakland, and
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 371

Jackson counties. The clusters in Lapeer key covariates. Until then, we cannot rule
and Jackson counties were comprised of out occupational exposures in explaining the
13 cluster centers, and are ephemeral. focused clustering around certain industries.
The clusters in northwestern Ingham county In the interest of public health, however, it
appeared in 1950, concentrated to the north- is worth exploring those facilities with the
west of Lansing and persisted into 2000. most extreme p-values to single out those
Numerous clusters appeared in central and that consistently are at the center of a cluster
southeastern Oakland county beginning in of cases. Once identified, additional epidemi-
the 1950s and persisted to the present day. ological investigation may be warranted to
The authors suggested that the grouping uncover a biologically plausible exposure,
of these local case clusters into two areas and to determine whether individuals in the
and their persistence through time might vicinity of the operation actually demonstrate
indicate the possible action of a causal agent a body burden for the suspected carcinogen.
or an unknown covariate. They therefore
explored hypothesis A3: There is clustering
of bladder cancer cases about industries
known to emit bladder cancer carcinogens 19.11. DISCUSSION AND FUTURE
that is not explained by known risk factors DIRECTIONS
and covariates. Bladder cancers have a
multiplicity of possible causative exposures. The case-control epidemiological study
Using a database of 268 industries that design provides a wealth of information at
emitted known or suspected bladder cancer the individual level regarding exposures,
carcinogens, they analyzed case clustering risks, risk modifiers, and covariates. When
of residential histories about these indus- designing such a study the researcher often
tries both with and without adjustment for is concerned with assessing a few putative
smoking and the four covariates. The global exposures, and in determining whether there
version of the focused test was significant at are significant differences in these exposures
the 0.015 level before covariate adjustment between the case and control populations.
and remained significant ( p = 0.035) after As such, the case-control design is not
the covariates and smoking were accounted inherently spatial, nor is it particularly well
for. Considering the 268 business address his- suited or even capable of assessing risk
tories one at a time, the researchers found 22 factors other than those specified in the
industries that were significant cluster foci, original design.
located in Oakland (19 clusters), Ingham (2), The approaches described in this chapter
and Jackson (1) counties. Clusters in central may prove to be a highly useful addition
and southeastern Oakland county appeared in to the traditional aspatial case-control design
the 1930s and persisted to the present day. because they allow researchers to identify
The prospect of environmental pollution local groups of individuals whose risk
originating from these facilities being asso- exceeds that accounted for by the known risk
ciated with bladder cancer is intriguing; factors and covariates incorporated under the
however, caution is necessary until the study designed study. Efforts in developing causal
is complete. Occupational histories are being models for latency and exposure timing are
collected and will be incorporated as risk evolving, and the approach outlined here
factors in the logistic regression model, thus will allow researchers to incorporate these
creating a neutral model that includes smok- models into future cluster analyses that
ing and occupational exposures, along with account for human mobility. In addition,
372 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

while the application presented here uses Barker, D. (1992). Fetal and Infant Origins of Adult
residential histories, this approach may also Disease. London: BMJ Publishing.
be used to investigate disease clustering using Bellander, T., Berglind, N., Gustavsson, P., Jonson, T.,
occupational histories, or other forms of Nyberg, F., Pershagen, G. and Jarup, L. (2001).
human mobility. Using geographic information systems to assess
The ability of local and focused tests to individual historical exposure to air pollution
from trafc and house heating in Stockholm.
quantify pockets of cases whose excess risk
Environmental Health Perspectives, 109(6):
might be attributable to specific locations 633639.
or point sources is a powerful addition
Besag, J. and Newell, J. (1991). The detection of
to the inferential toolbox. While such a
clusters in rare diseases. Journal of the Royal
tool can never of itself assess the dose Statistical Society Series A, 154: 143155.
response relationship necessary to attribute
Beyea, J. and Hatch, M. (1999). Geographic exposure
risk to a specific location or point source,
modeling: a valuable extension of geographic
the ability to temporally and geographically information systems for use in environmental
localize the putative exposure source makes epidemiology. Environmental Health Perspectives,
it possible to begin the assessment of 107(suppl 1): 181190.
doseresponse relationships. Once such a Bonner, M.R., Han, D., Nie, J., Rogerson, P.,
putative focus has been identified, the next Vena, J.E., Muti, P., Trevisan, M., Edge, S.B.
step may involve techniques for modeling and Fraudenheim, J.L. (2005). Breast cancer risk
exposure that will provide a more accurate and exposure in early life to polycyclic aromatic
and detailed description of the spatial and hydrocarbons using total suspended particulates as
a proxy measure. Cancer Epidemiology Biomarkers
temporal variability in exposure. And once a and Prevention, 14(1): 5360.
specific point source is identified, the task of
quantifying the type and quantity of releases Bradley, D. (1988). The scope of travel medicine. In:
First Conference on International Travel Medicine,
of agents that plausibly might give rise to the pp. 19. Zurich: Springer Verlag.
observed health outcome may begin.
Brody, J.G., Vorhees, D.J., Melly, S.J., Swedis, S.R.,
Drivas, P.J. and Rudel, R.A. (2002). Using GIS and
historical records to reconstruct residential exposure
to large-scale pesticide application. Journal of
ACKNOWLEDGMENTS Exposure Analysis and Environmental Epidemiology,
12(1): 6480.
This research was funded by grants Cliff, A.D. and Haggett, P. (2003). On changing
R43CA117171, R01CA096002, and contexts for epidemic modeling. In: Toubiana, L.,
R44CA092807 from the National Cancer Viboud, C., Flahault, A. and Valleron, A.-J. (eds),
Institute. The views expressed in this Geography and Health, pp. 118. Paris: Inserm.
publication are those of the researchers and Collia, D.V., Sharp, J. and Giesbrecht, L. (2003). The
do not necessarily represent that of the NCI. 2001 National Household Travel Survey: a look into
the travel patterns of older Americans. Journal of
Safety Research, 34(4): 461470.
Cuzick, J. and Edwards, R. (1990). Spatial clustering
REFERENCES for inhomogeneous populations. Journal of the Royal
Statistical Society Series B, 52: 73104.
Aschengrau, A., Ozonoff, D., Coogan, P., Vezina, R.,
Davies, R. (1964). A History of the Worlds Airlines.
Heeren, T. and Zhang, Y. (1996). Cancer risk
New York: Oxford University Press.
and residential proximity to cranberry cultivation in
Massachusetts, American Journal of Public Health, Elliott, P., Briggs, D., Morris, S., de Hoogh, C.,
86(9): 12891296. Hurt, C., Jensen, T.K., Maitland, I., Richardson, S.,
CASE-CONTROL CLUSTERING FOR MOBILE POPULATIONS 373

Wakeeld, J. and Jarup, L. (2001). Risk of Exposure Analysis and Environmental Epidemiology,
adverse birth outcomes in populations living near 11(3): 231252.
landll sites. British Medical Journal, 323(7309):
Knox, G. (1964). The detection of space time
363368.
interactions. Applied Statistics, 13: 2529.
EPA (2000). Toxics Release Inventory (TRI) Data Files,
Kuh, D. and Ben-Shlomo, Y. (1997). A Life Course
Environmental Protection Agency.
Approach to Chronic Disease Epidemiology: Tracing
Goodchild, M. (2000). GIS and transportation: status the Origins of Ill-health from Early to Later Life.
and challenges. GeoInformatica, 4: 127139. Oxford: Oxford University Press.
Goovaerts, P. and Jacquez, G.M. (2004). Account- Kulldorff, M. and Nagarwalla, N. (1995). Spatial
ing for regional background and population disease clusters: detection and inference. Statistics
size in the detection of spatial clusters and in Medicine, 14(8): 799810.
outliers using geostatistical ltering and spatial
Langholz, B., Thomas, D., Xiang, A. and Stram, D.
neutral models: the case of lung cancer in
(1999). Latency analysis in epidemiologic studies of
Long Island, New York. International Journal of
occupational exposures: application to the Colorado
Health Geographics, 3(1): 14.
Plateau uranium miners cohort. American Journal of
Hagerstrand, T. (1970). What about people in regional Industrial Medicine, 35(3): 246256.
science? Papers of the Regional Science Association,
Long, L. (1992). Changing residence: comparative
24: 721.
perspectives on its relationship to age, sex, and
Han, D., Rogerson, P.A., Nie, J., Bonner, M.R., marital status. Population Studies, 46: 141158.
Vena, J.E., Vito, D., Muti, P., Trevisan, M., Edge, S.B.
McNamee, R. and Dolk, H. (2001). Does exposure
and Freudenheim, J.L. (2004). Geographic clustering
to landll waste harm the fetus? Perhaps, but
of residence in early life and subsequent risk of breast
more evidence is needed. British Medical Journal,
cancer (United States). Cancer Causes and Control,
323(7309): 351352.
15(9): 921929.
Morfeld, P. (2004). Years of Life Lost due to
Jacquez, G.M. (2004). Current practices in the spatial
exposure: Causal concepts and empirical shortcom-
analysis of cancer: ies in the ointment. International
ings. Epidemiologic Perspectives and Innovation,
Journal of Health Geographics, 3(1): 22.
1(1): 5.
Jacquez, G.M., Kaufmann, A., Meliker, J., Goovaerts, P.,
Nuckols, J.R., Ward, M.H. and Jarup, L. (2004).
AvRuskin, G. and Nriagu, J. (2005). Global, local
Using geographic information systems for expo-
and focused geographic clustering for case-control
sure assessment in environmental epidemiology
data with residential histories. Environmental Health,
studies. Environmental Health Perspectives, 112(9):
4(1): 4.
10071015.
Jacquez, G.M., Meliker, J.R., AvRuskin, G.A.,
Nyberg, F., Gustavsson, P., Jarup, L., Bellander, T.,
Goovaerts, P., Kaufmann, A., Wilson, M. and Nriagu,
Berglind, N., Jakobsson, R. and Pershagen, G.
J. (2006). Case-control geographic clustering for
(2000). Urban air pollution and lung cancer in
residential histories accounting for risk factors and
Stockholm. Epidemiology, 11(5): 487495.
covariates. 5: 32 International Journal of Health
Geographics. OLeary, E.S., Vena, J.E., Freudenheim, J.L. and
Brasure, J. (2004). Pesticide exposure and risk
Jacquez, G.M. and Waller, L. (1997). The effect of
of breast cancer: a nested case-control study of
uncertain locations on disease cluster statistics.
residentially stable women living on Long Island.
In: Mowerer, H.T. and Congalton, R.G. (eds),
Environmental Research, 94(2): 134144.
Quantifying Spatial Uncertainty in Natural Resources:
Theory and Application for GIS and Remote Sensing. Reuscher, T., Schmoyer, R. and Hu, P.S. (2002).
Chelsea MI: Arbor Press. Transferability of Nationwide Personal Transporta-
tion Survey data to regional and local scales.
Klepeis, N.E., Nelson, W.C., Ott, W.R., Robinson, J.,
Transportation Research Record, 1817: 2532.
Tsang, A.M., Switzer, P., Behar, J.V., Hern, S. and
Engelmann, W. (2001). The National Human Activity Reynolds, P., Hurley, S.E., Gunier, R.B., Yerabati, S.,
Pattern Survey (NHAPS): a resource for assessing Quach, T. and Hertz, A. (2005). Residential proximity
exposure to environmental pollutants. Journal of to agricultural pesticide use and incidence of breast
374 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

cancer in California, 19881997. Environmental Stellman, J.M., Stellman, S.D., Weber, T., Tomasallo, C.,
Health Perspectives, 113(8): 9931000. Stellman, A.B. and Christian, R. (2003). A geo-
graphic information system for characterizing expo-
Robins, J. (1997). Causal inference from complex
sure to Agent Orange and other herbicides in
longitudinal data. In: Berkane, M. (ed.), Latent
Vietnam. Environmental Health Perspectives, 111(3):
Variable Modeling with Applications to Causality,
321328.
pp. 69117. New York: Springer.
Swartz, C.H., Rudel, R.A., Kachajian, J.R. and
Robins, J. and Greenland, S. (1991). Estimability
Brody, J.G. (2003). Historical reconstruction of
and estimation of expected years of life lost due
wastewater and land use impacts to groundwater
to a hazardous exposure. Statistics in Medicine,
used for public drinking water: exposure assessment
10(1): 7993.
using chemical data and GIS. Journal of Exposure
Rothmann, K. and Greenland, S. (1998). Modern Analysis and Environmental Epidemiology, 13(5):
Epidemiology. Philadelphia: Lippincott-Raven. 403416.
Sabel, C.E., Boyle, P.J., Lytnen, M., Gatrell, A.C., Tango, T. (1995). A class of tests for detecting general
Jokelainen, M., Flowerdew, R. and Maasilta, P. and focused clustering of rare diseases. Statistics in
(2003). Spatial clustering of amyotrophic lateral Medicine, 14(2122): 23232334.
sclerosis in Finland at place of birth and place of
death. American Journal of Epidemiology, 157(10): Tango, T. (2000). A test for spatial disease clustering
898905. adjusted for multiple testing. Statistics in Medicine,
19(2): 191204.
Sabel, C.E., Gatrell, A.C., Lytnen, M., Maasilta, P.
and Jokelainen, M. (2000). Modelling exposure Tango, T. (in press). A test with minimized p-value
opportunities: estimating relative risk for motor for spatial clustering applicable to case-control point
neurone disease in Finland. Social Science and data, Biometrics.
Medicine, 50(78): 11211137. Turnbull, B.W., Iwano, E.J., Burnett, W.S., Howe, H.L.
Schaerstrom, A. (2003). The potential for time and Clark, L.C. (1990). Monitoring for clusters of
geography in medical geography. In: Toubiana, L., disease: application to leukemia incidence in upstate
Viboud, C., Flahault, A. and Valleron, A.-J. (eds), New York. American Journal of Epidemiology, 132
Geography and Health, pp. 195207. Paris: Inserm. (Suppl 1): S136143.

Scheiner, J. and Kaspar, B. (2003). Lifestyles, choice Verhulst, P.F. (1838). Notice sur la loi que la population
of housing location and daily mobility: the lifestyle pursuit dans son accroissement. Correspondance
approach in the context of spatial mobility and Mathematique et Physique, 10: 113121.
planning. International Social Science Journal, 55: Verhulst, P.F. (1845). Recherches Mathematiques sur La
319332. Loi DAccroissement de la Population (Mathematical
Silverman, D., Morrison, A. and Devesa, S. Researches into the Law of Population Growth
(1996). Bladder cancer. In: Schottenfeld, D. and Increase). Nouveaux Memoires de lAcademie Royale
Fraumeni, J. Jr., (eds), Cancer Epidemiology and des Sciences et Belles-Lettres de Bruxelles, 18:
Prevention, pp. 11561179. New York: Oxford Art. 1, 145.
University Press.
Ward, M., Nuckols, J., Weigel, S., Maxwell, S.,
Sinha, G. and Mark, D. (2005). Measuring similarity Cantor, K. and Miller, R. (2000). Geographic
between geospatial lifelines in studies of environ- information systems. A new tool in environ-
mental health. Journal of Geographical Systems, mental epidemiology. Annals of Epidemiology,
7(1): 115136. 10(7): 477.
20
Neural Networks for
Spatial Data Analysis
Manfred M. Fischer

20.1. INTRODUCTION not do justice to the entire spectrum of


such models. Instead, attention is limited
The term neural network has its origins in to a particular class of neural networks
attempts to find mathematical representations that have proven to be of great practical
of information processing in the study of importance, the class of feedforward neural
natural neural systems (McCulloch and Pitts, networks.2
1943; Widrow and Hoff, 1960; Rosenblatt, The attractiveness of such networks is
1962). Indeed, the term has been used due to two features. First, they provide
very broadly to include a wide range of a very flexible framework to approximate
different model structures, many of which arbitrary nonlinear mappings from a set of
have been the subject of exaggerated claims input variables to a set of output variables
to mimic neurobiological reality.1 As rich where the form of the mapping is governed
as neural networks are, they still ignore by a number of adjustable parameters,
a host of biologically relevant features. called weights. Second, they are devices
From the perspective of applications in for nonparametric statistical inference. No
spatial data analysis, however, neurobiolog- particular structure or parametric form is
ical realism is not necessary. In contrast, assumed a priori. This is particularly useful
it would impose entirely unnecessary con- in the case of problems where solutions
straints. Thus, the focus in this chapter is require knowledge that is difficult to specify
on neural networks as efficient nonlinear a priori, but for which there are sufficient
models for spatial data analysis. We can observations.
376 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

The objective of this chapter is to provide of which Hertz et al. (1991), Ripley (1996)
an entry point and appropriate background, and Bishop (2006) appear to be most suitable
for those spatial analysts wishing to engage in for a spatial analysis audience. Readers
the field of neural networks, required to fully interested in spatial interaction or flow data
realize its potential. The chapter is organized analysis are referred to a paper by Fischer
as follows. In section 20.2 we begin by intro- and Reismann (2002b) to find a useful
ducing the functional form of feedforward methodology for neural spatial interaction
neural network models, including the specific modelling.
parameterization of the nonlinear transfer
functions. Section 20.3 proceeds to discuss
the problem of determining the network
parameters within a framework that involves 20.2. FEEDFORWARD NEURAL
the solution of a nonlinear optimization NETWORKS
problem. Because there is no hope of finding
an analytical solution to this optimization Feedforward neural networks consist of
problem, section 20.4 reviews some of the nodes (also known as processing units or
most important iterative search procedures simply units) that are organized in layers.
that utilize gradient information for solving Figure 20.1 shows a schematic diagram
the problem. This requires the evaluation of a typical feedforward neural network
of derivatives of the objective function containing a single intermediate layer of
known as error function in the machine processing units separating input from output
learning literature with respect to the units. Intermediate layers of this sort are
network parameters, and section 20.5 shows often called hidden layers to distinguish them
how these can be obtained computation- from the input and output layers. In this
ally efficient using the technique of error network there are N input nodes representing
backpropagation. input variables x1 , . . ., xN ; H hidden units
The section that follows addresses the issue representing hidden variables z1 , . . ., zH ;
of network complexity and briefly discusses and K output nodes representing output
some techniques (in particular regularization variables y1 , . . ., yK . Weight parameters are
and early stopping) to determine the number represented by links between the nodes. The
of hidden units. This problem is shown to bias parameters are denoted by links coming
essentially consist of optimizing the com- from additional input and hidden variables
plexity of the network model (complexity x0 and z0 . Observe the feedforward structure
in terms of free parameters) in order to where the inputs are connected only to units
achieve the best generalization performance. in the hidden layer, and the outputs of this
Section 20.7 then moves attention to the issue layer are connected only to units in the
of how to appropriately test the generaliza- output layer.
tion performance of a neural network. Some The term architecture or topology of a
conclusions and an outlook for the future are network refers to the topological arrangement
given in the final section. of the nodes. We call the network architecture
The bibliography that is included intends shown in Figure 20.1 a single hidden layer
to provide useful pointers to the literature network or a two layer rather than a three
rather than a complete record of the whole layer network because it is the number of
field of neural networks. The readers should layers of adaptive weights that is important
recognize that there are several wide rang- for determining the network properties. This
ing text books with introductory character, architecture is most widely used in practice.3
NEURAL NETWORKS FOR SPATIAL DATA ANALYSIS 377

y1 . . . yK Outputs
Second layer of
parameters w (2)

z0 z1 . . . zh . . . zH Hidden units
First layer of
parameters w (1)

. . . . . . Inputs
x0 x1 x2 xn xN

Figure 20.1 Network diagram for the single hidden layer neural network corresponding to
equation (20.6). The input, hidden and output variables are represented by nodes, and the
weight parameters by links between the nodes, where the bias parameters are denoted by
links coming from additional input and hidden variables x0 and z0 . The arrow denotes the
direction of information ow through the network during forward propagation

Kurkov (1992) has shown that one hid- biases.6 These quantities are known as
den layer is sufficient to approximate any activations in the field of neural networks.
continuous function uniformly on a compact Each of them is then transformed using
input domain. But note that it may be a differentiable continuous nonlinear or
more parsimonious to use fewer hidden units activation (transfer) function7 to give the
connected in two or more hidden layers. output:
Any network diagram can be converted
into its corresponding mapping function,
zh = (neth ) (20.2)
provided that the diagram is feedforward as
in Figure 20.1 so that it does not contain
closed directed cycles.4 This guarantees that for h = 1, . . ., H. These quantities are
the network output yk (k = 1, . . ., K) can be again linearly combined to generate the
described by a series of functional trans- input, called netk , that output unit k (k =
formations as follows. First, we form a 1, . . ., K) receives:
linear combination5 of the N input variables
x1 , . . ., xN to get the input, say neth , that
hidden unit h receives:
H
(2) (2)
netk = wkh zh + wk0 . (20.3)
h=1

N
(1) (1)
neth = whn xn + wh0 (20.1) (2)
n=1 The parameters wkh represent the connection
weights from hidden unit h (h = 1, . . ., H)
to output unit k (k = 1, . . ., K), and the
for h = 1, . . ., H. The superscript (1) indi- (2)
wk0 are bias parameters. Finally, the netk
cates that the corresponding parameters are are transformed to produce a set of network
in the first layer of the network. The outputs yk (k = 1, . . ., K):
(1)
parameters whn represent connection weights
going from input n (n = 1, . . ., N) to
hidden unit h (h = 1, . . ., H), and wh0
(1) yk = k (netk ) (20.4)
378 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

where k denotes a real valued activation space X to some K-dimensional output


function of output unit k. space Y.
Information processing in such networks Several authors including Cybenko (1989),
is, thus, straightforward. The input units Funahashi (1989), Hornik et al. (1989)
just provide a fan-out and distribute the and many others have shown that such
input to the hidden units. These units neural network models, with more or less
sum their inputs, add a constant (the general types of activation functions and
bias) and take a fixed transfer function , have universal approximation capabili-
h of the result. The output units are of ties. They can uniformly approximate any
the same form, but with output activation continuous function f on a compact input
function k . Network output yk can then domain to arbitrary accuracy, provided the
be expressed in terms of an output function network has a sufficiently large number
gk (x, w) as: of hidden units. This approximation result,
however, is non-constructive and provides
no guide to how many hidden units
yk = gk (x, w) might be needed for a practical problem
at hand.
4 4 5 5

H
(2)

N
(1) (1) (2)
This result holds for a wide range
= k wkh h whn xn +wh0 +wk0 of hidden and output layer activation
h=1 n=0
functions. The functions can be any non-
(20.5)
linearity as long as they are continuous
and differentiable. The hidden unit activation
functions h (.) are typically sigmoid, and
where x = (x1 , . . ., xN ) and w represents
almost always taken to be logistic sigmoid9
a vector of all the weights and bias terms.
(1) so that:
Note that the bias terms wh0 (h = 1, . . ., H)
(2)
and wk0 (k = 1, . . ., K) in equation (20.5)
can be absorbed8 into the set of weight 1
h (netk ) = (20.7)
parameters by defining additional input and 1 + exp (neth )
hidden unit variables, x0 and z0 , whose values
are clamped at one so that x0 = 1 and
whose outputs lie in the range (0, 1), while
z0 = 1. Then the network function (20.5)
the choice of the activation function k (.)
becomes
of the output units is generally determined
by the nature of data and the assumed dis-
tribution of the target variables. Section 20.3
yk = gk (x, w)
will show that different activation functions
4 4 55 should be chosen for different types of

H
N
= k
(2)
wkh h
(1)
whn xn . problems. For standard regression problems
h=1 n=0 the identity function appears to be an
(20.6) appropriate choice so that yk = netk . For
multiple binary classification each output unit
activation should be transformed using a
Neural networks of type (20.6) are rather logistic sigmoid function, while the stan-
general. They can be seen as a flexible way dard multi-class classification problem in
to parameterize a fairly general nonlinear which each input is assigned to one of
function from some N-dimensional input K mutually exclusive classes gives rise
NEURAL NETWORKS FOR SPATIAL DATA ANALYSIS 379

to the softmax activation function (Bridle, This error function, say E, is defined in
1994): term of deviations of the network outputs
y = ( y1 , . . ., yK ) from corresponding
desired (target) outputs t = (t1 , . . ., tK ), and
exp (netk ) expressed as a function of the weight vector w
yk = k (netk ) = (20.8)

K
representing the free parameters (connection
exp (netc )
c=1 weights and bias terms) of the network. The
goal of training is then to minimize the error
function so that:

where 0 yk 1 and K k=1 yk = 1.
A neural network with a single logistic
output unit can be seen as a nonlinear min E(w) (20.9)
wW
extension of logistic regression. With many
logistic units, it corresponds to linked logistic
regressions of each class versus the others. where W is a weight space appropriate
If the transfer functions of the output units to the network architecture. The smallest
in a network are taken to be linear, we value of E(w) will occur at a point such
have a standard linear model augmented that the gradient of the error function
by nonlinear terms. Given the popularity of vanishes E(w) = 0, where E(w) denotes
linear models in spatial analysis, this form the gradient (the vector containing the partial
is particularly appealing, as it suggests that derivatives) of E(w) with respect to w.
neural network models can be viewed as A single hidden layer network of the kind
extensions of rather than as alternatives shown in Figure 20.1, with H hidden units,
to the familiar models. The hidden unit generally has many points at which the
activations can then be viewed as latent gradient vanishes. The point w is called a
variables whose inclusion enriches the linear global minimum for E(w) if E(w ) E(w)
model. for all w W. Other minima are called local
minima, and each corresponds to a different
set of parameters. For a successful applica-
tion of neural networks, however, it may not
20.3. NETWORK TRAINING be necessary to find the global minimum,
and in general it will not be known whether
So far, we have considered neural networks the minimum found is the global one or not.
as a general class of parametric nonlinear But it may be necessary to compare several
functions from a vector x of input variables minima in order to find a sufficiently good
x1 , . . ., xN to a vector y of output variables solution of the problem under scrutiny.
y1 , . . ., yK . The process of determining Training is performed using a training
the network parameters is called network set Sp = {(x p , t p ) : p = 1, . . ., P},
training or network learning. The problem consisting of P ordered pairs of vectors. x p
of determining the network parameters can denotes an N-dimensional input vector and t p
be viewed from different perspectives. We the associated K-dimensional desired output
view it as an unconstrained nonlinear func- (target) vector. The choice of a suitable error
tion optimization problem,10 the solution function depends on the problem to be per-
of which requires the minimization of formed. We follow Bishop (1995: chapter 6)
some (continuous and differentiable) error to provide a maximum likelihood motivation
function. for the choice, and start by considering
380 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

regression problems. If we assume that the K The assumption of independence can be


target variables are independent conditional dropped at the expense of a slightly more
on x and w with shared noise precision , complex optimization problem. Note that
then the conditional distribution of the target in practice the nonlinearity of the network
values is given by a Gaussian: function g(xp , w) causes the error E(w) to
be convex, and so in practice local maxima
of the likelihood may be found, which
p (t| x, w) = N(t|g(x, w), 1 I) (20.10) correspond to local minima of the error
function.
There is a natural pairing of the error
where is the precision (inverse variance) function given by the negative log likelihood
of the Gaussian noise. For the conditional and the output unit transfer function. In the
distribution given by equation (20.10), it is regression case we can view the network
sufficient to take the output unit transfer as having a transfer function that is the
function to be the identity. Given that identity, so that yk = netk . The corresponding
t = (t 1 , . . ., t P ), are independent, identically sum-of-squares error function then has the
distributed observations, we can construct the characteristic:
corresponding likelihood function:

E
= ( yk tk ). (20.14)
(
P
netk
p (t| x, w, ) = p (t p | x , w, ).
p

p=1
(20.11) This property will be used when discussing
the technique of error backpropagation in
section 20.5.
Maximizing the likelihood function is equi-
Now let us consider the case of binary
valent to minimizing the sum-of-squares
classification where we have a single target
function given by:
variable t such that t = 1 denotes
class C1 and t = 0 class C2 . We con-
sider a network with a single output whose

P
K
: :
E(w) = 1 :gk (xp , w) t p :2 . transfer function is a logistic sigmoid so
2 k
p=1 k=1 that 0 g(x, w) 1, and we can inter-
(20.12) pret g(x, w) as the conditional probability
p(C1 , x), with p(C2 , x) given by 1 g(x, w).
The conditional probability of targets given
The value of w found by solving equation inputs is then a Bernoulli distribution of
(20.9) will be denoted wML because it the form:
corresponds to the maximum likelihood
estimation. Having formed wML , the noise
precision is then provided by: p(t | x, w) = g(x, w)t {1 g(x, w)}1t .
(20.15)
1
P
: p :
= 1 :g(x , wML ) t p :2 .
ML PK
p=1 If we have a training set of independent
(20.13) observations, then the error function, given
NEURAL NETWORKS FOR SPATIAL DATA ANALYSIS 381

by the negative log likelihood, is the cross- with respect to the activation for a particular
entropy error function of the form: output unit k takes the simple form (20.14)
as in the regression case.
If we have a standard multiple-class

p
classification problem to solve, where each
E(w) = {t p lny p +(1t p )ln(1y p )}
p=1
input is assigned to one of K mutually
(20.16) exclusive classes, then we can use a neural
network with K output units each of which
has a softmax output activation function.
where yp denotes g(xp , w). Note there is no The binary target variables tk {0, 1}
analogue of the noise precision because have a 1-of-K coding scheme indicating
the target values are assumed to be correctly the correct class, and the network outputs
p
labelled. are interpreted as gk (xp , w) = p(tk = 1 | x)
For classification problems, the targets leading to the error function, called the
represent labels defining class membership multiple-class cross-entropy error function
or more generally estimates of the (see Fischer and Staufer, 1999):
probabilities of class membership. If we have
K separate binary classifications to perform, 4 5

P
K
gk (x p , w)
then a neural network with K logistic sigmoid E(w) =
p
tk ln p
output units is an appropriate choice. In p=1 k=1
tk
p
this case a binary class label tk {0, 1} (20.19)
is associated with each output k. If we assume
that the class labels are independent, given
the input vector xp , then the conditional which is non-negative, and equals zero when
p
distribution is: gk (xp , w) = tk for all k and p. Once again, the
derivative of this error function with respect
to the activation for a particular output unit k
(
K
takes the familiar form equation (20.14). It is
p(t | x, w) = gk (x, w)tk [1gk (x, w)]1tk .
worth noting that in the case of K = 2 we can
k=1
(20.17) use a network with a single logistic sigmoid
output, alternatively to a network with two
softmax output activations.
Taking the negative logarithm of the corres- In summary, there is natural pairing of the
ponding likelihood function then yields the choice of the output unit transfer function and
multiple-class cross-entropy error function of the choice of the error function, according
the form: to the type of the problem that has to
be solved. For regression we take linear
outputs and a sum-of-squares error, for

P
K

E(w) =
p p
tk ln yk (multiple independent) binary classifications
p=1 k=1 we use logistic sigmoid outputs with the
corresponding cross-entropy error function,
p p

+(1 tk ) ln(1 yk ) (20.18) and for multi-class classification softmax


outputs and the multi-class cross-entropy
error function. For classification problems
p
where yk = gk (xp , w). It is important to involving two classes, we can use a single
note that the derivative of this error function logistic sigmoid output, or alternatively we
382 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

can take a network with two softmax outputs The training process is maintained on an
(Bishop, 2006: 236). epoch-by-epoch basis until the connection
weights and bias terms of the network
stabilize and the average error over the entire
20.4. PARAMETER OPTIMIZATION training set converges to some minimum.
It is good practice to randomize the order
There are many ways to solve the minimiza- of presentation of training examples from
tion problem (20.9). Closed-form optimiza- one epoch to the next. This randomization
tion via the calculus of scalar fields rarely tends to make the search in the parameter
admits a direct solution. A relatively new set space stochastic over the training cycles, thus
of interesting techniques that use optimality avoiding the possibility of limit cycles in the
conditions from calculus are based on evolution of the weight vectors.
evolutionary computation (Goldberg, 1989; Gradient descent optimization may pro-
Fogel, 1995). But gradient procedures which ceed in one of two ways: pattern mode and
use the first partial derivatives E(w), batch mode. In the pattern mode weight
so-called first order strategies, are most updating is performed after the presentation
widely used. Gradient search for solutions of each training example. Note that the error
gleans its information about derivatives from functions based on maximum likelihood for
a sequence of function values. The recursion a set of independent observations comprise a
scheme is based on the formula:11 sum of terms, one for each data point. Thus:

w( + 1) = w( ) + ( ) d( ) (20.20)
P
E(w) = Ep (w) (20.22)
p=1
where denotes the iteration step. Different
procedures differ from each other with regard
to the choice of step length ( ) and search where Ep is called the local error while E
direction d( ), the former being a scalar the global error, and pattern mode gradient
called learning rate and the latter a vector descent makes an update to the parameter
of unit length. vector based on one training example at a
The simplest approach to using gradient time so that:
information is to assume ( ) being constant
and to choose the parameter update in w( + 1) = w( ) Ep (w( )). (20.23)
equation (20.20) to comprise a small step in
the direction of the negative gradient so that:
Rumelhart et al. (1986) have shown that
d( ) = E(w( )). (20.21) pattern based gradient descent minimizes
equation (20.22), if the learning parameter
is sufficiently small. The smaller , the
After each such update, the gradient is smaller will be the changes to the weights
re-evaluated for the new parameter vector in the network from one iteration to the
w( + 1). Note that the error function is next and the smoother will be the trajectory
defined with respect to a training set SP to in the parameter space. This improvement,
be processed to evaluate E. One complete however, is attained at the cost of a slower
presentation of the entire training set during rate of training. If we make the learning rate
the training process is called an epoch. parameter too large so as to speed up the
NEURAL NETWORKS FOR SPATIAL DATA ANALYSIS 383

rate of training, the resulting large changes where ( ) is a time varying parameter.
in the parameter weights assume such a form There are various rules for determining ( )
that the network may become unstable. in terms of the gradient vectors at time
In the batch mode of training, parameter and + 1 leading to the FletcherReeves and
updating is performed after the presentation PolakRibire variants of conjugate gradient
of all the training examples that constitute algorithms (see Press et al., 1992). The
an epoch. From an online operational point computation of the learning rate parameter
of view, the pattern mode of training is ( ) in the update formula (20.20) involves
preferred over the batch mode, because it a line search, the purpose of which is to find
requires less local storage for each weight a particular value of for which the error
connection. Moreover, given that the training function E(w( )+ d( )) is minimized, given
patterns are presented to the network in a fixed values of w( ) and d( ).
random manner, the use of pattern-by-pattern The application of Newtons method to
updating of parameters makes the search in the training of neural networks is hindered
parameter space stochastic in nature12 which by the requirement of having to calcu-
in turn makes it less likely to be trapped in late the Hessian matrix and its inverse,
a local minimum. On the other hand, the use which can be computationally expensive.
of batch mode of training provides a more The problem is further complicated by
accurate estimation of the gradient vector the fact that the Hessian matrix H would
E. Finally, the relative effectiveness of the have to be non-singular for its inverse
two training modes depends on the problem to be computed. Quasi-Newton methods
to be solved (Haykin, 1994: 152 pp). avoid this problem by building up an
For batch optimization there are more approximation to the inverse Hessian over
efficient procedures, such as conjugate gra- a number of iteration steps. The most
dients and quasi-Newton methods, that are commonly variants are the Davidson
much more robust and much faster than FletcherPowell and the BroydenFletcher
gradient descent (Nocedal and Wright, 1999). GoldfarbShanno procedures (see Press
Unlike steepest gradient, these algorithms et al., 1992).
have the characteristic that the error function Quasi-Newton procedures are today the
always decreases at each iteration unless most efficient and sophisticated (batch)
the parameter vector has arrived at a local optimization algorithms. But they require the
or global minimum. Conjugate gradient evaluation and storage in memory of a dense
methods achieve this by incorporating an matrix H( ) at each iteration step . For
intricate relationship between the direction larger problems (more than 1,000 weights)
and gradient vectors. The initial direction the storage of the approximate Hessian
vector d(0) is set equal to the negative can be too demanding. In contrast, the
gradient vector at the initial step = 0. conjugate gradient procedures require much
Each successive direction vector is then less storage, but an exact determination of the
computed as a linear combination of the learning rate ( ) and the parameters ( )
current gradient vector and the previous in each iteration , and, thus, approximately
direction vector. Thus: twice as many gradient evaluations as the
quasi-Newton methods.
When the surface modelled by the error
d( + 1) = E(w( + 1)) + ( ) d( ) function in its parameter space is extremely
rugged and has many local minima, then a
(20.24) local search from a random starting point
384 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

When the surface modelled by the error function in its parameter space is extremely rugged and has many local minima, then a local search from a random starting point tends to converge to a local minimum close to the initial point and to a solution worse than the global minimum. In order to seek out good local minima, a good training procedure must thus include both a gradient-based optimization algorithm and a technique like random start that enables sampling of the space of minima. Alternatively, stochastic global search procedures might be used. Examples of such procedures include Alopex (see Fischer et al., 2003, for an application in the context of spatial interaction data analysis), genetic algorithms (see Fischer and Leung, 1998, for another application in the same context), and simulated annealing. These procedures guarantee convergence to a global solution with high probability, but at the expense of slower convergence.

Finally, it is worth noting that the question of whether neural networks can have real-time learning capabilities is still challenging and open. Real-time learning is highly desirable for time-critical applications, such as navigation and tracking systems in a GIS-T context, where the data observations arrive in a continuous stream and predictions have to be made before all the data have been seen. Even for offline applications speed is still a need, and real-time learning algorithms that reduce training time are of considerable value.

20.5. ERROR BACKPROPAGATION

One of the greatest breakthroughs in neural network modelling has been the introduction of the technique of error backpropagation,13 in that it provides a computationally efficient technique to calculate the gradient vector of an error function for a feedforward neural network with respect to the parameters. This technique, sometimes simply termed backprop, uses a local message passing scheme in which information is sent alternately forwards and backwards through the network. Its modern form stems from Rumelhart et al. (1986), illustrated for gradient descent optimization applied to the sum-of-squares error function. It is important to recognize, however, that error backpropagation can also be applied to error functions other than just sum-of-squares and to a wide variety of optimization schemes for weight adjustment other than gradient descent, in pattern or batch mode.

We describe the backpropagation algorithm for a general network of type (20.6) that has a single hidden layer and arbitrary differentiable activation functions, with a corresponding local error function E_p(w). For each pattern p in the training data set, we shall assume that we have supplied the corresponding input vector x^p to the network and calculated the activations of all the hidden and output units in the network by applying equations (20.1)–(20.4). Recall that each hidden unit h has input net_h^p and output z_h^p = φ_h(net_h^p), and each output unit k has input net_k^p and output y_k^p = ψ_k(net_k^p). This process is called forward propagation because it can be seen as a forward flow of information (signals) provided by x^p through the network. For the rest of this section we consider one example and drop the superscript p in order to keep the notation uncluttered.

We evaluate the gradient ∇E_p with respect to a hidden-to-output parameter w_{kh}^{(2)} first, by noting that E_p depends on the weight w_{kh}^{(2)} only via the summed input, net_k, to the output unit k. Thus, we can apply the chain rule for partial derivatives to get:

∂E_p/∂w_{kh}^{(2)} = (∂E_p/∂net_k)(∂net_k/∂w_{kh}^{(2)})    (20.25)
where ∂net_k/∂w_{kh}^{(2)} is found as:

∂net_k/∂w_{kh}^{(2)} = ∂/∂w_{kh}^{(2)} ( Σ_{h=0}^{H} w_{kh}^{(2)} z_h ) = z_h.    (20.26)

If we define:

δ_k := ∂E_p/∂net_k = ψ′_k(net_k) ∂E_p/∂y_k    (20.27)

where the δs are often referred to as errors, and substitute equations (20.26) and (20.27) into equation (20.25), we obtain:

∂E_p/∂w_{kh}^{(2)} = δ_k z_h.    (20.28)

This equation tells us that the required partial derivative with respect to w_{kh}^{(2)} is obtained simply by the multiplication of two expressions: the value of δ for unit k at the output end of the connection concerned and the value of z at the input end h of the connection. Thus, in order to evaluate the partial derivatives with respect to the second-layer parameters we need only to compute the value of δ_k for each output unit k = 1, …, K in the network, and then apply equation (20.28).

For linear outputs associated with the sum-of-squares error function, for logistic sigmoid outputs associated with the cross-entropy error function and for softmax outputs associated with the multiple-class cross-entropy error function, the δs are given by:

δ_k = y_k − t_k    (20.29)

while for logistic sigmoid outputs associated with the sum-of-squares error function the δs are found as:

δ_k = y_k (1 − y_k)(y_k − t_k).    (20.30)

For the input-to-hidden connections we must differentiate the chosen error function with respect to the parameters w_{hn}^{(1)}, which are more deeply embedded in the error function. Using again the chain rule for partial derivatives, we get:

∂E_p/∂w_{hn}^{(1)} = δ_h ∂net_h/∂w_{hn}^{(1)} = δ_h x_n    (20.31)

with:

δ_h := φ′_h(net_h) Σ_{k=1}^{K} δ_k w_{kh}^{(2)}    (20.32)

where the use of the prime signifies differentiation with respect to the argument. In the case of logistic hidden units we get the following backpropagation formula:

δ_h = φ′_h(net_h) Σ_{k=1}^{K} δ_k w_{kh}^{(2)} = φ(net_h)(1 − φ(net_h)) Σ_{k=1}^{K} δ_k w_{kh}^{(2)} = z_h (1 − z_h) Σ_{k=1}^{K} δ_k w_{kh}^{(2)}.    (20.33)

Since the formula for δ_h contains only terms in a later layer, it is clear that it can be calculated from output to input on the network.
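The forward and backward passes of equations (20.25)–(20.33) can be written down compactly. The sketch below is illustrative rather than the chapter's own code: it assumes logistic hidden units and linear output units with a sum-of-squares error, and computes the deltas and weight gradients for a single training pattern; the array layout (bias handled as column 0) is an assumption of the example.

```python
# Illustrative sketch: forward propagation followed by the backward computation of
# the deltas and gradients of equations (20.28)-(20.33) for one pattern. W1 is
# H x (N+1) and W2 is K x (H+1), with column 0 holding the bias weights.
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_single_pattern(x, t, W1, W2):
    # forward pass (equations (20.1)-(20.4))
    x_ext = np.r_[1.0, x]                              # prepend bias input x_0 = 1
    net_h = W1 @ x_ext                                 # summed inputs of hidden units
    z = logistic(net_h)                                # z_h = phi_h(net_h)
    z_ext = np.r_[1.0, z]                              # prepend bias unit z_0 = 1
    y = W2 @ z_ext                                     # linear outputs, y_k = net_k

    # backward pass
    delta_k = y - t                                    # equation (20.29)
    grad_W2 = np.outer(delta_k, z_ext)                 # equation (20.28)
    delta_h = z * (1.0 - z) * (W2[:, 1:].T @ delta_k)  # equation (20.33)
    grad_W1 = np.outer(delta_h, x_ext)                 # equation (20.31)
    return grad_W1, grad_W2

# usage with random weights and one illustrative pattern
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(3, 3)), rng.normal(size=(2, 4))
g1, g2 = backprop_single_pattern(np.array([0.2, -0.5]), np.array([1.0, 0.0]), W1, W2)
```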
Thus, the basic idea behind the technique of error backpropagation is to use a forward pass through the network to calculate the z_h and y_k values by propagating the input vector, followed by a backward pass to calculate δ_k and δ_h, and hence the partial derivatives of the error function. Note that for the presentation of each training example the input pattern is fixed throughout the message passing scheme, encompassing the forward pass followed by the backward pass.

The backpropagation technique can be summarized in the following four steps:

Step 1  Apply an input vector x^p to the network and forward propagate through the network, using equations (20.1)–(20.4), to generate the hidden and output unit activations based on current weight settings.
Step 2  Evaluate the δ_k for all the output units (k = 1, …, K) using equation (20.29) or equation (20.30), depending on the problem type to be studied.
Step 3  Backpropagate the deltas, using equation (20.33), to get δ_h for each hidden unit h (h = 1, …, H) in the network.
Step 4  Use equations (20.28) and (20.31) to evaluate the required derivatives.

For batch procedures the gradient of the global error can be obtained by repeating Step 1 to Step 4 for each pattern p in the training set, and then summing over all patterns.

20.6. NETWORK COMPLEXITY

So far we have considered neural networks of type (20.6) with a priori given numbers of input, hidden and output units. While the number of input and output units in a neural network is basically problem dependent, the number H of hidden units is a free parameter that can be adjusted to provide the best performance on independent data, the so-called testing set. But the testing error is not a simple function of H due to the presence of local minima in the error function. The issue of finding a parsimonious model for a real-world problem is critical for all models but particularly important for neural networks because the problem of overfitting is more likely to occur.

A neural network model that is too simple (i.e., small H), or too inflexible, will have a large bias and smooth out some of the underlying structure in the data (corresponding to high bias), while one that has too much flexibility in relation to the particular data set will overfit the data and have a large variance. In either case, the performance of the network on new data (i.e., generalization performance) will be poor. This highlights the need to optimize the complexity in the model selection process in order to achieve the best generalization (Bishop, 1995: 332; Fischer, 2000). There are several ways to control the complexity of a neural network, complexity in terms of the number of hidden units or, more precisely, in terms of the independently adjusted parameters. Practice in spatial data analysis generally adopts a trial-and-error approach that trains a sequence of neural networks with an increasing number of hidden units and then selects the one which gives the best predictive performance on a testing set.14

There are, however, other more principled ways to control the complexity of a neural network model in order to avoid overfitting.15 One approach is that of regularization, which involves adding a regularization term R(w) to the error function in order to control overfitting, so that the total error function to be minimized takes the form:

Ẽ(w) = E(w) + λ R(w)    (20.34)

where λ is a positive real number, the so-called regularization parameter, that controls the relative importance of the data-dependent error E(w) and the regularization term R(w),
sometimes also called the complexity term. This term embodies the a priori knowledge about the solution, and therefore depends on the nature of the particular problem to be solved. Note that Ẽ(w) is called the regularized error function.

One of the simplest forms of regularizer is defined as the squared norm of the parameter vector w in the network, as given by:

R(w) = ‖w‖².    (20.35)

This regularizer16 is known as a weight decay function that penalizes large weights. Hinton (1987) has found empirically that a regularizer of this form can lead to significant improvements in network generalization. Sometimes a more general regularizer is used, for which the regularized error takes the form:

Ẽ(w) = E(w) + λ ‖w‖^m    (20.36)

where m = 2 corresponds to the quadratic regularizer (20.35). The case m = 1 is known as the lasso in the statistics literature (Tibshirani, 1996b). It has the property that if λ is sufficiently large some of the parameter weights are driven to zero in sequential learning algorithms, leading to a sparse model. As λ is increased, an increasing number of parameters are driven to zero.

One of the limitations of this regularizer is its inconsistency with certain scaling characteristics of network mappings. If one trains a network using original data and one network using data for which the input and/or target variables are linearly transformed, then consistency requires obtaining equivalent networks which differ only by a linear transformation of the weights. Any regularizer should possess this characteristic, otherwise one solution is arbitrarily favoured over an equivalent solution. In particular, the regularizer should be invariant to such a rescaling of the weights (Bishop, 2006: 257–258). A regularized error function that satisfies this property is given by:

Ẽ(w) = E(w) + λ₁ ‖w_{q1}‖^m + λ₂ ‖w_{q2}‖^m    (20.37)

where w_{q1} denotes the set of the weights in the first layer, that is w_{11}^{(1)}, …, w_{h1}^{(1)}, …, w_{HN}^{(1)}, and w_{q2} those in the second layer, that is w_{11}^{(2)}, …, w_{kh}^{(2)}, …, w_{KH}^{(2)}. Under linear transformations of the weights, the regularizer will remain unchanged, provided that the parameters λ₁ and λ₂ are suitably rescaled.

The more sophisticated control of complexity that regularization offers over adjusting the number of hidden units by trial and error is evident. Regularization allows complex neural network models to be trained on data sets of limited size without severe overfitting, by limiting the effective network complexity. The problem of determining the appropriate number of hidden units is thus shifted to one of determining a suitable value for the regularization parameter(s) during the training process.
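As an illustration of equations (20.34)–(20.36), the short sketch below (not from the chapter) adds a quadratic weight-decay or a lasso penalty to an arbitrary data error; `error` and `grad` are hypothetical stand-ins for routines returning E(w) and its gradient, for example obtained by backpropagation.

```python
# Sketch (illustrative, not the chapter's code): regularized error and gradient of the
# form (20.34)/(20.36) built on top of a given data error E(w) and its gradient.
import numpy as np

def regularized_error(w, error, grad, lam=1e-3, m=2):
    E, g = error(w), grad(w)
    if m == 2:                      # quadratic regularizer (20.35): lam * ||w||^2
        return E + lam * np.sum(w ** 2), g + 2.0 * lam * w
    if m == 1:                      # lasso penalty: lam * ||w||_1, drives weights to zero
        return E + lam * np.sum(np.abs(w)), g + lam * np.sign(w)
    raise ValueError("this sketch only handles m = 1 or m = 2")
```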
The principal alternative to regularization as a way to optimize the model complexity for a given training data set is the procedure of early stopping. As we have seen in the previous sections, training of a nonlinear network model corresponds to an iterative reduction of the error function defined with respect to a given training data set. For many of the optimization procedures used for network training (such as conjugate gradient optimization) the error is a nonincreasing function of the iteration steps τ. But the error measured with respect to independent data, called the validation data set, often shows a decrease first, followed by an increase as the network starts to overfit, as illustrated in Fischer and Gopal (1994) for a spatial interaction data analysis problem. Thus, training can be stopped at the point of smallest error with respect to the validation data, in order to get a network that shows good generalization performance. But, if the validation set is small, it will give a relatively noisy estimate of generalization performance, and it may be necessary to keep aside another data set, the test set, on which the performance of the network model is finally evaluated.

This approach of stopping training before a minimum of the training error has been reached is another way of limiting the network complexity. It contrasts with regularization because the determination of the number of hidden units does not require convergence of the training process. The training process is used here to perform a directed search of the weight space for a neural network model that does not overfit the data and, thus, shows superior generalization performance. Various theoretical and empirical results have provided strong evidence for the efficiency of early stopping (see, e.g., Weigend et al., 1991; Baldi and Chauvin, 1991; Finnoff, 1991). Although many questions remain, a picture is starting to emerge as to the mechanisms responsible for the effectiveness of this procedure. In particular, it has been shown that stopped training has the same sort of regularization effect (i.e., reducing model variance at the cost of bias) that penalty terms provide.

20.7. GENERALIZATION PERFORMANCE

The ability of a neural network to predict correctly new observations that differ from those used for training is known as generalization (see, e.g., Moody, 1992). Assessing the generalization performance of a neural network model is of crucial importance. The performance on the training set is not a good indicator due to the problem of overfitting. As often in statistics, there is a trade-off between accuracy on the training data and generalization. This is a well-studied dilemma (see, e.g., Bishop, 1995: chapter 9).

The simplest way to assess the generalization performance is the use of a test set. Here, of course, it is assumed that the test data are drawn from the same population used to generate the training data. If the test set is too small, an accurate assessment cannot be obtained. Test set validation becomes practical only if the data sets are very large or new data can be generated cheaply. As the training and test sets are independent samples, an unbiased estimate of the prediction risk is obtained. But the estimate can be highly variable across different data splittings.

One way to overcome this problem is by cross-validation. Cross-validation is a sample re-use method for assessing generalization performance. It makes maximally efficient use of the available data. The idea is to divide the available data set into D generally equally sized parts, and then to use one part to test the performance of the neural network model trained on the remaining (D − 1) parts. The resulting estimator is again unbiased, and we can average the D such estimates. Leave-one-out cross-validation is a special case, in which each observation is tested against the model trained on the remaining (P − 1) observations. This version evidently requires a large number of computations.
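A minimal sketch of D-fold cross-validation as just described is given below. It is illustrative only; `train` and `test_error` are hypothetical stand-ins for routines that fit the network on one subset and score it on another, and X and T are assumed to be NumPy arrays of inputs and targets.

```python
# Sketch of D-fold cross-validation (illustrative): split the P patterns into D folds,
# train on D-1 folds, test on the held-out fold, and average the D error estimates.
import numpy as np

def cross_validate(X, T, D, train, test_error, seed=0):
    P = len(X)
    folds = np.array_split(np.random.default_rng(seed).permutation(P), D)
    scores = []
    for d in range(D):
        test_idx = folds[d]
        train_idx = np.concatenate([folds[j] for j in range(D) if j != d])
        w = train(X[train_idx], T[train_idx])               # fit on the remaining D-1 parts
        scores.append(test_error(X[test_idx], T[test_idx], w))
    return np.mean(scores), np.std(scores)                  # average of the D estimates
```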
Choosing D = P should give the most accurate assessment, as the true size of the training set is most closely mimicked, but it also involves the most computation. In addition, cross-validation estimates of performance for large D might be expected to be rather variable. Taking a smaller D can give a larger bias, but smaller variance and mean-square error. This is an argument in favour of smaller D (Ripley, 1996: 70–71). Bootstrap estimates of bias can be used for bias correction (Efron, 1982).

With small samples of data (precisely when structural uncertainty is greatest) cross-validation may not be feasible, because there are too few data values with which to carry out the estimation, validation and testing activities in a stable way. Bootstrapping the neural network modelling process, creating bootstrap copies of the available data to generate copies of training, validation and test sets, may be used instead as a general framework for evaluating generalization performance. The idea underlying the bootstrap is appealingly simple. For an introduction see, for example, Efron (1982), Efron and Tibshirani (1993) or Hastie et al. (2001).

Suppose we are interested in a single hidden layer neural network together with a sum-of-squares error function to solve a regression problem. The standard procedure for estimating and evaluating the neural network is to split the available data set, say S_P = {z^p = (x^p, t^p) : p = 1, …, P}, into three parts: a training set S_{P1} = {z^{p1} = (x^{p1}, t^{p1}) : p1 = 1, …, P1}, a validation set S_{P2} = {z^{p2} = (x^{p2}, t^{p2}) : p2 = 1, …, P2} and a test set S_{P3} = {z^{p3} = (x^{p3}, t^{p3}) : p3 = 1, …, P3}, with P1 + P2 + P3 = P. The training set serves for parameter estimation, for example by means of gradient descent on the sum-of-squares error function. The validation set is used, for example, to determine the stopping point before overfitting occurs, and the test set to evaluate the generalization performance of the model, using some measure of error between a prediction and an observed value, such as the familiar root-mean-square error of the form:

ρ(w*) = Σ_{p3=1}^{P3} ‖g(x^{p3}, w*) − t^{p3}‖² / Σ_{p3=1}^{P3} ‖t^{p3} − t̄‖²    (20.38)

which is a function of w*, obtained by solving the minimization problem (20.9). t̄ is defined to be the average test set target vector. Care has to be taken in interpreting the results obtained as accurate estimates of the generalization performance.

Randomness enters into this standard approach to neural network modelling in two ways: in the splitting of the data samples, and in the choices made for the parameter initialization. This leaves one question wide open. What is the variation of test performance as one varies the training, validation, and test sets? This is an important question, since there is not just one best split of the data or one obvious choice for the initial weights. Thus, it is useful to vary both the data partitions and the parameter initializations to find out more about the distribution of generalization errors. One way is to use a computer-intensive bootstrapping approach to evaluate the performance, reliability, and robustness of the neural network model, an approach that combines the purity of splitting the data into three disjoint data sets with the power of a resampling procedure. Implementing this approach involves the following steps (see Fischer and Reismann, 2002a, b, for an application in the context of spatial interaction modelling).

Step 1: Generation of bootstrap training, validation and test sets
Using the sample S_P, we first build a test set by choosing P3 patterns randomly,17 with replacement. The patterns used in this specific test set are then removed from the pool S_P. From the remainder,
we then randomly set aside P2 patterns for the bootstrap validation set. They are picked randomly without replacement and removed from the pool. The remaining patterns constitute the training set. This process is repeated B times (typically 20 < B < 200) to generate b = 1, …, B training data sets of size P1, S_{P1}^b = {^b z^{p1} : p1 = 1, …, P1}, called bootstrap training sets; b = 1, …, B validation data sets of size P2, S_{P2}^b = {^b z^{p2} : p2 = 1, …, P2}, called bootstrap validation sets; and b = 1, …, B test data sets of size P3, S_{P3}^b = {^b z^{p3} : p3 = 1, …, P3}, called bootstrap test sets.

Step 2: Computation of the bootstrap parameter estimates
Each bootstrap training set S_{P1}^b is used to compute a new parameter vector by minimizing:

arg min { E(^b w) : ^b w ∈ W, W ⊆ ℝ^Q }    (20.39)

where Q is the number of parameters, and E(^b w) the (global) sum-of-squares error for the bth bootstrap training sample. This is given by:

E(^b w) = ½ Σ_{p1=1}^{P1} ‖g(^b x^{p1}, ^b w) − ^b t^{p1}‖²    (20.40)

where the sum runs over the bth bootstrap training set, and b = 1, …, B. The corresponding bootstrap validation set is used to determine the stopping point before overfitting occurs and/or to set additional parameters or hyperparameters. This yields B bootstrap parameter estimates ^b w* (b = 1, …, B).

Step 3: Estimation of the bootstrap statistic of interest
From S_{P3}^b calculate ρ_b(^b w*), the bootstrap analogue of ρ(w*) given by equation (20.38), in the same manner as ρ(w*) but with the resample S_{P3}^b replacing S_{P3} and ^b w* replacing w*. This yields a sequence of bootstrap statistics, ρ_1*, …, ρ_B*.

Step 4: Estimation of the standard deviation
The statistical accuracy of the performance statistic ρ can then be evaluated by looking at the variability of the statistic between the different bootstrap test sets. Estimate the standard deviation, σ̂, of ρ as approximated by the bootstrap:

σ̂_{P3} = [ (1/(B − 1)) Σ_{b=1}^{B} ( ρ_b(^b w*) − ρ̄(·) )² ]^{1/2}    (20.41)

where

ρ̄(·) = (1/B) Σ_{b=1}^{B} ρ_b(^b w*).    (20.42)

The true standard error of ρ is a function of the unknown density function F of ρ, that is σ(F). With the bootstrapping approach described above one obtains F̂_{P3}, which is supposed to describe closely the empirical probability distribution F_{P3}; in other words σ̂_{P3} ≈ σ(F_{P3}). Asymptotically, this means that as P3 tends to infinity, the estimate σ̂_{P3} tends to σ(F). For finite sample sizes, however, there will generally be deviations.

Step 5: Bias estimation
The bootstrap scheme can be used to estimate not only the variability of the performance statistic ρ, but also its bias (Zapranis and Refenes, 1998). Bias can be thought of as a function of the unknown probability density function F of ρ, that is β = β(F). The bootstrap estimate of bias is simply:

β̂_B = β(F̂_{P3}) = E*[ ρ(F̂_{P3}) − ρ(F_{P3}) ]    (20.43)

where E* indicates expectation with respect to bootstrap sampling and F̂_{P3} the bootstrap empirical distribution. In practice the bootstrap estimate of bias is computed as:

β̂_B = (1/B) Σ_{b=1}^{B} [ ρ_b(^b w*) − ρ(w*) ].    (20.44)

The bias is removed by subtracting β̂_B from the estimated ρ.
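The five steps can be summarized in a compact computational sketch. The code below is illustrative and not a reference implementation of the procedure: `fit` and `predict` are hypothetical stand-ins for network training with validation-based stopping and for the fitted mapping g(·, w), the performance statistic follows equation (20.38), and `rho_star` is assumed to be the statistic obtained from the original (non-bootstrap) split.

```python
# Illustrative sketch of the bootstrap evaluation scheme (Steps 1-5).
import numpy as np

def rho(X, T, w, predict):
    resid = np.sum((predict(X, w) - T) ** 2)
    return resid / np.sum((T - T.mean(axis=0)) ** 2)          # equation (20.38)

def bootstrap_evaluation(X, T, P2, P3, B, fit, predict, rho_star, seed=0):
    rng = np.random.default_rng(seed)                         # note 17: reliable PRNG
    P, stats = len(X), []
    for b in range(B):                                        # Step 1: resampled splits
        test = rng.integers(0, P, size=P3)                    # test patterns, with replacement
        pool = rng.permutation(np.setdiff1d(np.arange(P), test))
        valid, train = pool[:P2], pool[P2:]                   # validation/training, no replacement
        w_b = fit(X[train], T[train], X[valid], T[valid])     # Step 2: estimate ^b w*
        stats.append(rho(X[test], T[test], w_b, predict))     # Step 3: rho_b(^b w*)
    stats = np.asarray(stats)
    sigma = stats.std(ddof=1)                                 # Step 4: equation (20.41)
    bias = stats.mean() - rho_star                            # Step 5: equation (20.44)
    return stats, sigma, bias
```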
20.8. SUMMARY AND OUTLOOK

In one sense neural networks are nonlinear models having a methodology of their own. From a spatial analysis point of view neural networks can generally be used anywhere one would ordinarily use a linear or nonlinear specification, with estimation proceeding via appropriate techniques. The now rather well-developed theory of estimation of misspecified models applies immediately to provide appropriate interpretations and inferential procedures.

Neural networks have essentially a broader utility that has yet to be fully appreciated by spatial analysts, but which has the potential to significantly enhance scientific understanding of spatial phenomena and spatial processes subject to neural network modelling. In particular, the estimates obtained from neural network learning may serve as a basis for formal statistical inference, making possible statistical tests of specific hypotheses of interest. Because of the ability of neural networks to extract complex nonlinear effects, the alternatives against which such tests can have power may extend usefully beyond those within the reach of more conventional methods, such as linear models for regression and classification.

Although we have covered a fair amount of ground in this chapter we have only scratched the surface of the modelling possibilities offered by neural networks. To mention some additional models treated in the field of neural networks, we note that competitive learning networks have been much studied, with applications, for example, to the travelling salesman problem and remote sensing classification problems, and that radial basis function networks, in which the activation of a hidden unit is determined by the distance between the input vector and a prototype vector, are also standard objects of investigation. Leung (1997), for example, illustrates the use of radial basis function networks for rule learning. We did not, moreover, consider neural networks for unsupervised feature discovery, which in statistical terms correspond to cluster analysis and/or latent structure analysis.

For neural networks to find a place in spatial data analysis they need to overcome their current limitations, mainly due to the relative absence of established procedures for model identification comparable to those for spatial econometric modelling techniques. In particular, providing tests specifically designed to assess the adequacy of neural models is a research issue in its own right. Despite significant improvements in our understanding of the fundamentals of neural network modelling, there are many open problems and directions for future research. From a spatial analytic perspective an important avenue for further investigation is the incorporation of spatial dependency in the network representation, which has received less attention in the past than it deserves. Another is the application of Bayesian inference techniques to neural networks. A Bayesian approach would provide an alternative framework for dealing with the issues of network complexity and would avoid many of the problems discussed in this chapter. In particular, error bars and confidence intervals can easily be assigned to the predictions generated by neural networks, without the need for bootstrapping.
NOTES

1  Neural networks can model cortical local learning and signal processing, but they are not the brain, neither are many special purpose systems to which they contribute (Weng and Hwang, 2006).
2  Feedforward neural networks are sometimes also called multilayer perceptrons even though the term perceptron is usually used to refer to a network with linear threshold gates rather than with continuous nonlinearities. Radial basis function networks, recurrent networks rooted in statistical physics, self-organizing systems and ART (Adaptive Resonance Theory) models are other important classes. For a fuzzy ARTMAP multispectral classification see Gopal and Fischer (1997).
3  A generalization of this network architecture is to allow skip-layer connections from input to output, each of which is associated with a corresponding adaptive parameter. But note that a network with sigmoidal hidden units can always mimic skip-layer connections for bounded input values by using sufficiently small single hidden layer weights. Skip-layer connections, however, can be easier to implement and interpret in practice.
4  Networks with closed directed cycles are called recurrent networks. There are three types of such networks: first, networks in which the input layer is fed back into the input layer itself; second, networks in which the hidden layer is fed back into the input layer; and third, networks in which the output layer is fed back into the input layer. These feedback networks are useful when input variables represent time series.
5  Note that we could alternatively use product rather than summation hidden units to supplement the inputs to a neural network with higher-order combinations of the inputs to increase the capacity of the network in an information capacity sense. These networks are called product unit rather than summation unit networks (see Fischer and Reismann, 2002b).
6  This term should not be confused with the term bias in a statistical sense.
7  The inverse of this function is called the link function in the statistical literature. Note that radial basis function networks may be viewed as single hidden layer networks that use radial basis function nodes in the hidden layer. This class of neural networks asks for a two-stage approach to training. In the first stage the parameters of the basis functions are determined, while in the second stage the basis functions are kept fixed and the second layer weights are found (see Bishop, 1995: 170 pp.).
8  This is the same idea as incorporating the constant term in the design matrix of a regression by inserting a column of ones.
9  In some cases there may be some practical advantage in using a tanh function instead. But note that this leads to results equivalent to the logistic function.
10  This viewpoint directs attention to the literature on numerical optimization theory, with particular reference to optimization techniques that use higher-order information such as conjugate gradient procedures and Newton's method. The methods use the gradient vector (first-order partial derivatives) and/or the Hessian matrix (second-order partial derivatives) of the error function to perform optimization, but in different ways. A survey of first- and second-order optimization techniques applied to network training can be found in Cichocki and Unbehauen (1993).
11  When using an iterative optimization algorithm, some choice has to be made of when to stop the training process. There are various criteria that may be used. For example, training may be stopped when the error function or the relative change in the error function falls below some prespecified value.
12  The particular form of η(τ) most commonly used is described by η(τ) = c/τ, where c is a small constant. Such a choice is sufficient to guarantee convergence of the stochastic approximation algorithm (Ljung, 1977).
13  The term backpropagation is used in the literature to mean very different things. Sometimes, the feedforward neural network architecture is called a backpropagation network. The term is also used to describe the training of a feedforward neural network using gradient descent optimization applied to a sum-of-squares error function.
14  Note that limited data sets make the determination of H more difficult if there is not enough data available to hold out a sufficiently large independent test sample.
15  A neural network is said to be overfitted to the data if it obtains an excellent fit to the training data, but gives a poor representation of the unknown function which the neural network is approximating.
16  In conventional curve fitting, the use of this regularizer is termed ridge regression.
17  Note that a reliable pseudo-random number generator is essential for the valid application of the bootstrap approach.

REFERENCES

Anders, U. and Korn, O. (1999). Model selection in neural networks. Neural Networks, 12(2): 309–323.
Baldi, P. and Chauvin, Y. (1991). Temporal evolution of generalization during learning in linear networks. Neural Computation, 3(4): 589–603.
Bäck, T., Fogel, D.B. and Michalewicz, Z. (eds) (1997). Handbook of Evolutionary Computation. New York and Oxford: Oxford University Press.
Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
Bishop, C.M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6): 2350–2383.
Bridle, J.S. (1994). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In: Fogelman Soulié, F. and Hérault, J. (eds), Neurocomputing. Algorithms, Architectures and Applications, pp. 227–236. Berlin, Heidelberg and New York: Springer.
Carpenter, G.A. (1989). Neural network models for pattern recognition and associative memory. Neural Networks, 2(4): 243–257.
Carpenter, G.A., Grossberg, S. and Reynolds, J.H. (1991). ARTMAP – supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks, 4(5): 565–588.
Cichocki, A. and Unbehauen, R. (1993). Neural Networks for Optimization and Signal Processing. Chichester: Wiley.
Corne, S., Murray, T., Openshaw, S., See, L. and Turton, I. (1999). Using computational intelligence techniques to model subglacial water systems. Journal of Geographical Systems, 1(1): 37–60.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2: 303–314.
Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. New York: Chapman and Hall.
Finnoff, W. (1991). Complexity measures for classes of neural networks with variable weight bounds. Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS '94, Volume 4), pp. 1880–1882. Piscataway, NJ: IEEE Press.
Finnoff, W., Hergert, F. and Zimmerman, H.G. (1993). Improving model selection by nonconvergent methods. Neural Networks, 6(6): 771–783.
Fischer, M.M. (1998). Computational neural networks – A new paradigm for spatial analysis. Environment and Planning A, 30(10): 1873–1892.
Fischer, M.M. (2000). Methodological challenges in neural spatial interaction modelling: the issue of model selection. In: Reggiani, A. (ed.), Spatial Economic Science: New Frontiers in Theory and Methodology, pp. 89–101. Berlin, Heidelberg and New York: Springer.
Fischer, M.M. (2002). Learning in neural spatial interaction models: A statistical perspective. Journal of Geographical Systems, 4(3): 287–299.
Fischer, M.M. (2005). Spatial analysis. In: Longley, P., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems. Principles, Techniques, Management and Applications. Second Edition, Abridged (CD-ROM). Hoboken, New Jersey: Wiley.
Fischer, M.M. (2006a). Neural networks. A general framework for non-linear function approximation. Transactions in GIS, 10(4): 521–533.
Fischer, M.M. (2006b). Spatial Analysis and Geocomputation. Selected Essays. Berlin, Heidelberg and New York: Springer.
Fischer, M.M. and Getis, A. (eds) (1997). Recent Developments in Spatial Analysis. Spatial Statistics, Behavioural Modelling, and Computational Intelligence. Berlin, Heidelberg and New York: Springer.
Fischer, M.M. and Gopal, S. (1994). Artificial neural networks: A new approach to modelling interregional telecommunication flows. Journal of Regional Science, 34(4): 503–527.
Fischer, M.M. and Leung, Y. (1998). A genetic-algorithm based evolutionary computational neural network for modelling spatial interaction data. The Annals of Regional Science, 32(3): 437–458.
Fischer, M.M. and Leung, Y. (eds) (2001). GeoComputational Modelling: Techniques and Applications. Berlin, Heidelberg and New York: Springer.
Fischer, M.M. and Reismann, M. (2002a). Evaluating neural spatial interaction modelling by bootstrapping. Networks and Spatial Economics, 2(3): 255–268.
Fischer, M.M. and Reismann, M. (2002b). A methodology for neural spatial interaction modeling. Geographical Analysis, 34(2): 207–228.
Fischer, M.M. and Staufer, P. (1999). Optimization in an error backpropagation neural network environment
with a performance test on a spectral pattern classification problem. Geographical Analysis, 31(2): 89–108.
Fischer, M.M., Hlaváčková-Schindler, K. and Reismann, M. (1999). A global search procedure for parameter estimation in neural spatial interaction modelling. Papers in Regional Science, 78(2): 119–134.
Fischer, M.M., Reismann, M. and Hlaváčková-Schindler, K. (2003). Neural network modelling of constrained spatial interaction flows: Design, estimation and performance issues. Journal of Regional Science, 43(1): 35–61.
Fischer, M.M., Gopal, S., Staufer, P. and Steinnocher, K. (1997). Evaluation of neural pattern classifiers for a remote sensing application. Geographical Systems, 4(2): 195–226.
Fogel, D.B. (1995). Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. Piscataway, NJ: IEEE Press.
Fogel, D.B. and Robinson, C.J. (eds) (1996). Computational Intelligence. Piscataway: IEEE Press and Wiley-Interscience.
Foody, G.M. and Boyd, D.S. (1999). Fuzzy mapping of tropical land cover along an environmental gradient from remotely sensed data with an artificial neural network. Journal of Geographical Systems, 1(1): 23–35.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3): 183–192.
Gahegan, M. (2000). On the application of inductive machine learning tools to geographical analysis. Geographical Analysis, 32(1): 113–133.
Gahegan, M., German, G. and West, G. (1999). Improving neural network performance on the classification of complex geographic datasets. Journal of Geographical Systems, 1(1): 3–22.
Goldberg, D.E. (1989). Genetic Algorithms. Reading, MA: Addison-Wesley.
Gopal, S. and Fischer, M.M. (1996). Learning in single hidden-layer feedforward network models. Geographical Analysis, 28(1): 38–55.
Gopal, S. and Fischer, M.M. (1997). Fuzzy ARTMAP – a neural classifier for multispectral image classification. In: Fischer, M.M. and Getis, A. (eds), Recent Developments in Spatial Analysis, pp. 306–335. Berlin, Heidelberg and New York: Springer.
Grossberg, S. (1988). Nonlinear neural networks. Principles, mechanisms and architectures. Neural Networks, 1(1): 17–61.
Hassoun, M.H. (1995). Fundamentals of Artificial Neural Networks. Cambridge, MA and London, England: MIT Press.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Berlin, Heidelberg and New York: Springer.
Haykin, S. (1994). Neural Networks. A Comprehensive Foundation. New York: Macmillan College Publishing Company.
Hertz, J., Krogh, A. and Palmer, R.G. (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley.
Hinton, G.E. (1987). Learning translation invariant recognition in massively parallel networks. In: Bakker, J.W. de, Nijman, A.J. and Treleaven, P.C. (eds), Proceedings PARLE Conference on Parallel Architectures and Languages Europe, pp. 1–13. Berlin, Heidelberg and New York: Springer.
Hlaváčková-Schindler, K. and Fischer, M.M. (2000). An incremental algorithm for parallel training of the size and the weights in a feedforward neural network. Neural Processing Letters, 11(2): 131–138.
Hornik, K., Stinchcombe, M. and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5): 359–368.
Huang, G.-B. and Siew, C.-K. (2006). Real-time learning capability of neural networks. IEEE Transactions on Neural Networks, 17(4): 863–878.
Janson, D.J. and Frenzel, J.F. (1993). Training product unit neural networks with genetic algorithms. IEEE Expert, 8(5): 26–33.
Kohonen, T. (1988). Self-Organization and Associative Memory. Berlin, Heidelberg and New York: Springer.
Kuan, C.-M. and White, H. (1991). Artificial neural networks: An econometric perspective. Econometric Reviews, 13(1): 1–91.
Kůrková, V. (1992). Kolmogorov's theorem and multilayer neural networks. Neural Networks, 5(3): 501–506.
Leung, K.-S., Ji, H.-B. and Leung, Y. (1997). Adaptive weighted outer-product learning associative memory. IEEE Transactions on Systems, Man, and Cybernetics – Part B, 27(3): 533–543.
Leung, Y. (1997). Feedforward neural network models for spatial data classification and rule learning. In: Fischer, M.M. and Getis, A. (eds), Recent Developments in Spatial Analysis, pp. 289–305. Berlin, Heidelberg and New York: Springer.
Leung, Y. (2001). Neural and evolutionary computation methods for spatial classification and knowledge acquisition. In: Fischer, M.M. and Leung, Y. (eds), GeoComputational Modelling. Techniques and Applications, pp. 71–108. Berlin, Heidelberg and New York: Springer.
Leung, Y., Chen, K.-Z. and Gao, X.-B. (2003). A high-performance feedback neural network for solving convex nonlinear programming problems. IEEE Transactions on Neural Networks, 14(6): 1469–1477.
Leung, Y., Dong, T.-X. and Xu, Z.-B. (1998). The optimal encodings for biased association in linear associative memory. Neural Networks, 11(5): 877–884.
Leung, Y., Gao, X.-B. and Chen, K.-Z. (2004). A dual neural network for solving entropy-maximising models. Environment and Planning A, 36(5): 897–919.
Leung, Y., Chen, K.-Z., Jiao, Y.-C., Gao, X.-B. and Leung, K.S. (2001). A new gradient-based neural network for solving linear and quadratic programming problems. IEEE Transactions on Neural Networks, 12(5): 1074–1083.
Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, AC-22: 551–575.
McCulloch, W.S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5: 115–133.
Mineter, M.J. and Dowers, S. (1999). Parallel processing for geographical applications: A layered approach. Journal of Geographical Systems, 1(1): 61–74.
Moody, J.E. (1992). The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In: Moody, J.E., Hanson, S.J. and Lippman, R.P. (eds), Advances in Neural Information Processing Systems 4, pp. 683–690. San Mateo, CA: Morgan Kaufmann.
Murata, N., Yoshizawa, S. and Amari, S. (1993). Learning curves, model selection and complexity of neural networks. In: Hanson, S.J., Cowan, J.D. and Giles, C.L. (eds), Advances in Neural Information Processing Systems 5, pp. 607–614. San Mateo, CA: Morgan Kaufmann.
Nocedal, J. and Wright, S.J. (1999). Numerical Optimization. Berlin, Heidelberg and New York: Springer.
Openshaw, S. (1993). Modelling spatial interaction using a neural net. In: Fischer, M.M. and Nijkamp, P. (eds), GIS, Spatial Modelling, and Policy, pp. 147–164. Berlin, Heidelberg and New York: Springer.
Openshaw, S. (1994). Neuroclassification of spatial data. In: Hewitson, B.C. and Crane, R.G. (eds), Neural Nets: Applications in Geography, pp. 53–70. Boston: Kluwer Academic Publishers.
Openshaw, S. and Abrahart, R.J. (eds) (2000). GeoComputation. London and New York: Taylor & Francis.
Openshaw, S. and Openshaw, C. (1997). Artificial Intelligence in Geography. Chichester: Wiley.
Plutowski, M., Sakata, S. and White, H. (1994). Cross-validation estimates IMSE. In: Cowan, J.D., Tesauro, G. and Alspector, J. (eds), Advances in Neural Information Processing Systems 6, pp. 391–398. San Francisco: Morgan Kaufmann.
Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78(9): 91–106.
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P. (1992). Numerical Recipes in C. The Art of Scientific Computing. 2nd edn. Cambridge: Cambridge University Press.
Ripley, B.D. (1994). Neural networks and related methods for classification (with discussion). Journal of the Royal Statistical Society B, 56(3): 409–456.
Ripley, B.D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Rosenblatt, F. (1962). Principles of Neurodynamics. Washington DC: Spartan Books.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning internal representations by error propagation. In: Rumelhart, D.E., McClelland, J.L. and the PDP Research Group (eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pp. 318–362. Cambridge, MA: MIT Press.
Schwefel, H.-P. (1994). Evolution and Optimum Seeking. New York: Wiley.
Specht, D.F. (1991). A general regression neural network. IEEE Transactions on Neural Networks, 2(6): 568–576.
Stepniewski, W. and Keane, J. (1997). Pruning backpropagation neural networks using modern stochastic optimization techniques. Neural Computing and Applications, 5(2): 76–98.
Tibshirani, R. (1996a). A comparison of some error estimates for neural network models. Neural Computation, 8(1): 152–163.
Tibshirani, R. (1996b). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58: 267–288.
Wedge, D., Ingram, D., McLean, D., Mingham, C. and Bandar, Z. (2006). On global–local artificial neural networks for function approximation. IEEE Transactions on Neural Networks, 17(4): 942–952.
Weigend, A.S., Rumelhart, D.E. and Huberman, B.A. (1991). Generalization by weight elimination with application to forecasting. In: Lippman, R., Moody, J. and Touretzky, D. (eds), Advances in Neural Information Processing Systems 3, pp. 875–882. San Mateo, CA: Morgan Kaufmann.
Weng, J. and Hwang, W.-S. (2006). From neural networks to the brain: Autonomous mental development. IEEE Computational Intelligence Magazine, 1(3): 15–31.
White, H. (1989). Learning in artificial neural networks: a statistical perspective. Neural Computation, 1(4): 425–464.
White, H. (1992). Artificial Neural Networks. Approximation and Learning Theory. Oxford, UK and Cambridge, USA: Blackwell.
White, H. and Racine, J. (2001). Statistical inference, the bootstrap, and neural-network modeling with applications to foreign exchange rates. IEEE Transactions on Neural Networks, 12(4): 657–673.
Widrow, B. and Hoff, M.E. Jr. (1960). Adaptive switching circuits. IRE Western Electric Show and Convention Record, Part 4: 96–104.
Wilkinson, G.G. (1997). Neurocomputing for earth observation – recent developments and future challenges. In: Fischer, M.M. and Getis, A. (eds), Recent Developments in Spatial Analysis, pp. 289–305. Berlin, Heidelberg and New York: Springer.
Wilkinson, G.G. (2001). Spatial pattern recognition in remote sensing by neural networks. In: Fischer, M.M. and Leung, Y. (eds), GeoComputational Modelling. Techniques and Applications, pp. 145–164. Berlin, Heidelberg and New York: Springer.
Wilkinson, G.G., Fierens, F. and Kanellopoulos, I. (1995). Integration of neural and statistical approaches in spatial data classification. Geographical Systems, 2(1): 1–20.
Yao, X. (1996). A review of evolutionary artificial neural networks. International Journal of Intelligent Systems, 8(4): 539–567.
Yao, X. (2001). Evolving computational neural networks through evolutionary computation. In: Fischer, M.M. and Leung, Y. (eds), GeoComputational Modelling. Techniques and Applications, pp. 35–70. Berlin, Heidelberg and New York: Springer.
Yao, X., Fischer, M.M. and Brown, G. (2001). Neural network ensembles and their application to traffic flow prediction in telecommunication networks. In: Proceedings of the 2001 IEEE-INNS-ENNS International Joint Conference on Neural Networks, pp. 693–698. Piscataway, NJ: IEEE Press.
Zapranis, A. and Refenes, A.-P. (1998). Principles of Neural Model Identification, Selection and Adequacy. With Applications to Financial Econometrics. London: Springer.
Zhang, G., Patuwo, B.E. and Hu, M.Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1): 35–62.
21
Geocomputation
Harvey J. Miller

21.1. INTRODUCTION of computational science and the complexity


of natural processes. It also reviews the
Geocomputation concerns the application of motivations for geocomputation, its rela-
high-performance computers to explore and tionship to spatial analysis and geographic
analyze digital representations of the Earth information systems (GIS), and elements
and related phenomena. Geocomputation of a theory of geocomputation. The next
is predicated on the belief that computa- major section reviews selected techniques to
tional techniques are useful for explaining illustrate the application of geocomputation
and predicting geographic phenomena. This principles in spatial analysis. The final
is due to three reasons: (1) astonishing section concludes this chapter by briefly
increases in computational power allowing discussing the future of GC.
new forms of modeling, analysis, and simula-
tion; (2) greater capabilities for collecting and
storing geographic data, allowing unprece-
dented detailed representations of geographic
phenomena, and; (3) a postulate that compu- 21.2. COMPUTATIONAL SCIENCE
tation is meaningful for understanding reality AND COMPLEXITY
beyond the computational process itself, and
21.2.1. Computational science
perhaps better than traditional, analytical
approaches. In contrast with computer science or
The next section of this chapter reviews the the study of computers and computation,
conceptual foundation for geocomputation. computational science (CS) uses comput-
This includes discussions of the broader field ers to study other scientific problems.
398 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

CS involves the development and appli- Moores Law


cation of computational techniques using The now famous Moores Law of Integrated
high performance computers to explore Circuits best describes the incredible growth
massive databases and to simulate com- in computing power. In 1965, Gordon Moore
plex and intricate processes. This com- (one of the inventors of the integrated circuit
plements the use of traditional scientific and then Chair of Intel, Inc.) noted that the
techniques such as experimentation, ana- surface area of each transistor being etched
lytical modeling, and statistical techniques. on an integrated circuit was being reduced
These techniques are limited since they by about 50% every 12 months. In 1975,
can only explore a small portion of the he revised this to 18 months. This is known
vast information spaces implied by some as Moores Law: the processing capacity of
phenomena and often require harsh assump- the integrated chip doubles every six months.
tions that are not met in reality, or by Figure 21.1 provides evidence of Moores
the data acquired through measuring reality Law (Kurtzweil, 1999).
(Openshaw, 2000). Moores Law implies exponential growth
One motivation behind CS is that comput- in computational power. We have orders of
ers have become incredibly powerful and will magnitude more computing power available
continue to do so for the foreseeable future. to us than to researchers as recently as
These potent platforms can provide new, rev- 10 years ago, and many orders of magnitude
olutionary tools for scientific investigation. more than the time when many of our analyt-
Another motivation is the collapse in costs of ical, statistical and experimental techniques
data collection and storage. were developed. Perhaps we should re-think

8 000 000

7 000 000

6 000 000
Number of transistors

5 000 000

4 000 000

3 000 000

2 000 000

1 000 000

0
1972 1974 1978 1982 1985 1989 1993 1995 1997
Year

Figure 21.1 Evidence of Moores Law the number of transistors on Intel IC chips (based on
Kurtzweil, 1999).
GEOCOMPUTATION 399

our tools and methods given this growth in This may sound far-fetched. But until
computing power (Openshaw, 2000). recently, science has worked under a similar
but equally pervasive metaphor, namely,
universe as machine. This assumed that the
Data collection and storage universe behaved much like an engine, with
Paralleling the astonishing increases in com- continuous and well-behaved processes with
putational power is an equally stunning effects that are proportional to causes. Most
collapse in the cost of collecting and storing importantly, this implied that the whole is
digital data. The computerization of many equal to sum of the parts, and we can
government and business transactions as understand the whole by studying its parts
well as the increasing capabilities for direct independently. The tools for this exploration
digital data capture through devices such were algebra and calculus: these are tools that
as bar code scanners and environmental examine quantities (magnitudes) in continu-
sensors has greatly reduced the cost of data ous mathematical space (Flake, 1998).
collection. At the same time, database, and There is a fundamental reason why com-
data warehousing techniques have become putation may be a better description of nature
more powerful and affordable (Chen et al., than mechanics: frugality. Natural processes
1996). The hardware costs of storing data have a remarkable ability to extract much
are also a minute fraction of the costs a from a minimal investment of resources:
decade or two ago. These trends are shifting consider, for example, the surface area
science from a data-poor to a data-rich generated by the leaves of a tree or in
environment. the interface between the lungs and the
circulatory system. Similarly, a great deal of
biological complexity results from a code that
consists of only four symbols, namely, DNA.
21.2.2. Nature and complexity
Similarly, computing tries to obtain the most
Just because we are drowning in CPU with the least investment of computational
cycles and data does not mean we should resources and, similar to biological growth,
apply them to understanding non-computer simple computational rules can result in
related phenomena. Computers may not be complex behavior that is not predictable.
appropriate tools for gaining new scientific The property of complexity from simplicity
understanding about reality apart from the in both nature and computation means that
computational process itself. Perhaps com- the whole is greater than the sum of the
puters should just be used to manage our parts: phenomena cannot be understood
data and documents, run our personal digital entirely by independent analysis of their
assistants and cell phones, and coordinate components. This implies a middle path to
transportation and logistics. In other words, scientific knowledge: instead of looking at
just because computers are great for solving the individual or aggregate, we see how the
engineering problems, does not mean that aggregate emerges from the interactions of
they are useful to discover knowledge individuals (Flake, 1998).
about the Earth. However, computational There are some powerful mechanisms in
science is also predicated on a belief that both nature and computation that facilitate
computation can mimic natural processes. complexity from simplicity. These principles
Nature may behave much like a computer; are parallelism, iteration, and adaptation
in fact, perhaps the universe is a computer (Flake, 1998). Complex systems are often
(Kelley, 2002). highly parallel collections of relatively
400 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

simple units; consider, for example, an ant colony (ants), a brain (neurons), an ecosystem (animals), or a city (people). Parallel systems are more efficient and robust than sequential systems since they can specialize, explore a wider range of solutions simultaneously, and survive the failure of many components. Iteration over time allows feedback from the environment to determine the success or failure of different units and their strategies. Iteration also supports the closely related concept of recursion or self-reference. Finally, adaptation is a consequence of parallelism and iteration within an environment with scarce resources and therefore competition. Many of the techniques used in computational science incorporate some or all of these principles.

21.3. GEOCOMPUTATION

21.3.1. Motivation

Similar to CS, a factor motivating geocomputation is the increasing ability to capture, store, and process digital geographic data. In particular, it is increasingly possible to capture geo-data at high levels of spatial and temporal resolution as well as manipulate very detailed representations of geography using geographic information systems (GIS) and related technologies. Geo-spatial data capture technologies include intelligent transportation systems, hyperspectral and laser-based remote sensing systems, environmental monitoring devices, and location-aware technologies (LATs) that can report their geo-location densely with respect to time. GIS allow analysis of geographic relationships and morphology at levels of detail hardly imaginable even a short time in the past (Miller and Wentz, 2003). It is possible that there are surprising and useful patterns in these data and representations that are not being discovered by the analytic and statistical methods in traditional spatial analysis.

Another motivating force for geocomputation is the increasing recognition of the complexity of the spatio-temporal systems of concern in geography and the Earth sciences (Fischer and Leung, 2001). For example, the dynamic evolution of an urban system emerges from the individual agents of change, their interactions and the co-evolution of the context in which these interactions occur. This suggests not only that these systems are more complicated than previously supposed, but also that we cannot engineer their growth; rather, we can only influence or shape their evolution. We have seen this time and time again when a relatively modest change in infrastructure or policy (e.g., a new highway interchange, a change in zoning regulations) leads to wildly disproportionate outcomes (e.g., traffic congestion, urban sprawl). This is not defeatist; rather, it suggests humility and the need for sophisticated, nuanced approaches to understanding and directing these systems to efficient, equitable and sustainable outcomes.

In addition to increasing recognition of the complexity of geographic phenomena, it is also likely that the intrinsic complexity of these systems is increasing. As the world continues to become more crowded, mobile and connected, small local actions can have large-scale outcomes. Saturated road networks mean that an accident in one location results in traffic jams across town. Airline networks distribute diseases across continents and around the world within hours. The Internet spreads innovative ideas and wild rumors throughout the globe nearly at the speed of light. Interconnected financial networks mean that decisions made in a conference room can have huge economic consequences for large
regions thousands of miles away. Population pressure and consequent desertification in Central Asia creates air quality problems in North America. A crowded planet with high consumption, mobile and connected lifestyles creates complex spatio-temporal dynamics at all geographic scales.

Many in geography and other Earth sciences have known for quite some time that the world is more sophisticated, and becoming even more sophisticated, than the mechanical metaphor bequeathed to us by 18th century science. The problem is that we did not have the tools or data for dealing with basic and applied scientific problems from a complex system perspective. The geocomputational revolution is shattering this barrier.

21.3.2. Distinctive features of geocomputation

An increasing recognition of the complexity of geographic systems and the collapse in the costs of capturing and storing geographic data does not necessarily justify a separate field of study. It is possible that standard CS techniques could be applied to geographic phenomena with the same success with which they are applied to other scientific questions. What is it about GC that makes it distinct from CS?

A major distinction is the emphasis on the geospatial framework that conditions the phenomenon under investigation (Openshaw, 2000). A geo-space is a set of locations with defined shortest path relationships between all pairs (Beguin and Thisse, 1979). These locations often correspond to the Earth's surface, although this is not necessarily the case. Shortest path relationships are usually based on physical distance, but other relationships such as time, direction and connectivity, or some combination, are possible (see Miller and Wentz, 2003).

Two major properties conditioned by a geospatial framework are spatial dependency and spatial heterogeneity; these refer to the tendency of attributes at locations that are proximal with respect to shortest paths to be related, and the tendency of processes to vary by location in geospace, respectively. Rather than confounding, these properties are rich sources of information about spatial processes (Fotheringham, 2000). In addition, many geographic entities cannot be represented as simple points in an information space without significant loss. Geographic entities have morphological properties such as size and shape that can have non-trivial effects on their evolution and interactions (see Miller and Wentz, 2003). GC is the development and application of computational techniques that are sensitive to spatial relationships and spatial morphology inherent in geo-spatial phenomena.

21.3.3. Relationship to spatial analysis and geographic information systems

At its core, the argument for GC being unique with respect to CS is identical to the arguments for the uniqueness of spatial analysis with respect to statistics, and GIS being a unique subset of information systems. What, if anything, makes GC unique with respect to spatial analysis and GIS?

Fotheringham (2000) distinguishes between computer-based spatial analysis and GC. Computer-based spatial analysis uses the computer merely as a convenient tool (i.e., nothing more than, say, a very fast slide rule or abacus). GC refers to the case where the computer drives the form of spatial analysis instead of just being a convenient vehicle. In other words, in GC the computer plays a pivotal role.

GC is more concerned with finding numerical approximations rather than precise
analytical solutions. While this may sound like a drawback (and it is), this is a necessary trade-off. Traditional modeling methods rely on simplistic representations of space and behavior in order to facilitate precise analytical solutions. GC determines numerical approximations of solutions for systems with more complex representations of space and behavior. The argument is that it is better to have an approximate solution to a richly represented system than an exact solution to a sterile representation. Numerical approximations are necessary consequences of richer, more accurate representations of geographic phenomena.

Much of the digital geographic data available to researchers no longer meets many of the assumptions of inferential statistics, including the more relaxed assumptions of spatial analysis. Geographic data are increasingly no longer carefully structured, limited samples from a much larger population. Rather, digital geographic data are often monitored entire populations (in the statistical sense) collected using ill-structured and noisy methods. Computational techniques that do not require strict assumptions are better suited for these rich but sloppy data (Atkinson and Martin, 2000).

GIS provides a source of data and a toolkit environment for GC. GC is distinct since it emphasizes dynamic processes over static form and user interaction over passive receipt of information. GC is about matching technology with environment, process with data model, geometry with application, analysis with local context, and philosophy of science with practice (Longley, 1998). We can also make a distinction between GIS and GC that is similar to Fotheringham's (2000) distinction between computer-based spatial analysis and GC. In many respects, the computer is nothing more than a convenient vehicle for GIS. For example, the overlay operation pre-dates much of the development of computer-based GIS (see McHarg, 1969). We have also been constructing, storing and using maps for 5000 years. In other words, we can do GIS even without computers, although it would be very slow and tedious. GC is about what we could not do before the development of powerful computers.

In sum, GC uses the traditional techniques of spatial analysis (statistics, mathematical modeling) and GIS as parts of a more flexible and expansive tool kit. GC is concerned with the use of computational techniques and technologies within a scientific framework. This involves GIS as the data and information manager, computational methods as the tools, and high performance computing as the driver (Fischer and Abrahart, 2000; Fischer and Leung, 2001; Openshaw, 2000).

21.3.4. A theory of geocomputation?

Couclelis (1998a, b) provides a more skeptical view of GC. She argues that GC is in fact a loosely connected grab-bag of techniques rather than a focused scientific endeavor. She challenges the GC community to develop a rigorous computational theory of spatiotemporal processes that justifies the prefix 'geo'.

Couclelis points out that computational science is based on the theory of computation, a highly developed and rigorous theory of what can (and cannot) be computed and how things that can be computed should be. This involves questions such as determining which processes in the world can be described in the precise manner required by computation, and what is the appropriate language for describing specific processes. These are much deeper questions than what available computational technique is best for a particular data set or problem. (For an excellent introduction
to the theory of computation, see Sipser (1997).)

Noting the formal equivalence between the theory of computing and the theory of algebra, Couclelis (1998b) also develops a rough typology that demarcates the types of techniques that can legitimately be called geocomputation. In Table 21.1, the upper left quadrant contains techniques that are definitely geocomputational. The lower right quadrant contains techniques that are definitely not geocomputational. The upper right and lower left quadrants contain borderline cases.

Couclelis also distinguishes between hard GC and soft GC, paralleling a similar distinction between hard and soft artificial intelligence. Hard GC involves efforts to understand and represent complex geographical processes using computational techniques. Soft GC refers to the development of geographic problem representations and solutions using spatially oriented computational techniques.

Couclelis' (1998a, b) challenge has yet to be met by the GC community. Answering these fundamental questions can indicate the deep connections between different geocomputational tools as well as between different geographic phenomena; this will undoubtedly advance the field as well as provide it with a more rigorous foundation.

Table 21.1 A general classification of geocomputational and non-geocomputational techniques (Couclelis, 1998b)

                              Operators
Operands       Spatial                      Nonspatial
Spatial        Cellular automata            Map classification
               Shape grammars               Neural networks
               Fractals                     Multimedia imaging
Nonspatial     Cartographic labeling        Traditional modeling
               Global statistics

21.4. GEOCOMPUTATIONAL TECHNIQUES

This section reviews selected geocomputational techniques, specifically, fractals, dynamical systems and chaotic behavior, cellular automata, agent-based modeling, and artificial neural networks. This is not an exhaustive list. Other techniques such as geographic knowledge discovery, visualization, local spatial statistics, and optimization techniques such as genetic algorithms could also be included since these are data and computation-hungry techniques where the computer plays a pivotal role. However, some of these techniques are better treated independently, as indeed they are elsewhere in this volume. The survey below intends to provide illustrative examples of geocomputation, as well as demonstrate the pervasive theme of complexity from simplicity that underlies most geocomputational techniques.

21.4.1. Fractals

Platonic objects such as points, lines, polygons and solids are simple, smooth and ideal, and typically only result from deliberate design. In contrast, naturally occurring objects such as coastlines, clouds, trees and the human circulatory system are highly self-similar, hierarchical and irregular. Each part
appears to be a scaled-down version of the entire object, these self-similar features form a hierarchy, and the boundaries of these objects are highly irregular (Mandelbrot, 1983). Also, many human-made objects such as transportation networks and cities that grow in a bottom-up, organic manner also appear irregular, hierarchical and self-similar (Batty and Longley, 1994).

A fractal is a class of complex geometric shapes that have fractional dimension, a concept first introduced by the mathematician Felix Hausdorff in 1918. Benoit Mandelbrot coined the term fractal from the Latin word fractus ('fragmented' or 'broken') since the shapes are irregular rather than smooth (Batty and Longley, 1994; Encyclopedia Britannica, 2000). For example, points, lines and polygons have dimensions of zero, one and two, respectively. In contrast, a fractal curve has a dimension between one and two, and has a highly irregular and complex shape, while a fractal region has a dimension between two and three.

Fractals were discovered well over a century ago, but were considered to be pathological and 'monsters' (Mandelbrot, 1983). The rise of the digital computer has facilitated the analysis and appreciation of these monsters since they seem to be based on the computational principles of iteration and recursion. Indeed, it is possible that many natural objects and processes exhibit fractal properties since iteration and recursion are efficient ways to grow objects. Many fractals can be generated through recursive functions that are very compact and require little information to encode their algorithm. Fractals are also very good at maximizing functionality with minimal resource inputs. Fractals such as the Koch snowflake and Peano curve can cram an incredible amount of length (in fact, infinite) into a finite area (Flake, 1998).

Since many natural and geographic phenomena display fractal properties, they are becoming important in spatial analysis and geocomputation (Goodchild and Mark, 1987; Longley, 2000). Fractals can also serve as the basis for spatial sampling strategies and other forms of spatial analysis (e.g., Appleby, 1996; De Cola, 1991; Lam and Liu, 1996). Because of their natural look, fractals are also becoming popular in computer graphics, particularly for rendering natural landscapes such as mountains or other complex terrain (Clarke, 1993; Illingworth and Pyle, 1997). Fractals also provide principles for spatial data structures that map two- and three-dimensional data to the one-dimensional data structures in computers; examples include space-filling curves such as the Peano curve and the Hilbert curve (Goodchild and Mark, 1987).

Measuring the fractal dimension

As the properties discussed above suggest, fractals are more complex than Platonic objects. The fractal dimension is a measure of this complexity: Mandelbrot (1983) noted that the often paradoxical properties of fractals (such as enclosing a finite space with an infinite boundary, or packing an infinite length into a finite space) are a result of their dimensional discordance.

The complexity of a fractal object relates to scale-dependency when measuring its size. The first person to notice this was Lewis Richardson, although this recognition may go back to the ancient Greeks (Batty and Longley, 1994). In 1967, Mandelbrot published a paper based on Richardson's insight entitled 'How long is the coast of Britain?' (Mandelbrot, 1967). The apparent length of a coastline seems to increase whenever the resolution of the measurement unit is increased: higher resolutions mean that smaller and smaller features become relevant, increasing the measured length. At the extreme, using an infinitely precise measure, the coast will appear to be infinite
in length. Therefore, we must conclude that the length of this naturally occurring object is meaningless, independent of the scale of measurement.

We can estimate the fractal dimension of an entity by comparing the growth in its apparent length or size with the change in the scale of the measurement. Essentially, this is an attempt to estimate the following power law (Peitgen et al., 2004):

y ∝ x^d

where y is the size of the object, x is the measurement scale, and d is an empirical parameter related to the dimension of the object. In practice, estimating this relationship is complex: there are several definitions and measures of the fractal dimension, not all of which agree (see Lam and De Cola, 1993; Moon, 1992; Peitgen et al., 2004). Common fractal dimensions include the similarity, capacity, and Hausdorff–Besicovitch dimensions (Batty and Longley, 1994; Goodchild and Mark, 1987; Williams, 1997). Methods for calculating these dimensions include box-counting, compass, area-perimeter, and variogram methods (Burrough, 1993; Peitgen et al., 2004).
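As a minimal illustration of the power law and of the box-counting approach (the point pattern, grid sizes and sample size below are assumptions chosen for demonstration, not an analysis from this chapter), the following Python sketch generates a synthetic fractal point set with a simple iterated random map and recovers its dimension as the slope of a log-log regression of box counts on scale:

import numpy as np

# Sketch only: box-counting estimate of the fractal dimension d in y ~ x^d.
# The test pattern is a Sierpinski-type point set (chaos game); its
# theoretical dimension is log(3)/log(2), roughly 1.585.
rng = np.random.default_rng(0)
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

pts = np.zeros((200_000, 2))
p = np.array([0.1, 0.1])
for i in range(len(pts)):
    p = (p + vertices[rng.integers(3)]) / 2.0   # move halfway to a random vertex
    pts[i] = p

def box_count(points, eps):
    """Number of grid cells of side eps occupied by at least one point."""
    cells = np.floor(points / eps).astype(int)
    return len(np.unique(cells, axis=0))

eps_values = 1 / 2 ** np.arange(2, 8)           # box sizes 1/4 down to 1/128
counts = np.array([box_count(pts, e) for e in eps_values])

# Slope of log N(eps) against log(1/eps) estimates the fractal dimension d.
d_hat, _ = np.polyfit(np.log(1 / eps_values), np.log(counts), 1)
print(f"estimated box-counting dimension: {d_hat:.3f}  (theory ~ 1.585)")

The regression slope is only one of the possible estimators noted above; compass, area-perimeter, and variogram methods would, in general, give somewhat different values for the same pattern.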
Measuring the fractal dimension of geographic phenomena allows determination of their scale-invariance (self-similarity at different scales) as well as other fractal properties such as space-filling and irregularity. The increasing availability of digital geographic data as well as GIS tools for handling these data can support these analyses, allowing more detailed examination of the relationships between spatial process and geographic form (Batty and Longley, 1994; Longley, 2000). Applications include spatial population distributions (Appleby, 1996), transportation network morphology (Benguigui and Daoud, 1991), urban morphology (Batty and Longley, 1987, 1994; De Keersmaecker et al., 2003; Shen, 2002), land cover patterns (De Cola, 1989), landscape analysis (Burrough, 1993; Clarke and Schweizer, 1991) and riparian networks (Phillips, 1993a). Wentz (2000) uses a fractal dimension measure as a component of a general, trivariate shape measure.

Simulating fractal growth

In addition to fractal analysis of geographic patterns, it is also possible to simulate fractal growth using rule sets and iterated systems. Simulating fractal growth from finite systems such as rule sets and iterated systems captures a key property of fractal growth in the real world: the ability to generate highly complex entities using very simple processes. Physical, biological and human systems evolve from some baseline apparently without encoding complex information such as systems of simultaneous equations, constrained optimization problems, or partial differential equations to govern their growth. Rather, real world phenomena may emerge through simple growth mechanisms applied recursively. Many methods for simulating fractal growth use the powerful technique of recursion to generate complex structures with minuscule base information.

Two well-known recursive methods for generating fractals are iterated functional systems (IFS) and L-systems. The IFS algorithm starts with a seed object and maps a point on that object back onto itself through some randomly chosen affine transformation. This recursive process iterates and the object approaches a fractal object consisting of the union of smaller copies of the seed object (see Batty and Longley, 1994; Barnsley, 1988; Flake, 1998). L-systems simulate biological growth through a rule-based system that generates progressively complex strings through recursively applying the production rules to the axioms and the strings generated through these applications. This results in structures with fractal properties that can be visualized using systems such as turtle graphics (Flake, 1998; Peitgen et al., 2004).
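To make the string-rewriting idea concrete, the short sketch below implements a generic L-system rewriter and a turtle-style interpreter. The production rule and turning angle are the standard Koch-curve example and are assumptions for illustration only, not a rule set given in this chapter:

import math

# Sketch only: parallel rewriting of every symbol at each iteration,
# followed by a turtle-graphics interpretation of the resulting string.
axiom = "F"
rules = {"F": "F+F--F+F"}      # Koch curve production; '+'/'-' turn by 60 degrees

def rewrite(s, n):
    for _ in range(n):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

def turtle_points(s, step=1.0, angle=60.0):
    """Convert an L-system string to a list of (x, y) vertices."""
    x = y = heading = 0.0
    pts = [(x, y)]
    for ch in s:
        if ch == "F":                       # move forward and record the vertex
            x += step * math.cos(math.radians(heading))
            y += step * math.sin(math.radians(heading))
            pts.append((x, y))
        elif ch == "+":
            heading += angle
        elif ch == "-":
            heading -= angle
    return pts

curve = rewrite(axiom, 4)
print(len(curve), "symbols after 4 rewrites;", len(turtle_points(curve)), "vertices")

The rule set occupies a single line, yet four rewrites already produce several hundred vertices, which is the 'complexity from simplicity' property discussed above.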
Other methods that simulate fractal growth include tessellation methods such as cellular automata (White and Engelen, 1993; also see below) and diffusion-limited aggregation (Batty, 1991; Fotheringham et al., 1989); these methods have been applied to simulating urban dynamics. Brownian motion methods have been applied to simulate natural objects with fractal properties such as riparian networks, geological time series and terrain (Goodchild and Klinkenberg, 1993).

Fractal analysis and fractal simulation appear to be powerful methods that can reveal or mimic the structure and processes underlying many natural and human systems. The critical question remains whether explicit linkages can be identified between fractal processes and the natural and behavioral mechanisms identified from the domain sciences. It is important to note that some fractal algorithms are heuristics that imply unrealistic growth processes. To this end, correspondence between fractal processes and central place theory (Arlinghaus, 1985; Arlinghaus and Arlinghaus, 1989) and von Thünian theories of urban structure (Cavailhès et al., 2004) have been established.

21.4.2. Dynamical systems and chaotic behavior

A dynamical system is a system that experiences some change or motion. Many (if not most) natural and human-made systems are dynamic. The traditional way to study dynamical systems is through differential equations and difference equations. Differential equations are continuous-time equations where one or more of the variables are rates of change expressed as derivatives. Difference equations capture discrete-time dynamics, with rates of change expressed in terms of differences in the values of variables at different points in time.

For many years it was assumed dynamical systems exhibited one of three types of behavior with respect to time (Flake, 1998): (1) fixed point (static); (2) periodic (orbit); and (3) quasi-periodic (orbits that never quite repeat themselves). However, it was also known that certain types of dynamical systems exhibited behaviors that were intractable analytically. In particular, non-linear dynamical systems were known to be notoriously difficult. Since the rise of the digital computer, it has become easier to study non-linear dynamical systems using numerical simulation. This has led to the discoveries that these systems are not just intractable: they show very complex behavior now referred to as chaos.

Chaos is not randomness: completely deterministic systems can exhibit chaotic behavior. Yet this behavior is seemingly random with respect to prediction: forecasts about these systems over the long run are poor, even though the mechanisms of the system are known. In particular, chaotic systems are highly sensitive to initial conditions: small differences in the starting points can lead to huge differences in their trajectories later in time. Chaotic behavior seems to be inherent in many types of nonlinear dynamical systems, even those with very simple structures: population dynamics, predator–prey dynamics, weather, and the stock market are all examples of real world processes that can be difficult to predict even if we know the underlying mechanics (Flake, 1998).

Chaotic behavior and strange attractors

Two well-known non-linear systems that generate chaotic behavior are the Lorenz attractor and generalized Lotka–Volterra systems. The Lorenz attractor consists of
three linked differential equations that model convection flow in weather systems. Generalized Lotka–Volterra systems model predator–prey relationships through n linked differential equations, where n is the number of species. This system displays a wide range of dynamic behavior under different parameterizations, including chaotic behavior (Flake, 1998).
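As a minimal numerical illustration of such a system (the equations and the classical parameter values used below are standard textbook forms and are assumptions for illustration, not values given in this chapter), the following sketch integrates the Lorenz equations from two starting points that differ by only 10^-8 and reports how quickly the trajectories separate:

import numpy as np

# Sketch only: Euler integration of the Lorenz system with the classical
# parameters sigma=10, rho=28, beta=8/3, run from two nearly identical states.
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])          # perturb one coordinate by 10^-8

for step in range(4001):
    if step % 1000 == 0:
        print(f"t = {step * 0.01:5.1f}   separation = {np.linalg.norm(a - b):.2e}")
    a, b = lorenz_step(a), lorenz_step(b)

The separation grows by many orders of magnitude within a few tens of time units, which is precisely the sensitivity to initial conditions described in the next paragraphs.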
The Lorenz attractor and generalized Lotka–Volterra systems capture many of the characteristics of chaotic dynamical systems. Both are non-linear and incorporate feedback: for example, in Lotka–Volterra systems the number of predators affects the number of prey through culling the latter, while in turn the number of prey affects the predators that can be supported. Both systems are very simple, but generate very complex behavior, behavior that often cannot be distinguished from randomness. However, the trajectories of these systems contain order, at least in a global sense. Finally, these systems are hypersensitive to initial conditions, with the consequence that while short-term behavior can be predicted, long-term predictions are meaningless (Williams, 1997).

An attractor is the bounded region within the phase space towards which dynamic systems evolve: examples include the fixed point, periodic, and quasi-periodic behaviors mentioned above. Chaotic systems are characterized by strange attractors. The system evolves within a finite space, but with an infinite period: visiting every location within the region but never the same location twice. Consequently, the calculated dimension of chaotic trajectories will often be fractional: contained within a finite area, but space-filling. These trajectories are often infinitely self-similar: increasing the resolution of the calculations and subsequent trajectory plots will reveal the same structure repeatedly. Thus, there is a deep linkage between fractals and chaos: both exhibit the computational principle of complexity from simplicity (Flake, 1998; Peitgen et al., 2004; Williams, 1997).

Spatial chaos

The non-linear dynamical systems we have discussed thus far exhibit temporal chaos, that is, chaotic behavior in the dynamic evolution of aggregate system parameters. A reasonable question is whether temporal chaos can lead to spatial chaos, or complex spatial patterns that exhibit a high degree of sensitivity to conditions at particular locations. Theoretically, it turns out that spatial chaos can emerge from temporal chaos under very broad conditions: unless the system is perfectly isotropic with respect to space, spatial chaos will emerge and increase over time (Phillips, 1993b, 1999a). Given the broad conditions under which spatial chaos can emerge, it is not surprising that spatial chaos has been detected in physical and human geographic models and data. These include physical systems such as geomorphologic, hydrological, and ecological systems (see Phillips, 1999a, b), retail dynamics (Wilson, 2006), economic systems (White, 1990), urban systems (Wong and Fotheringham, 1990), and spatial choice processes (Nijkamp and Reggiani, 1990).

There are three general approaches to detecting spatial chaos (Phillips, 1993b). One method is to test for sensitivity to initial conditions by analyzing the Lyapunov exponents: these describe the average rate of convergence or divergence of two neighboring trajectories in phase space (Williams, 1997). A second method is numerical simulation: simulate and plot the behavior of the system in phase space, and analyze the plot using graphical techniques. A third approach is to examine an empirical temporal or spatial series for signatures of chaos, with the latter series derived by generating a spatial gradient by choosing some transect across space (see Phillips, 1993b).
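For a one-dimensional difference equation the first approach is straightforward to illustrate. The sketch below uses the logistic map as an assumed textbook example (it is not a model from this chapter): the largest Lyapunov exponent is estimated as the long-run average of the logged local stretching rate, and only the chaotic parameter value returns a positive exponent:

import math

# Sketch only: largest Lyapunov exponent of the logistic map
# x_{t+1} = r x_t (1 - x_t), estimated as the mean of log|r(1 - 2x_t)|.
# A positive value means nearby trajectories diverge on average (chaos).
def lyapunov_logistic(r, x0=0.4, n=100_000, burn_in=1_000):
    x = x0
    for _ in range(burn_in):                 # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        x = r * x * (1.0 - x)
        total += math.log(abs(r * (1.0 - 2.0 * x)))
    return total / n

for r in (2.9, 3.5, 4.0):                    # stable, periodic and chaotic regimes
    print(f"r = {r}:  lambda ~ {lyapunov_logistic(r):+.3f}")

For r = 4.0 the estimate approaches ln 2 (about +0.69), the known value for the fully chaotic regime, while the stable and periodic regimes return negative exponents.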
Although techniques exist for detecting spatial chaos, this is nevertheless challenging since chaos often co-exists with stochastic uncertainty in real-world systems. Therefore, it can be difficult to extract the chaotic signature, particularly with systems that have a large number of variables and/or with datasets that are small and imperfectly measured. Consequently, a major challenge is to develop detection techniques that work given these real-world conditions (Phillips, 1993b; Williams, 1997).

21.4.3. Cellular automata

Cellular automata (CA) are discrete spatio-temporal dynamic systems based on local rules. Using relatively simple rule sets, CA can generate very complex spatio-temporal dynamics, including chaotic behavior (Flake, 1998). The potential for CA in spatial analysis has been recognized for quite some time. In a pioneering paper, Waldo Tobler describes the theoretical foundation for a cellular geography and defines five general classes of models, with CA being one of these classes (Tobler, 1979). Couclelis (1985, 1988) followed this with discussions of the potential of CA for capturing micro–macro spatial dynamics and the emergence of complex geographic systems from simple behaviors.

CA are becoming very popular in geographic research for a number of reasons. One is that they are inherently and explicitly spatio-temporal. A second reason is that they are computationally efficient and can be applied to problems with very high spatial resolution. There is also a natural link to GIS. GIS provides a platform for managing the spatial data required for CA. In return, CA allow GIS to go beyond static, geometric representations to include non-localized spatial processes such as spatial organization, configuration, pattern, dynamics, transformation, and change (White and Engelen, 1997).

CA components

A cellular automaton consists of the following components (Batty, 2000). The basic element of a CA is the cell. A cell is a memory element that stores different states. In the simplest case, each cell can have the binary states 1 or 0. In more complex simulations the cells can have several different states. The cells are arranged in a regular, discrete spatial configuration, usually a lattice. However, the grid configuration is not required; see O'Sullivan (2001) and Shi and Pang (2000). The state of each cell for the next time step is based on the states of the cells in its neighborhood. Common definitions of neighborhoods include von Neumann (a neighborhood with radius = 1 following the rook's case), Moore (an enlargement of the von Neumann neighborhood to contain diagonal cells), and extended Moore (a Moore neighborhood with radius = 2). It is also possible to relax the assumption of neighborhood to allow non-local effects (see Takeyama and Couclelis, 1997).

Transition rules determine the state of a cell at time t + Δt based on the pattern of cell states within its neighborhood at time t. The set of transition rules is finite and constant across all cells. The number of possible transition rules can be enormous. If k is the number of possible states and h is the size of the neighborhood, then the number of possible cell patterns is p = k^h. Given these patterns, there are R = k^p different transition rules for the cell. For example, in a Moore neighborhood with binary states, there are 2^9 = 512 possible cell patterns and 2^512 ≈ 1.3408 × 10^154 possible rule sets; a number that is larger than the number of elementary particles in the universe (Batty, 2000). However, although the number of possible transition rules is enormous, in practice the rules are typically very simple and refer to aggregate properties rather than detailed patterns within neighborhoods.
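One such aggregate rule can be written in a few lines. In the sketch below the particular transition rule, Conway's Game of Life, is an assumed illustration (one of the 2^512 possible binary Moore-neighborhood rule sets, not a rule set discussed in this chapter): each cell's next state depends only on how many of its eight Moore neighbors are alive.

import numpy as np

# Sketch only: a binary-state CA on a Moore neighborhood with a simple
# aggregate transition rule, updated synchronously on a wrapping lattice.
def step(grid):
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    born = (grid == 0) & (neighbours == 3)
    survive = (grid == 1) & ((neighbours == 2) | (neighbours == 3))
    return (born | survive).astype(int)

rng = np.random.default_rng(1)
grid = (rng.random((50, 50)) < 0.25).astype(int)   # random initial configuration

for t in range(101):
    if t % 25 == 0:
        print(f"t = {t:3d}   live cells = {int(grid.sum())}")
    grid = step(grid)

Even this terse rule produces gliders, oscillators and other persistent structures from a random start, which anticipates the Wolfram classification discussed next.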
Empirical studies by Wolfram and others show that even the simple linear automata above behave in ways reminiscent of complex biological systems. For example, the fate of any initial configuration of a cellular automaton is to: (1) die out; (2) become stable or cycle with fixed period; (3) grow indefinitely; or (4) grow and contract in a complex manner (Wolfram, 1984). The important implication of these properties is that models of complex systems need not be complex themselves: simple rules can generate the complex behavior we associate with biological entities, ecosystems, economic systems, and cities (Couclelis, 1988).

Global from local

A very important property of cellular automata is the emergence of global patterns from local rules. The transition rules that drive the system are purely local: each cell's future state is based on neighboring cells only. Yet, higher-level global patterns and structure emerge from these purely local rules. The system self-organizes at the global level: there are no overarching rules yet global patterns emerge. Applications of CA span a wide range of emergent geographic phenomena: these include urban dynamics and land-use (Clarke and Gaydos, 1998; Clarke et al., 1997; White and Engelen, 2000; Xie, 1996; also see Benenson and Torrens, 2004, chapter 4), wildfire propagation (Clarke et al., 1994), traffic simulation (Esser and Schreckenberg, 1997) as well as physical geographic phenomena such as forest succession, land cover, and species composition (see Parker et al., 2003).

Despite the increasing popularity of CA in geocomputational modeling, standard CA have some restrictions that can cause some concerns when modeling geographical processes. One limitation is the assumption of time–space stationarity: a cell's future state is completely characterized by the states in its neighborhood according to static transition rules. Cells have no inherent characteristics that can affect their transitions. Therefore, a given configuration of cells in the neighborhood of that cell will result in the same transition regardless of that cell's location in space and time (White and Engelen, 1997). Phipps and Langlois (1997) note that time–space stationarity is particularly problematic when modeling geographic processes such as land use dynamics. The location of a parcel of land with respect to the rest of the system can affect its land use over time. Similarly, the conditions that affect land use at one time can change; for example, zoning laws may change as vacant parcels are filled and pressure builds to relax zoning restrictions.

Another problem is the assumption of unconstrained transitions: the number of cells in each state is determined endogenously by the application of transition rules to the current configuration with no recognition of potential exogenous constraints. Li and Yeh (2000) address the unconstrained transition problem by incorporating environmental constraints into their CA-based urban dynamics model.

A third weakness of using CA to model geographic processes is that a deterministic rule set is unrealistic. Other unobserved factors (such as individual choice) can influence state transitions. Phipps and Langlois (1997) develop a stochastic framework for CA-based modeling of land-use dynamics. Also see de Almeida et al. (2003), Batty (2000), and Wu (2002) for discussions and examples of stochastic CA.

Scale is also a critical issue in CA modeling of geographic phenomena. Scale issues
are inherent in the choice of cell size as well as the neighborhood definition. Ménard and Marceau (2005) perform a sensitivity analysis of scale and the resulting spatial patterns and dynamics in a CA model of land-cover change. They discover substantial, non-linear relationships between these scale issues and the simulation results.

21.4.4. Agent-based modeling

An agent is some independent unit that tries to fulfill a set of goals in a complex, dynamic environment. These goals can be end goals or ultimate states that the agents try to achieve, or they can be some type of reinforcement or reward that the agent attempts to maximize. The environment can be very general, and often includes other agents. An agent is autonomous if its actions are independent, i.e., it makes decisions based on its sensory inputs and goals. An agent is adaptive if its behavior can improve over time through some learning process (Maes, 1995). Agents interact by exchanging physical or virtual (informational) resources. These interactions are typically very simple: they can be described by a small set of rules. From the pattern and intensity of these interactions emerges complex behavior that is not completely predictable or controllable: it materializes from the interactions of these rules (Flake, 1998).

The agent perspective is very general: many systems can be viewed as collections of autonomous, adaptive, and interacting agents. In agent-based modeling (ABM), we are concerned with simulated agents (software representations) as opposed to embodied agents (such as humans). Multi-agent systems (MAS) are ABMs that contain a distribution of simulated and interacting agents (Boman and Holm, 2004). ABM and MAS are bottom-up, individual-based approaches to simulating physical and human phenomena. The objective is to simulate the dynamics of complex systems through the behaviors and interactions of its individual agents. Agents can represent people, households, animals, firms, organizations, regions, countries, and so on, depending on the scale of the analysis and the elemental units hypothesized for that scale.

Similar to CA, ABM in many respects exemplifies the geocomputational approach. ABM is motivated by the view that many geographic phenomena are emergent: simple processes generate complex structure and patterns. In addition, the increasing availability of high-resolution data and GIS tools for handling these data facilitate ABM in geographic research (Benenson and Torrens, 2004).

Generative geographic science

ABM is a critical tool in a distinct, generative approach to science that focuses on the following question: How could the decentralized local interactions of autonomous agents generate a given pattern? The analyst attempts to answer this question by situating an initial population of autonomous agents in a relevant spatial environment and allowing them to interact according to simple rules, thereby generating the macroscopic regularity from the bottom up. If the analyst can reproduce the macroscopic pattern, then the microspecification is a candidate explanation (Epstein, 1999).
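A minimal sketch of this generative logic is a Schelling-style residential segregation model. The grid size, the 30 per cent tolerance threshold and the update scheme below are assumptions chosen for illustration; it is not a model specified in this chapter. Mildly tolerant agents, each responding only to its Moore neighbors, generate a strongly segregated global pattern:

import numpy as np

# Sketch only: two agent types plus vacant cells; a dissatisfied agent
# (fewer than 30% like neighbours) relocates to a random vacant cell.
rng = np.random.default_rng(2)
size, threshold = 50, 0.3
grid = rng.choice([0, 1, 2], size=(size, size), p=[0.1, 0.45, 0.45])  # 0 = vacant

def like_fraction(g, r, c):
    """Share of occupied Moore neighbours that match the agent at (r, c)."""
    block = g[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
    occupied = (block != 0).sum() - 1
    same = (block == g[r, c]).sum() - 1
    return 1.0 if occupied == 0 else same / occupied

def sweep(g):
    moved = 0
    for r, c in np.argwhere(g != 0):
        if like_fraction(g, r, c) < threshold:
            empties = np.argwhere(g == 0)
            er, ec = empties[rng.integers(len(empties))]
            g[er, ec], g[r, c] = g[r, c], 0
            moved += 1
    return moved

def mean_similarity(g):
    return float(np.mean([like_fraction(g, r, c) for r, c in np.argwhere(g != 0)]))

print(f"initial mean like-neighbour share: {mean_similarity(grid):.2f}")
for it in range(20):
    if sweep(grid) == 0:
        break
print(f"final mean like-neighbour share:   {mean_similarity(grid):.2f}")

The macroscopic regularity (clusters far more homogeneous than any individual agent demands) is reproduced from the microspecification alone, which is exactly the sense in which the rule set is a candidate explanation.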
ABM is well-suited as a central tool in generative science due to the following realistic characteristics (Epstein, 1999): (1) heterogeneity – agents represent individual entities with unique characteristics that can change over time, as opposed to the static, aggregate representative agents in traditional social and other sciences; (2) autonomy – there is no central control over individuals in agent-based models, except for feedback from macrostructures to microstructures (such as newborn agents learning social norms or shared culture), and therefore no need to postulate an abstract central authority or governing equations to facilitate the modeling; (3) explicit space – agent behavior and events occur in an explicit space, whether it is real (e.g., geographic) or abstract (e.g., social networks); (4) local interactions – agents interact only with other agents and environmental factors within some bounded region of space and time; and (5) bounded rationality – agents have limited information, often based on their local neighborhoods, and limited abilities to process this information.

Similar to CA, explicit representations of space and local neighborhoods for interaction make ABM a natural tool for analyzing geographic phenomena. Unlike CA, the neighborhood over which an agent interacts can be more fluid and flexible. In addition, an agent can be mobile: in addition to changing its state, it can change its location in space over time. In some respects, these distinctions are arbitrary and historical: CA and ABM can be seen as special cases of a more general geographic automata system containing spatially fixed (CA) and non-fixed (ABM) automata (Benenson and Torrens, 2004). Similarly, Boman and Holm (2004) argue that time geography (see Hägerstrand, 1970) can serve as a unifying principle for ABM and the older tradition of microsimulation by providing a more explicit representation of real-world spatial and temporal constraints on agent behaviors and interactions.

ABM have been applied in diverse domains such as economics (see Epstein, 1999; Tesfatsion and Judd, 2006), environmental management (Gimblett et al., 2002; Hare and Deadman, 2004), land-use/land-cover change (Parker et al., 2003), urban dynamics (Benenson and Torrens, 2004), societies and culture (Epstein and Axtell, 1996), transportation (Balmer et al., 2004), and human movement at microscales (Batty et al., 2003).

Automata-based modeling such as ABM and CA does have some substantial weaknesses and challenges (Epstein, 1999). First, automata-based modeling lacks standards for model comparisons and replication of results (Axtell et al., 1996). Unlike analytical modeling, subtle design differences (such as asynchronous versus synchronous agent updating) can make huge differences in the results, and there are no standards for reporting these decisions. Second, solution concepts are weak: a simulation run is only one possible path of a (typically) stochastic process, not a general solution. Consequently, there is a need for careful experimental design to fully explore or sample from the information space implied by the model. However, this leads to a third challenge. While the parameter space given a postulated rule set is usually small, the space defined by combinations of possible agent rules can be enormous and difficult to explore fully (recall the earlier discussion regarding the number of potential CA rules). However, this can be mitigated to some degree by theory: similar to any good modeling, theoretical correctness should help distinguish between plausible and implausible rules.

21.4.5. Artificial neural networks

Continuing development and deployment of technologies for capturing geographic information such as remotely sensed imagery, vehicle-based GPS receivers, flow gauges, and automated weather reporting stations are generating huge but error-prone datasets. To exploit this noisy data requires techniques that are robust (fault-tolerant) and scalable (process large databases in reasonable, perhaps even real, time). Artificial neural networks (ANNs) are an important
class of computational techniques that can exploit noisy data, as well as solve difficult optimization problems.

ANNs are an analog to biological neural networks such as the brain. Biological neurons adjust their firing frequencies over time to other neurons in response to the firing frequencies from their input neurons. Some of these neurons are connected to external sensors (such as eyes). Through a learning process, the biological neural networks adjust firing frequencies until an appropriate response is achieved (e.g., ideas, behavior). An ANN replicates (on a very limited scale) the behavior and connectivity among biological neurons in a brain. ANNs adapt their structure based on subtle regularities in the input data. They are robust with respect to error and can find patterns in noisy data in a short amount of time. ANNs offer these advantages over brittle statistical methods that require strict, well-behaved and known error distributions (Fischer and Abrahart, 2000).

ANN application modes

ANNs are very flexible and can be applied in many different modes, including pattern classification, clustering, function approximation, forecasting, and optimization (Fischer and Abrahart, 2000).

Pattern classification involves assigning input patterns into one of several prespecified categories. Supervised classification is one of the central problems in remote sensing: each pixel must be classified into one of several known land cover classifications based on its spectral signature and perhaps other spectral signatures in its neighborhood. However, traditional methods for supervised classification in remote sensing are failing relative to the vast amount of information available in emerging remote sensing technologies that have high spatial and spectral resolution. Also, integrating ancillary information into remote sensing to aid in classification also increases the complexity of the problem (Fischer and Abrahart, 2000). ANNs have considerable promise as pattern classifiers that can effectively handle the vast and noisy information in remotely sensed imagery and imagery combined with ancillary data (see Foody, 1995; Gong et al., 1996; Hepner et al., 1990).

In contrast to pattern classification, we often have the case where we do not have any pre-specified categories for the data. Instead, we wish to find natural groupings or clusters of the data based on inherent similarities and dissimilarities. Cluster analysis refers to attempts to classify a set of objects into classes or clusters such that objects within a cluster are similar while objects between clusters are dissimilar. Unsupervised ANNs such as Kohonen Maps are a type of neural clustering where weighted connectivity after training reflects proximity in the information space of the input data (see Flexer, 1999). ANNs have been used to cluster river flow data into different event types (Fischer and Abrahart, 2000).

ANNs can also be viewed as a type of universal function approximation technique. Assume a large stream of paired inputs and outputs generated from some unknown noisy function. We can view ANNs as an attempt to approximate the unknown function with an approximate function determined by the pattern of weights in the ANN (Fischer and Abrahart, 2000). Applications of ANNs as function approximations include spatial interaction (Fischer and Gopal, 1994; Gopal and Fischer, 1996; Mozolin et al., 2000; Nijkamp et al., 1996) and spatial interpolation (Rizzo and Dougherty, 1994).
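The function-approximation view can be illustrated with a very small network. In the sketch below the target function, the network size and the training settings are arbitrary assumptions for illustration (not a model from this chapter): a single hidden layer of tanh units is fitted by gradient descent to noisy input/output pairs drawn from a synthetic, distance-decay-like function.

import numpy as np

# Sketch only: one-hidden-layer network trained by full-batch gradient descent
# on noisy samples of an "unknown" function, here y = exp(-x) plus noise.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 3.0, size=(400, 1))
y = np.exp(-x) + 0.05 * rng.normal(size=x.shape)

hidden, lr = 16, 0.05
W1 = rng.normal(scale=0.5, size=(1, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)

for epoch in range(3000):
    h = np.tanh(x @ W1 + b1)                # forward pass
    pred = h @ W2 + b2
    err = pred - y
    grad_W2 = h.T @ err / len(x)            # gradients of (half) mean squared error
    grad_b2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)      # back-propagate through tanh
    grad_W1 = x.T @ dh / len(x)
    grad_b1 = dh.mean(axis=0)
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    if epoch % 1000 == 0:
        print(f"epoch {epoch:4d}   mse = {float((err ** 2).mean()):.4f}")

print("f(1.0) ~", float(np.tanh(np.array([[1.0]]) @ W1 + b1) @ W2 + b2), " true:", np.exp(-1.0))

No functional form is supplied to the network; the approximation lives entirely in the fitted weights, which is the sense in which the approach suits rich but sloppy data.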
The problem of function approximation is very similar to the problem of forecasting events over space and time. Formally, the problem is: given a set of n samples of
a time series, predict the value(s) of n + 1, n + 2, … (Fischer and Abrahart, 2000). See Hill et al. (1996) for more detail on ANNs in time series forecasting. Applications in physical geography include rainfall-runoff responses (Smith and Eli, 1995; Fischer and Abrahart, 2000). Gopal and Scuderi (1995) use ANNs to predict sunspot cycles and solar climate conditions. ANNs are also being used to predict traffic conditions and flow within transportation networks; see Dougherty et al. (1994).

ANNs have also proven effective at solving complex optimization problems. This requires transforming a given optimization problem to a neural network representation. Applications to classic optimization problems include the traveling salesman problem, scheduling, and the knapsack problem (see Peterson and Söderberg, 1993).

21.5. CONCLUSION: THE FUTURE OF GEOCOMPUTATION

The future of geocomputation can be summarized in the following sentence: more data, more power, and greater access to both. Data collection and storage costs continue to fall, computational power continues to increase and will likely increase through the mid-part of the 21st century, perhaps beyond, and the nature of computer interfaces is changing.

Currently emerging are new computing environments that have great potential for geocomputation. Parallel processing has promise in geocomputation and GIS since most procedures can be decomposed into parallel tasks or data streams. However, this decomposition is not trivial due to the overhead involved (see Healey et al., 1998; Mineter and Dowers, 1999; Turton, 2000). Grid computing environment software (called 'middleware') can distribute processes across networked computers, exploiting unused resources in these clients. Grid environments can rival the performance of a high-performance mainframe at a fraction of its cost (Armstrong et al., 2005).

Over the longer run, computational power should continue to increase at its exponential rate for several more decades. Moore's Law was developed specifically to describe electronic computing based on the integrated circuit. However, Kurtzweil (1999) notes that an exponential increase in computing capabilities has been occurring for over a century. This includes the mechanical computer (1900–1930), electromagnetic computers (1930–mid 1940s), vacuum tube computers (mid-1940s to 1956), transistor computers (1956–1968), and the current paradigm of integrated circuit based computing (1968–2030?). Thus, Moore's Law of integrated circuits is only a special case of a more general trend that may continue through the 21st century. Limits to integrated circuit engineering techniques, as well as the laws of physics, could, however, mean an end to this growth sometime within the next few decades. But even with a conservative estimate of reaching the limits in the year 2030, we are still looking at over twenty more years of continued exponential growth in computing. It is also possible that another computing paradigm may emerge that may shatter these limits. For example, quantum computing would not only shatter these limits, but would also require an entirely new theory of simultaneous computation.

The nature of the interface between computation, data collection, and information access is also changing. We are currently in an era of ubiquitous or pervasive computing characterized by the connection of things in the world through computational devices that are small, lightweight, embedded in other things (such as automobiles,
cell phones, and home appliances) and often Internet-enabled. The continuation of this trend is nanoclients, which are extremely small and specialized. Nanoclients include wearable computers, smart dust, and wireless geo-sensor networks. These extremely 'thin' clients combined with very 'fat' high-performance servers can revolutionize geocomputation. Not only do nanoclients allow for ambient geographic data collection, but the environment itself can become a type of computer. Space becomes a metaphor for itself, landscapes or maps become models of themselves, and geographic objects become context-aware and know their own positions and relationships to other geographic objects (Clarke, 2003).

The continuing increase in computing power, capabilities for collecting and storing geo-spatial data, and the merging of computation with the geographic environment will require entirely new modes of thinking about computation in general and geocomputation in particular. While there will always be limits to computing (at least as we now understand it), the phenomena and problems that can be analyzed and understood through geocomputational methods are limited as much by our creativity and imagination.

REFERENCES

Appleby, S. (1996). Multifractal characterization of the distribution pattern of the human population. Geographical Analysis, 28: 147–160.

Arlinghaus, S.L. (1985). Fractals take a central place. Geografiska Annaler, 67B: 83–88.

Arlinghaus, S.L. and Arlinghaus, W.C. (1989). The fractal theory of central place geometry: A Diophantine analysis of fractal generators for arbitrary Löschian numbers. Geographical Analysis, 21: 103–121.

Armstrong, M.P., Cowles, M.K. and Wang, S. (2005). Using a computational grid for geographic information analysis: A reconnaissance. Professional Geographer, 57: 365–375.

Atkinson, P. and Martin, D. (2000). Introduction. In: Atkinson, P. and Martin, D. (eds), GIS and Geocomputation, pp. 1–7. London: Taylor and Francis.

Axtell, R., Axelrod, R., Epstein, J.M. and Cohen, M.D. (1996). Aligning simulation models: A case study and results. Computational and Mathematical Organization Theory, 1: 123–141.

Balmer, M., Nagel, K. and Raney, B. (2004). Large-scale multi-agent simulations for transportation applications. Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, 8: 205–221.

Barnsley, M. (1988). Fractals Everywhere. London: Academic Press.

Batty, M. (1991). Cities as fractals: Simulating growth and form. In: Crilly, T., Earnshaw, R.A. and Jones, H. (eds), Fractals and Chaos, pp. 41–69. Berlin: Springer-Verlag.

Batty, M. (2000). Geocomputation using cellular automata. In: Openshaw, S. and Abrahart, R.J. (eds), GeoComputation, pp. 95–126. London: Taylor and Francis.

Batty, M., Desyllas, J. and Duxbury, E. (2003). The discrete dynamics of small-scale spatial events: Agent-based models of mobility in carnivals and street parades. International Journal of Geographical Information Science, 17: 673–697.

Batty, M. and Longley, P. (1987). Fractal dimensions of urban shape. Area, 19: 215–221.

Batty, M. and Longley, P. (1994). Fractal Cities. London: Academic Press.

Beguin, H. and Thisse, J.-F. (1979). An axiomatic approach to geographical space. Geographical Analysis, 11: 325–341.

Benenson, I. and Torrens, P. (2004). Geosimulation: Automata-based Modeling of Urban Phenomena. Chichester, UK: John Wiley.

Benguigui, L. and Daoud, M. (1991). Is the suburban railway a fractal? Geographical Analysis, 23: 362–368.

Boman, M. and Holm, E. (2004). Multi-agent systems, time geography and microsimulations. In: Olsson, M.-O. and Sjöstedt, G. (eds), Systems Approaches and their Applications, pp. 95–118. Dordrecht: Kluwer Academic.
Burrough, P.A. (1993). Fractals and geostatistical Couclelis, H. (1998b). Geocomputation and space.
methods in landscape studies. In: Lam, N.S.-N. Environment and Planning B: Planning and Design,
and De Cola, L. (eds), Fractals in Geography, 25: 4147.
pp. 187121. Englewood Cliffs, NJ: Prentice-Hall.
De Cola, L. (1989). Fractal analysis of a classied
Cavailhs, J., Frankhauser, P., Peeters, D. and Landsat scene. Photogrammetric Engineering and
Thomas, I. (2004). Where Alonso meets Sierpinski: Remote Sensing, 55: 601610.
An urban economic model of a fractal metropolitan
De Cola, L. (1991). Fractal analysis of multiscale spatial
area. Environment and Planning A, 36: 14711498.
autocorrelation among point data. Environment and
Chen, M.S., Han, J. and Yu, P.S. (1996). Data mining: Planning A, 23: 545556.
An overview from a database perspective. IEEE
de Almeida, C.M., Batty, M., Monteiro, A.M.V.,
Transactions on Knowledge and Data Engineering,
Cmara, G., Soares-Filho, B.S., Cerqueira, G.C.
8: 866883.
and Pennachin, C.L. (2003). Stochastic cellular
Clarke, K.C. (1993). One thousand Mount Everests? automata modeling of urban land use dynamics.
In: Lam, N.-S. and De Cola, L. (eds). Fractals in Computers, Environment and Urban Systems, 27:
Geography, pp. 265281. Englewood Cliffs, NJ: 481509.
Prentice-Hall.
De Keersmaecker, M.-L., Frankhauser, P. and Thomas, I.
Clarke, K.C. (2003). Geocomputations future at (2003). Using fractal dimensions for characterizing
the extremes: High performance computing and intra-urban diversity: The example of Brussels.
nanoclients. Parallel Computing, 29: 12811295. Geographical Analysis, 35: 310328.
Clarke, K.C., Brass, J.A. and Riggan, P.J. (1994). Dougherty, M.S., Kirby, H.R. and Boyle, R.D. (1994).
A cellular automaton model of wildre propagation Using neural networks to recognise, predict and
and extinction. Photogrammetric Engineering and model trafc. In: Bielle, M., Ambrosino, G. and
Remote Sensing, 60: 13551367. Boero, M. (eds), Articial Intelligence Applications
to Trafc Engineering, pp. 233250. Utrecht, The
Clarke, K.C. and Gaydos, L.J. (1998). Loose-
Netherlands: VSP.
coupling a cellular automaton model and GIS:
Long-term urban growth prediction for San Epstein, J.M. (1999). Agent-based computational
Francisco and Washington/Baltimore. International models and generative social science. Complexity,
Journal of Geographical Information Science, 12: 4(5): 4160.
699714.
Epstein, J.M. and Axtell, R. (1996). Growing Articial
Clarke, K.C., Hoppen, S. and Gaydos, L. (1997). Societies: Social Science from the Bottom Up.
A self-modifying cellular automaton model of Cambridge, MA: MIT Press.
historical urbanization in the San Francisco Bay area.
Esser, I. and Schreckenberg, M. (1997). Microscopic
Environment and Planning B: Planning and Design,
simulation of urban trafc based on cellular
24: 247261.
automata. International Journal of Modern Physics C,
Clarke, K.C. and Schweizer, D.M. (1991). Measuring the 8: 10251036.
fractal dimension of natural surfaces using a robust
Fischer, M.M. and Abrahart, R.J. (2000). Neurocom-
fractal estimator. Cartography and Geographic
puting: Tools for geographers. In: Openshaw, S. and
Information Systems, 18: 3747.
Abrahart, R.J. (eds), GeoComputation, pp. 187127.
Couclelis, H. (1985). Cellular worlds: A framework for London: Taylor and Francis.
modeling micro-macro dynamics. Environment and
Fischer, M.M. and Gopal, S. (1994). Articial neural
Planning A, 17: 585596.
networks: A new approach to modeling interre-
Couclelis, H. (1988). Of mice and men: What rodent gional telecommunication ows. Journal of Regional
populations can teach us about complex spatial Science, 34: 503527.
dynamics. Environment and Planning A, 20: 99109.
Fischer, M.M. and Leung, Y. (2001). Geocomputational
Couclelis, H. (1998a). Geocomputation in context. modeling techniques and applications: Prologue.
In: Longley, P.A., Brooks, S.M. McDonnell, R. and In: Fischer, M.M. and Leung, Y. (eds), GeoCom-
Macmillan, B. (eds), Geocomputation: A Primer, puational Modeling: Techniques and Applications,
pp.1729. New York: Wiley. pp. 112. Berlin: Springer.
Flake, G.W. (1998). The Computational Beauty of study of sunspot prediction and solar climate trends.
Nature: Computer Explorations of Fractals, Chaos, Geographical Analysis, 27: 4259.
Complex Systems and Adaptation. Cambridge, MA:
Hgerstrand, T. (1970). What about people in regional
MIT Press.
science? Papers of the Regional Science Association,
Flexer, A. (1999). On the use of self-organizing maps 24: 721.
for clustering and visualization. In: Zytkow, J.M.
Hare, M. and Deadman, P.J. (2004). Further towards
and Rauch, J. (eds), Principles of Data Mining and
a taxonomy of agent-based simulation models
Knowledge Discovery, Lecture Notes in Articial
in environmental management. Mathematics and
Intelligence 1704, 8088.
Computers in Simulation, 64: 2540.
Foody, G.M. (1995). Land cover classication by an
articial neural network with ancillary information. Healey, R., Dowers, S., Gittings, B. and Mineter, M.
International Journal of Geographical Information (eds) (1998). Parallel Processing Algorithms for GIS.
Systems, 9: 527542. London: Taylor and Francis.

Fotheringham, A.S. (2000). GeoComputation analysis Hepner, G.F., Logan, T., Ritter, N. and Bryant, N.
and modern spatial data. In: Openshaw, S. and (1990). Articial neural network classication using
Abrahart, R.J. (eds), GeoComputation, pp. 3348. a minimal training set: comparison to conven-
London: Taylor and Francis. tional supervised classication. Photo- grammetric
Engineering and Remote Sensing, 56: 469473.
Fotheringham, A.S., Batty, M. and Longley, P. (1989).
Diffusion-limited aggregation and the fractal nature Hill, T., OConner, M. and Remus, W. (1996).
of urban growth. Papers of the Regional Science Neural network models for time series forecasts.
Association, 67: 5569. Management Science, 42: 10821092.

Gimblett, H.R., Richards, M.T. and Itami, R.M. Illingworth, V. and Pyle, I. (1997). Dictionary of
(2002). Simulating wildland recreation use and Computing, New York: Oxford University Press.
conicting spatial interactions using rule-driven intel- Kelley, K. (2002). God is the machine. Wired, 10.12.
ligent agents. In: Gimblett, H.R. (ed.), Integrating Available at www.wired.com.
Geographic Information Systems and Agent-based
Modeling Techniques for Simulating Social and Kurtzweil, R. (1999). The Age of Spiritual Machines:
Ecological Processes, pp. 211243. Oxford, UK: When Computers Exceed Human Intelligence.
Oxford University Press. New York: Penguin

Gong, P., Pu, R. and Chen, J. (1996). Mapping eco- Lam, N.-S. and De Cola, L. (1993). Fractal measure-
logical land systems and classication uncertainties ment. In: Lam, N.S.-N. and De Cola, L. (eds), Fractals
from digital elevation and forest-cover data using in Geography, pp. 2355. Englewood Cliffs, NJ:
neural networks. Photogrammetric Engineering and Prentice-Hall.
Remote Sensing, 62: 12491260. Lam, S.-N. and Liu, K. (1996). Use of space-lling
Goodchild, M. and Klinkenberg, B. (1993). Statistics of curves in generating a national rural sampling frame
channel networks on fractional Brownian surfaces. for HIV/AIDS research. Professional Geographer, 48:
In: Lam, N.S.-N. and De Cola, L. (eds), Fractals 321332.
in Geography, pp. 122141. Englewood Cliffs, NJ:
Li, X. and Yeh, A.G.-O. (2000). Modelling sustainable
Prentice-Hall.
urban development by the integration of constrained
Goodchild, M. and Mark, D. (1987). The fractal nature cellular automata and GIS. International Journal of
of geographic phenomena. Annals of the Association Geographical Information Science, 14: 131152.
of American Geographers, 77: 265278.
Longley, P (1998). Foundations. In: Longley, P.A.,
Gopal, S. and Fischer, M.M. (1996). Learning in Brooks, S.M., McDonnell, R. and MacMillan, B. (eds),
single hidden-layer feedforward network mod- Geocomputation: A Primer, pp. 315. New York:
els: Backpropagation in a spatial interaction John Wiley.
modeling context. Geographical Analysis, 28:
Longley, P. (2000). Fractal analysis of digital spatial
3855.
data. In: Openshaw, S. and Abrahart, R.J. (eds),
Gopal, S. and Scuderi, L. (1995). Application of GeoComputation, pp. 293312. London: Taylor and
articial neural networks in climatology: A case Francis.
GEOCOMPUTATION 417

Maes, P. (1995). Modeling adaptive autonomous Peitgen, H.-O., Jrgen, H. and Saupe, D. (2004). Chaos
agents. In: Langton, C. (ed.), Articial Life: An and Fractals: New Frontiers of Science, 2nd edn.
Overview, pp. 135162. Cambridge, MA: MIT Press. New York: Springer.
Mandlebroit, B.B. (1967). How long is the coast Peterson, C. and Sderberg, B. (1993). Articial neural
of Britain? Statistical self-similarity and fractional networks, in Reeves, C. R. (ed.) Modern Heuristic
dimension. Science, 155: 636638. Techniques for Combinatorial Problems, New York:
John Wiley, 197242.
Mandlebroit, B.B. (1983). The Fractal Geometry of
Nature, New York: W.H. Freeman. Phillips, J.D.. (1993a). Interpreting the fractal dimension
of rivers. In: Lam, N.S.-N. and De Cola, L. (eds),
McHarg, I.L. (1969). Design with Nature, 1st edn.
Fractals in Geography, pp. 142157. Englewood
Garden City, NY: Natural History Press.
Cliffs, NJ: Prentice-Hall.
Mnard, A. and Marceau, D.J. (2005). Exploration
Phillips, J.D. (1993b). Spatial-domain chaos in land-
of spatial scale sensitivity in geographic cellular
scapes. Geographical Analysis, 25: 101117.
automata. Environment and Planning B: Planning
and Design, 32: 693714. Phillips, J.D. (1999a). Earth Surface Systems: Complex-
ity, Order and Scale. Oxford, UK: Blackwell.
Miller, H.J. and Wentz, E.A. (2003). Representation and
spatial analysis in geographic information systems. Phillips, J.D. (1999b). Spatial analysis in physical
Annals of the Association of American Geographers, geography and the challenge of deterministic
93: 574594. uncertainty. Geographical Analysis, 31: 359372.
Mineter, M.J. and Dowers, S. (1999). Parallel processing Phipps, M. and Langlois, A. (1997). Spatial dynamics,
for geographic applications: A layered approach. cellular automata and parallel processing computers.
Journal of Geographical Systems, 1: 6174. Environment and Planning B: Planning and Design,
24: 193204.
Moon, F.C. (1992). Chaotic and Fractal Dynamics: An
Introduction for Applied Scientists and Engineers. Rizzo, D.M. and Dougherty, D.E. (2004). Characteri-
New York: John Wiley. zation of aquifer properties using articial neural
networks: Neural kriging. Water Resources Research,
Mozolin, M., Thill, J.-C. and Usery, E.L. (2000). Trip
30: 483498.
distribution forecasting with multilayer perceptron
neural networks: A critical evaluation. Transportation Shen, G. (2002). Fractal dimension and fractal
Research B, 34: 5373. growth of urbanized areas. International Journal of
Geographical Information Science, 16: 419437.
Nijkamp, P. and Reggiani, A. (1990). Logit models and
chaotic behaviour: A new perspective. Environment Shi, W. and Pang, M.Y.C. (2000). Development of
and Planning A, 22: 14551467. Voronoi-based cellular automata: An integrated
dynamic model for geographical information
Nijkamp, P., Reggiani, A. and Tritapepe, T. (1996).
systems. International Journal of Geographical
Modelling inter-urban transport ows in Italy: A com-
Information Science, 14: 455474.
parison between neural network analysis and logit
analysis. Transportation Research C, 4C: 323338. Sipser, M. (1997). Introduction to the Theory of
Computation. Boston, MA: PWS Publishing.
Openshaw, S. (2000). Geocomputation. In: Openshaw, S.
and Abrahart, R.J. (eds), GeoComputation, Smith, J. and Eli, R.N. (1995). Neural-network models of
pp. 131, 293312. London: Taylor and Francis. rainfall-runoff processes. Journal of Water Resources
Planning and Management, 121: 499509.
OSullivan, D. (2001). Exploring spatial process
dynamics using irregular cellular automaton models. Takeyama, M. and Couclelis, H. (1997). Map dynamics:
Geographical Analysis, 33: 118. Integrated cellular automata and GIS through
geo-algebra. International Journal of Geographical
Parker, D.C., Manson, S.M., Janssen, M.A.,
Science, 11: 7391.
Hoffmann, M.J. and Deadman, P. (2003).
Multi-agent systems for the simulation of land- Tesfatsion, L. and Judd, K.L. (2006). Handbook
use and land-cover change: A review. Annals of Computational Economics, Volume 2: Agent-
of the Association of American Geographers, Based Computational Economics, Amsterdam:
93: 314337. North-Holland.
418 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Tobler, W. (1979). Cellular geography. In: Gale, S. and regional systems. Computers, Environment and
and Olsson, G. (eds), Philosophy in Geography, Urban Systems, 24: 383400.
pp. 379386. Dordrecht: D. Reidel.
Williams, G.P. (1997). Chaos Theory Tamed.
Turton, I. (2000). Parallel processing in geography. Washington, DC: Joseph Henry Press.
In: Openshaw, S. and Abrahart, R.J. (eds), GeoCom-
Wilson, A.G. (2006). Ecological and urban systems
puatation. pp. 4966. London: Taylor and Francis.
models: Some explorations of similarities in the
Wentz, E.A. (2000). A shape denition for geographic context of complexity theory. Environment and
applications based on edge, elongation and perfora- Planning A, 28: 633646.
tion. Geographical Analysis, 32: 95112.
Wolfram, S. (1984). Universality and complexity in
White, R.W. (1990). Transient chaotic behaviour in cellular automata. Physica D, 10: 135.
a hierarchical economic system. Environment and
Wong, D.W.S. and Fotheringham, A.S. (1990).
Planning A, 22: 13091321.
Urban systems as examples of bounded chaos:
White, R. and Engelen, G. (1993). Cellular automata exploring the relationship between fractal dimen-
and fractal urban form: a cellular modelling approach sion, rank-size and rural-to-urban migration.
to the evolution of urban land-use patterns. Geograska Annaler, 72B: 8999.
Environment and Planning A, 25: 11751199.
Wu, F. (2002). Calibration of stochastic cellular
White, R. and Engelen, G. (1997). Cellular automata as automata: The application to ruralurban land
the basis of integrated dynamic regional modeling. conversions. International Journal of Geographical
Environment and Planning B: Planning and Design, Information Science, 16: 795818.
24: 235246.
Xie, Y. (1996). A generalized model for cellu-
White, R. and Engelen, G. (2000). High-resolution lar urban dynamics. Geographical Analysis, 28:
integrated modeling of the spatial dynamics of urban 350373.
22
Applied Retail Location Models Using Spatial Interaction Tools
Morton E. O'Kelly
22.1. RETAIL LOCATIONAL ANALYSIS1

22.1.1. Spatial retail location

The demand by consumers for retail goods and services is a function of the attributes of the commodity, household income, and other factors such as home ownership status. For example, a home improvement store is likely to target a market with a housing stock that has lots of possibilities for repair, upgrades, and remodels. Both home-owning and renting populations might yield adequate density of demand, but the effective demand for goods and services by homeowners is much more likely to be attractive to this particular service. An electrical supplier would have a demand for services also, but for that business it could be some combination of over-the-counter sales (light fixtures) and more substantial electrical equipment sold to contractors and builders. A business with a traditional central market place location (in an older mixed use inner city neighborhood for example) might conceivably want to branch out its locations to catch the growth in the suburbs and even the outlying communities in the hinterland of that main market. In fact there are so many different ways to imagine the dynamics of retail site location that there is a real need for a general purpose simulation tool that might enable the estimation of the merit of various growth proposals (Baker, 2000; Munroe, 2001). In all these cases, it is important to have an accurate
estimate of the spatial distribution of effective demand as arising out of a combination of preferences, and disposable income.

Central place theory has long held that there is a hierarchy of goods, from frequently demanded inexpensive items to high-end expensive goods. There is both a higher spatial frequency of demand for (and provision of) the so-called lower-order goods, and a corresponding scarcity on the landscape of higher-order goods. Thus for every Mercedes or Lexus dealership in the city there might be numerous Ford and Toyota dealerships. The higher the order of goods provided, one assumes that there is a wider market scope required to provide sufficient demand to cover the operating costs of the business (the so-called threshold). Similarly, the higher-order goods, because of their relative scarcity on the landscape, require longer trip lengths; the break even calculation for the retailer is whether the spatial extent of the market required to cover costs is matched by a corresponding willingness of consumers to travel to the center for the goods (see the classic study by Berry (1967)).

Inexpensive low order goods are sometimes sold in combinations with higher priced items from superstores that do not necessarily have a small range: they can in fact be attractive over a large distance, provided the assortment and price point allows the large agglomerated retailer to undercut the smaller more widely dispersed providers of retail services. This formula is used by Wal-Mart or other big box retailers; they have a large assortment of goods, and price points that are competitive, and locations that in themselves act as a magnet for spatial interaction (Munroe, 2001). Customers travel to stores and therefore the spatial interaction of the purchasers must be recognized as an important behavioral factor. The retail and trade area service location problem requires knowledge of what customers want, where they are located, and that they have income that covers the price and market segment of the goods. In common with many levels of retail operation, many of the most successful chains study a massive amount of geo-demographic profile data that enables a rich portrait of customers and consumer behavior to inform the merchandize and market planning of their operations.

22.1.2. Consumer demand and behavior3

Measuring the total income and pool of expenditure is accomplished by combining a count of households by geo-demographic cluster (e.g., Claritas PRIZM, MapInfo PSYTE, ESRI Tapestry, AGS Mosaic etc.), the index value for each group (m), the penetration rate and some index of average per household expenditure.4 To calculate the potential pool of expenditure for the zone i and commodity c a formula such as the following might be used:

O_i^c = \sum_{m} N_i^m y^{mc}

where the sum is over all groups, O_i^c is the demand in zone i for commodity c, N_i^m is the number of group m households in zone i, and y^{mc} is an expenditure rate per household in cluster m on commodity c.
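As a purely illustrative sketch (not taken from the chapter), the pool calculation can be written out directly; the cluster names, household counts and expenditure rates below are invented, and a real study would substitute the retailer's own geo-demographic counts and rates.

# Hypothetical weekly expenditure rate per household on commodity c for each
# geo-demographic cluster m (y^{mc}); values are illustrative only.
expenditure_rate = {"urban_professional": 95.0, "young_family": 120.0, "retired": 60.0}

# Hypothetical household counts by cluster in two zones (N_i^m).
households = {
    "zone_1": {"urban_professional": 400, "young_family": 150, "retired": 300},
    "zone_2": {"urban_professional": 120, "young_family": 600, "retired": 80},
}

# O_i^c = sum over clusters m of N_i^m * y^{mc}
demand_pool = {
    zone: sum(n * expenditure_rate[m] for m, n in counts.items())
    for zone, counts in households.items()
}
print(demand_pool)  # {'zone_1': 74000.0, 'zone_2': 88200.0}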
From this aggregate, demand shares allocated to a particular store have two components: on the one hand the share is smaller for the more distant competitive stores (holding other factors constant), but additionally it is felt that the demand for a store increases with the accessibility of that origin zone to any shopping destination. Zonal accessibility, and hence aggregate demand, is a function of where the stores are located, and so unlike conventional location models, we should not treat demand as an exogenous factor (O'Kelly, 1999). While it would be naive to say 'build it and they will come', it is certainly reasonable to think that the provision of retail services can induce demand for that service that would otherwise be allocated to other discretionary uses. Some insight and market-based intelligence is needed to capture the correct demand parameters and sensitivity to locational access. The basic accessibility of each zone can be predefined, and the demand in the immediate area of a new potential store opening can increase, as a result of improved accessibility. One practical estimation approach that can be effective is to have a variety of alternative sources of judgment (like the so-called Delphi method, and a variant of the judgmental methods advocated for several years by Seldin (1995)), with perhaps one figure coming from an estimate of per capita expenditure and saturation, one coming from pro forma estimates of expected sales per square foot, and yet another estimate coming from an experienced local commercial real estate professional. The best model is likely to use some aspect of these data as controls on the judgmental estimate. In other words, no analyst will simply apply a sales per square foot figure to an arbitrary newly built store and say that the expected sales are a product of the coefficient and the store size. Much more likely is an analysis that takes the current sales situation of the competition into account and then projects how much of these existing sales can be captured by the new proposed location. Even more significant is recent research that has shown that whatever the general relationship between the variables, the strong likelihood of spatial variability in such a relationship ought to be taken into account (Fotheringham et al., 2002; Rust and Donthu, 1995). Thus, if a cross-sectional regression analysis provided evidence of a coefficient of say $30 weekly sales per square foot, then a spatially varying parameter might trend significantly with location given the socio-economic patchwork of the city, by analogy with a similar argument in the context of house prices (Fotheringham et al., 2002).

One way to make operational estimates is through spatial interaction models. These models are the topic of this chapter, which covers a variety of models largely inspired by several years' experience of both applied and theoretical exploration of retail sales and interaction.

22.1.3. The role for models

Among the most basic general questions for spatial interaction modelers are the following: Where do the customers come from? What are the spatial interaction patterns governing the distribution of distance and attraction parameters? What is the probability that a customer at i patronizes a store at j? Conditional upon the location of i, what is the probability of being a customer of destination j?

Example

A grocery store has an upper income target consumer. Their research shows that these are very likely to be loyal customers of the produce and fresh foods departments (which in turn are highly profitable assuming that stock can be turned over rapidly to avoid waste/spoilage). In seeking new store locations, where are there sufficient pockets of un-met demand among this target population? In the analysis of existing own-brand stores, which may or may not be currently well-located vis-à-vis the standard customer profile, is there any need to modify their type of store to meet consumer needs?

These kinds of questions can be answered with spatial models. Before we get into the
details of how to formulate and apply such a model, it may be very helpful to get a preview of some of the uses to which a model might be put. One common usage is in impact analysis. With a fitted model, purporting to describe the allocation of consumers to demand centers, we can estimate impact on remaining stores if a branch is to be closed, or indeed if we open a new one. Both of these changes have impacts across the system of stores, but of course the first law of geography (Tobler, 1970), which holds that things are more highly interrelated when they are in close proximity, leads us to expect that the impacts are greatest on the centers and competitors closest to the site of the change.

Other uses for fitted models in locational analysis include assessment of the desirability of overhauling various stores or facilities. The applied retail analyst is often asked to estimate the impact of a change on the expected sales of the store: thus having a model which has as its independent variables some measures which can be adjusted to reflect the new attraction of the store can be useful to estimate the change in the retail trade area, expected sales, and so on. By estimating a well-fitted model to these data, we replace the specifics of the data instance with a model that has effects: these are systematic influences on the trends in the levels of spatial interaction, and are likely to include roles for distance and retail attraction (typical basic variables in SI models; see Guy (1991)). In more elaborate settings these models can also include many other independent variables (see especially the Multiplicative Competitive Interaction Models, MCI; Nakanishi and Cooper (1974)). Once these models are fitted, the analyst can then dial in various changes in the driver variables, and assuming that the model is reasonably robust to changes in these data, the impact of the changes on the expected sales and interaction levels can be determined.

Models can also be used as a tool in assessment of complex strategic questions. For example, a chain that is considering opening a new branch in a growing suburb might be faced with the question of whether to keep an existing older store in a nearby location. The question is then one of strategy: do stores A and B together make a better combined profitable solution than the option of closing B, presumably giving A an even greater new opening sales level, but possibly exposing the chain to the risk that a competitor might take the abandoned site? Not only does the decision hinge on the aggregate sales of the various combinations of open stores but it also must answer questions about the probable impact of competitors. Retailers engage in strategic behavior, and open or close locations as part of a system of decisions; such analyses often include issues of pre-emption and blocking competition, and beating competitors to the punch in new areas of expansion (Ghosh and Craig, 1983).

Models are also useful in assessing ongoing measures of store performance and may be used in this way as an early warning of emerging shifts in the market. Assuming that the chain can collect data throughout its system on the performance of each store, and some appropriately calculated variables to describe the store's site and situation, the analyst can embark on the kind of analog assessment made popular in the early days of quantitative analysis. This method, in its modern guise, uses the store's sales (as a dependent variable) and a selection of measurements of the trade area characteristics, and develops a multiple regression model to assess the expected (or predicted) sales vs. the actual observed levels. Fundamental to this operation is a meaningful definition of trade area: it makes no sense to include measurement of the attributes of areas far away from a store, if indeed it is known that few if any shoppers
come from that area. So, in other words, the measurement of the trade area of the store becomes the first and most important operation. There is no hard and fast consensus on how to define a trade area, and much more will be said on this matter later. For now, suffice it to say that the trade area could be objectively defined as an area within 5 min drive time of the store. That leads to a computation of the demand that exists within that area, and that could be one of the independent variables. (Clearly if we use more sophisticated definitions of trade areas, the trade area demand calculation would have to be re-computed.)

Independent variables collected for all the stores are saved as the columns of a table. GIS is especially helpful to calculate features of the trade area and give a quantitative descriptive nature of a trade area. The dependent variable is the actual sales performance: there is often a challenge obtaining these data (i.e., weekly sales) in academic research; but it is important to know how these data would be used in an applied case study.

Basing store closing decisions on this kind of result places a lot of faith in the fitted model (and so the importance of regression diagnostics, measures of goodness of fit, and significance levels on the estimated coefficients). What looks like an under-performer may not actually be an instant argument for store closure: for instance a store is projected to draw $400,000 per week, but actual sales come in at $350,000 (i.e., $50,000 below the regression line). While all might agree that it could be doing better (i.e., it is performing below potential) there may be good reasons that the store has not yet reached its full potential. It might be under attack from particularly aggressive competitors, be poorly managed, or it might be built over-sized in anticipation of further population growth in the area. The store may suffer from a depressed regional economy, and so a chain may consider shutting it down or attempting to reinvigorate the system by investing a lot of money into the regional advertising campaign. It all boils down to choices, and these choices are best informed by analytic models.

The ease of obtaining a good fit to the model will clearly vary across sectors. Department store sales volumes are notoriously difficult to predict, in that their aggregate sales volume is a combination of the various heterogeneous departments, and the extent of competition for specific categories in these stores could very well vary in an unsystematic way across locations. On the other hand goodness-of-fit for convenience related stores such as grocery chains is likely to be quite acceptable, in that there are a few predictable variables that are very highly correlated with the aggregate performance of the store. For example, the store's size, its population base, and the immediate competitive environment undoubtedly account for the bulk of the store-to-store variation in sales levels. Thus, it is expected that the coefficients of store size, population and competition will be significant, and that the resulting fitted model will have a strong R-square.5 Refinements to the model to include regional dummy variables and other more precise measures of target market demand (through surrogates such as parking studies, or traffic flow) are likely to help to improve the model.
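The analog-style regression just described can be sketched as follows. The store attributes, the coefficients used to simulate sales and the variable names are all hypothetical; an applied study would use observed weekly sales and GIS-derived trade-area measurements, together with the diagnostics discussed above.

import numpy as np

rng = np.random.default_rng(0)
n_stores = 40

# Hypothetical store attributes: size ('000 sq ft), trade-area population ('000s)
# and competing floor space within the trade area ('000 sq ft). Illustrative only.
size = rng.uniform(20, 60, n_stores)
population = rng.uniform(10, 80, n_stores)
competition = rng.uniform(5, 50, n_stores)

# Synthetic weekly sales ($'000) generated from an assumed relationship plus noise.
sales = 50 + 4.0 * size + 2.5 * population - 1.5 * competition + rng.normal(0, 20, n_stores)

# Fit sales = b0 + b1*size + b2*population + b3*competition by least squares.
X = np.column_stack([np.ones(n_stores), size, population, competition])
coef, _, _, _ = np.linalg.lstsq(X, sales, rcond=None)
predicted = X @ coef
residuals = sales - predicted      # large negative residuals flag possible under-performers
r_squared = 1 - residuals.var() / sales.var()
print(np.round(coef, 2), round(r_squared, 3))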
Some sectors lend themselves readily to analysis by multivariate regression models (grocery stores) but others require a different approach. If a shoe store, book store, or branch of a chain of clothing stores is typically located in shopping centers, then the analyst might use the center as a surrogate for the size of the market in which the individual store is located (see also Prendergast et al., 1998). Similarly if a chain of this type is planning to enter a new regional market, it could very well limit its attention to the shopping centers. This type of work is useful because it is frequently necessary to manage thousands of locations across many areas/regions.

It is hard to get information on gross sales (what is also called turnover in the British literature) in academic case studies, though practitioners and consultants can of course gain access to their clients' data as part of their confidentiality agreement. Many of the ideas in this chapter have been framed as a result of real world experience. In practice, one has access to lots of data; in theory one might have to learn these techniques in a data vacuum, recognizing that the proprietary data would become available to a consultant doing these analyses for a private sector client. This perhaps accounts for the lack of precision in the published literature: a lot of the literature in retailing location modeling is quite imprecise mathematically and the details are often not published in a way that makes verification and validation easy.

22.1.4. Consumer choice

The probabilistic assignment of consumers to retail destinations can be formulated as a production constrained spatial interaction model:

P_{ij} = A_i O_i W_j \exp(-\beta C_{ij})

Such models calculate the probability that a user at a specific origin location will select one from a number of available alternative attractive destinations. If these destinations are shopping centers, for example, the attraction of those centers can be represented by a measure of their total retail square feet of selling area. Once a calibrated production constrained spatial interaction model has been formulated for a specific set of destinations, the estimated table of such flows provides an idea of the likely inflow to each of the unconstrained destination trip ends:

D_j = \sum_i P_{ij} = \sum_i A_i O_i W_j \exp(-\beta C_{ij})

The production constrained model leaves the amount and type of flow arriving at each center or store open to calculation. With such calculated inflows, the analyst has access to a predictive model for the likely composition and size of any center's capture area. Think of a column of the spatial interaction matrix that leads to a specific destination as a listing of the contributions to that particular destination. Of all the flows that arrive at the destination, we may estimate the percentage that comes from each one of the surrounding regional sources. From all of those, the core or primary contributors may be determined by sorting the origins from largest to smallest and cumulating their contributions until arriving at a subset that contributes a very significant fraction of the total business of the store of interest. This is none other than Applebaum's (1966) concept of primary trade area being the region from which a particular store draws a high percentage (say 75%) of its business.
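A minimal sketch of the production constrained calculation, assuming a small synthetic system of four zones and three stores; the demand figures, attraction weights, travel costs and the value of beta are invented. Dividing each row of P by O_i recovers the choice probabilities for the residents of that zone.

import numpy as np

O = np.array([500.0, 300.0, 800.0, 400.0])   # zonal demand O_i (e.g., weekly $'000), illustrative
W = np.array([25.0, 60.0, 40.0])             # store attraction W_j (e.g., floor space), illustrative
C = np.array([[2.0, 5.0, 9.0],               # travel costs C_ij, illustrative
              [4.0, 3.0, 7.0],
              [8.0, 6.0, 2.0],
              [5.0, 4.0, 4.0]])
beta = 0.5                                   # assumed distance-decay parameter

utility = W * np.exp(-beta * C)              # W_j exp(-beta C_ij)
A = 1.0 / utility.sum(axis=1)                # balancing factors A_i, so each row of P sums to O_i
P = (A * O)[:, None] * utility               # P_ij = A_i O_i W_j exp(-beta C_ij)
D = P.sum(axis=0)                            # inflow to each store, D_j = sum_i P_ij

print(np.round(P, 1))
print(np.round(D, 1))                        # D sums to the total demand O.sum()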
22.2. ANALYSIS WITH RETAIL TRADE AREA MODELS

22.2.1. Spatial interaction6

Spatial interaction models in general assume that interaction is determined by the attraction of the alternative facilities and by the distance separating the consumer from those alternatives. Huff (1962, 1963, 1964) and Lakshmanan and Hansen (1965) are credited with developing specialized retail variants of the spatial interaction based allocation model. From an operational perspective, Huff introduced a practical approach to defining the attraction of a center as the amount of floor space, rather than the population of the surrounding area as was commonly used in previous models. This opened up the interpretation of attractiveness and allowed it not only to be determined by a number of variables (e.g., number of functions, parking capacity, etc.) but also allowed attractiveness to be treated as an independent variable that could be estimated in its own right. Another major operational consideration was that Huff fitted the exponent for distance in trip-making behavior (the influence that distance has on a consumer's store choice) to particular circumstances. Finally, he introduced a balancing term that constrained the sum of individual or zonal travel or sales to fit within an overall travel or sales limit.

With respect to the attractiveness or drawing power of a facility, Huff's use of retail floor space has been widely adopted and adapted to include other important characteristics. Most important, though, this model demystified the idea of drawing power or attraction and allowed its direct estimation by focusing on the weight associated with it. Nakanishi and Cooper (1974) were particularly effective at utilizing Huff's probabilistic choice framework and operational perspective to develop a linearization procedure for direct estimates of attractiveness. The MCI model is one of the best tools available for the allocation of consumer demand to facilities. The main advantage of this model is that it can incorporate a variety of attributes of the facilities under consideration by the consumer, yet it is easy to estimate. In cases where more data on the influence of various store attributes are available, the MCI model is apt to provide a more accurate estimation of market share than the original Huff model.

With spatial interaction models, then, facilities no longer have a well-defined geographic market area. Instead each store's market area is a probabilistic surface that shows the probability of a customer from each small geographic area patronizing that facility. The exact nature of this probability surface depends on the parameters of the spatial interaction model. Incorporating spatial interaction models into a location allocation model represents the state of the art in modeling retail site selection.

22.2.2. Primary trade area

Imagine a store attracting customers from surrounding census tracts or city blocks. Such data have long been analyzed by proponents of the applied school of retail trade area analysis (Applebaum, 1966). As a starting point, examine the distribution of the customers of a particular store, with regard to their origins. If the store has a weekly volume of V, then the customer distribution is used to spread around that demand to the originating areas, in proportion to their draw of customers. That spatially distributed demand in turn can be compared to the potential pot of money that exists in those zones available to be spent somewhere, in order to compute a measure of store penetration of the market. From the data, the top 75% (say) of the sales area may be devised, followed by the next 20% and the rest (all these are hypothetical numbers). Unless some added spatial constraints are imposed, it is important to note that it is not essential for the top contributing area to a store to be compact (having for example disconnected outliers). Analytically, the primary trade area, P, is defined such that

\sum_{i \in P} P_{i|j} = 0.75

and the secondary trade area, S, is defined such that

\sum_{i \in S} P_{i|j} = 0.20.

The remaining or tertiary trade area captures the remainder of the customers, often sparsely dispersed over a very wide area. For most practical purposes in the convenience sector, tertiary areas are irrelevant to routine operations. On the other hand, significant shopping centers drawing from a large region may well have to treat the marginal sales at the edge of their tertiary area as significant icing on the sales forecast, and these may in fact be the key to understanding top-performing locations.
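The sort-and-cumulate rule can be sketched as below, assuming the origin shares P_{i|j} for a single store are already in hand; the zone labels and shares are invented, and the 75% and 95% cut points simply mirror the hypothetical figures used in the text.

import numpy as np

# Hypothetical shares of store j's business drawn from eight surrounding zones (sum to 1).
p_i_given_j = np.array([0.28, 0.22, 0.15, 0.12, 0.09, 0.07, 0.04, 0.03])
zones = np.array(["z1", "z2", "z3", "z4", "z5", "z6", "z7", "z8"])

order = np.argsort(p_i_given_j)[::-1]                     # largest contributors first
cumulative = np.cumsum(p_i_given_j[order])

k_primary = int(np.searchsorted(cumulative, 0.75)) + 1    # zones needed to reach 75%
k_secondary = int(np.searchsorted(cumulative, 0.95)) + 1  # zones needed to reach 95%

primary = zones[order][:k_primary]
secondary = zones[order][k_primary:k_secondary]
tertiary = zones[order][k_secondary:]
print(primary, secondary, tertiary)

Nothing in this procedure forces the primary zones to be spatially contiguous, which is exactly the point made above about disconnected outliers.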
Retail executives are especially interested in market share, strength versus direct competitors, and in the yield of customers from a pool of potential sales dollars. It seems that the only thing worse than a store that has a small sales level is one with a large volume but under-performing its projected potential! These analyses are directed to the question: how well are our stores capturing the market? Are we leaving potential sales untapped? Or are our competitors out-maneuvering us? Penetration of the market area hinges on an assessment of how much demand is available there, and how much our particular branch is capturing.

22.2.3. Characterization of the demography of the trade area

The attributes and weights of demand from the particular types of respondents in the trade area can then be recovered. Say, for example, that the numbers of households in the various tracts that have particular levels of household income are given. Many useful statistics can be computed from these data. Among these are the expected values of customer characteristics over the primary, secondary, and tertiary trade areas respectively. For example, if we have a defined area that encloses the primary trade area, the total volume of expenditure in that area is X, and the total volume attracted to the store of interest from within that same area is Z, then the ratio of X to Z is very useful information about penetration of the market. These analyses provide the tools to diagnose practical issues in the trade area's effectiveness, for example, by indicating untapped sales potential, the need for more intense marketing, or special circumstances arising from unique factors (ethnicity, mobility, etc.).

22.2.4. Connecting retail location models and competing destinations

Retail locational analysis is frequently carried out with the aid of spatial interaction modeling. Many features of the trade area are derived from calculations based on either actual customer origins (from a survey) or from a model of such a distribution that has been fitted from observations. In either case assume that the probability that a customer in area i shops in store j is given by P_{ij}. This joint probability can be further manipulated to give P_{i|j} and P_{j|i}; respectively these are:

P_{i|j} = P_{ij} / \sum_i P_{ij}, the conditional probability that a customer who shops in j originates from i,

and:

P_{j|i} = P_{ij} / \sum_j P_{ij}, the conditional probability that a customer from origin i shops in zone j.

It is this latter probability that is highly useful as it allows a prediction, from a given zone i, of how much traffic or business might be expected to arrive at a destination in zone j, and this of course can be applied either to pre-existing stores (to check model fit and validity) or to forecast the likely patronage of a new or proposed location at j.
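Assuming a joint matrix P_ij is available (the one below is invented), the two conditional probabilities just defined are simple column and row normalizations:

import numpy as np

# Illustrative joint probabilities P_ij: origin zones in rows, stores in columns (entries sum to 1).
P = np.array([[0.10, 0.05, 0.02],
              [0.08, 0.12, 0.03],
              [0.04, 0.06, 0.20],
              [0.10, 0.15, 0.05]])

P_i_given_j = P / P.sum(axis=0, keepdims=True)   # where store j's customers come from (columns sum to 1)
P_j_given_i = P / P.sum(axis=1, keepdims=True)   # where zone i's residents shop (rows sum to 1)

print(np.round(P_i_given_j, 3))
print(np.round(P_j_given_i, 3))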
In that these probabilities are analytically derived from data that are exogenously available (travel times, demand expenditure parameters, and so on) they are quite easily manipulated to give forecasts of 'what if' for cases where there are expected changes in the data or the parameters. This kind of sensitivity analysis can provide a useful cross check on the validity of the model; for example, a sensitivity analysis should predict changes that make sense. Further, extreme values of the parameters often provide consistency checks in that the model collapses to other easily recognized forms in these special circumstances: thus a model with a distance decay parameter collapses to an all-or-nothing nearest center allocation model in the case that the beta parameter is driven to the extreme value. In this case the trade area should take on characteristics such as those seen in the Voronoi diagram or Thiessen polygons.

In macro spatial analysis (e.g., at the scale of interregional interactions) the peripheral areas have, by definition, lower access to the dense cluster of the urban core. So, for a resident of the periphery the number of competitive alternatives in short range is comparatively small, and according to the theory of competing destinations (Fotheringham, 1983), the demand is therefore spread over few alternatives (hence is not divided up so thinly). It would be expected therefore that interaction levels over short distances are enhanced (and comparably the interaction over the longer distances is spread thinly, and hence the slope of the flow vs. distance curve is steeper than it would be expected to be, absent a spatial structure effect). At macro scales then the large beta for peripheral zones results from mis-specification, and does not correctly imply that there are larger distance decay impacts for peripheral residents; in fact, once the mis-specification is corrected, the expectation might be that peripheral residents might show a willingness to travel to distant alternatives at a rate that exceeds those of the comparatively well served central residents.

This notion of a process at one density regime being adapted for other situations was nicely foreshadowed in Berry's (1967) classic work on commercial centers when the expected sales territory size was contrasted in low density rural Iowa with the more commercially dense built up areas of Chicago. Thus there is some interest in whether this theory might be adapted to a more dense urban retail scenario. In the retail scenario the central or core resident has lots of alternatives within short range, and these can provide opportunity for multipurpose trips and shopping on a scale that combines multiple activities. As Eaton and Lipsey have shown, such retail agglomerations then gain more from their collocation than they lose from the presence of intensified competition. Thus the theory of competing destinations developed at a primarily interurban scale might be refined for the case of flows within an urban area, and indeed the opportunity to make multi-purpose trips to clusters of shops in a city might lead to an expected agglomeration effect: what we might coin the 'cooperative destinations' effect arising from spillovers in retail demand (see early theory of Eaton and Lipsey, 1982).
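The consistency check mentioned above, that a very large distance-decay parameter drives the allocation to an all-or-nothing nearest-center (Thiessen) assignment, can be demonstrated numerically; the costs, attractions and beta values below are invented.

import numpy as np

C = np.array([[2.0, 5.0, 9.0],     # illustrative travel costs from 5 zones to 3 stores
              [4.0, 3.0, 7.0],
              [8.0, 6.0, 2.0],
              [5.0, 4.0, 4.5],
              [1.0, 7.0, 6.0]])
W = np.array([30.0, 30.0, 30.0])   # equal attraction, to isolate the distance effect

def shares(beta):
    """Share of each zone's demand allocated to each store under a gravity rule."""
    u = W * np.exp(-beta * C)
    return u / u.sum(axis=1, keepdims=True)

for beta in (0.1, 1.0, 10.0):
    print(beta, np.round(shares(beta), 3))

# With a very large beta the shares approach 0/1 and coincide with the nearest store:
print(np.argmax(shares(10.0), axis=1), np.argmin(C, axis=1))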
22.3. CALCULATIONS

22.3.1. Data issues

An interesting aspect of retail trade area analysis is that the most commonly collected data (choice-based samples) are not especially well suited to direct manipulation in calibration (see a series of papers on choice based samples by O'Kelly (1999) and Ding and O'Kelly (2008)). Choice based data from frequent shopper cards at the point of sale or from check based data can tell us the distribution of actual demand around a current store. Clearly the interest in these data from a predictive point of view is to be able to use them to devise some origin based parameters such that the trade area attributes that determine the store success/failure can be studied and translated into parameters that can predict how a proposed new location (assuming that represented stores provide a decent analog for the new operation) might be expected to perform. One could expect to take data about existing operations, and develop a list of those parameters of the trade area that are expected to correlate heavily with good retail performance. The interaction model is simply an improved way to gather data and summarize standardized aspects of these trade areas to provide data about the branches. In applications, these data can then be entered into regression or other models to determine the different aspects of the trade areas that are especially highly correlated with successful operations.

An important step in managing a retail trade area data set is to understand the scope and reach of the center to the areas surrounding the store. In fundamental economic geography we learn the concept of the range of a good: this is the maximum distance a customer would be willing to travel to reach the store. This maximum radius or reach has relevance for the concept of spatial interaction and trade areas as there is clearly no necessity to include demand from a place that is so far from the store as to be unable to reach that store's trade area. Distance impedance and maximum travel radius are critical to the accurate specification of gravity models. In the case of a maximum travel radius, one has to be sure to set up a spare or dummy destination to allow demand that has no feasible option within range to be parked there, pending either some additional site or some relaxation of the maximum range.
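One way to operationalize the dummy destination is sketched below, on the assumption that any store beyond the maximum radius is simply removed from a zone's choice set; the radius, demand, attraction and cost figures are invented.

import numpy as np

max_radius = 6.0                              # assumed maximum travel distance (the range of the good)
O = np.array([200.0, 150.0, 300.0, 120.0])    # illustrative zonal demand
C = np.array([[2.0, 5.0],                     # illustrative distances from 4 zones to 2 stores
              [4.0, 3.0],
              [9.0, 8.0],                     # this zone has no store within range
              [5.0, 4.0]])
W = np.array([40.0, 25.0])
beta = 0.4

u = W * np.exp(-beta * C)
u[C > max_radius] = 0.0                       # stores beyond the range are not feasible options

reachable = u.sum(axis=1) > 0
dummy = np.where(reachable, 0.0, 1.0)         # dummy column "parks" demand with no feasible option
u_ext = np.column_stack([u, dummy])

shares = u_ext / u_ext.sum(axis=1, keepdims=True)
flows = O[:, None] * shares
print(np.round(flows, 1))                     # the last column holds the parked (unserved) demand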
Very large energy costs cause a contraction in people's willingness to travel long distances or make excess discretionary trips; instead one would expect two countervailing forces: to make a smaller number of multipurpose trips to major agglomerations would serve to support the development of a small number of heavily clustered mega malls; on the other hand the smaller willingness to travel might cause a stronger tendency to use the closer alternatives and activate the incentive to build a series of small decentralized regional centers. This trade-off between agglomeration and convenience is an interesting empirical question.

22.3.2. Determination of market effectiveness and penetration

The idea in retail interaction modeling is to use a probabilistic estimate of the demand originating in each sub-area, and its likelihood of being spent at a particular store of interest. It is convenient, though perhaps increasingly less realistic, to assume that the pool of available money is all allocated to bricks and mortar stores, and that the demand is a simple function of the population, its income, and expenditure habits. With that assumption it is possible to take readily available census expenditure data and predict how much would be available for particular product categories in each micro-demographic area. Such micro marketing data have been used with great precision by the package goods industry, the car industry, banks, and retailers in general. These applications represent one of the most powerful uses of the gravity model. Some industry specific intelligence is needed with regard to the reasonable range of potential destinations from the point of view of an origin. This is because it is necessary to be able to make an all-inclusive list of the probabilistic choice sets that exist or that might provide opportunities for the shoppers to make choices. To adapt this base case to the more realistic case of non-spatial alternatives (in competition with conventional alternatives), we need to be able to estimate leakage from an origin area to electronic, catalog, and on line purchases. From the retailer's point of view at a specific location, it is necessary to be able to circumscribe the potential originating zones from which the trip makers might be attracted. For a convenience-oriented store like a supermarket, one can imagine a reasonably compact service area. For department stores, or retailers co-located with attractions that can draw from farther places (think of Mall of America as a destination), it is perhaps a little more difficult to know the universe of the attraction, and hence difficult to make computations of the share of the attraction provided for by local or further away origins.

22.3.3. Performance assessment of existing stores

It is reasonable to assume that the primary trade area, which accounts for say 70% of the branch business, is key to characterizing the store's potential customers. In an applied context, working for a retailer, we would need them to provide us with some measure for each store of the total retail volume and perhaps some breakdown by product line or class, and also an indication from the store's perspective of whether the chain regards the branch as successful. With the sales data we can produce measures in the surrounding zip codes for sales/household and this could give some indication of penetration rate. From that we can characterize the trade area make up for the store (Hispanic, middle class, etc.). While these data are a very big part of the puzzle, what we cannot do with such data alone is to talk about the residents in a particular subarea and their probability of being a customer. For those who are customers (and for those who are not) we need some additional way to measure reasons as to why or why not. To get at these added questions we either need prior theoretical expectations, or to employ a survey to ask residents in a residential area about their reasons for shopping or not shopping at our chain. As surveys tend to be very expensive, a controlled theoretical choice experiment is perhaps a worthwhile future framework for such destination choice problems (see Eagle, 1984).

From these two sources of data detailed intelligence about the trade areas of the various branches can be accumulated and the results used to characterize the stores; if there are added data from the retailer about which stores are under- or over-performing, we could do some correlation analysis, or perhaps data envelopment analysis (Donthu and Yoo, 1998) which allows a gauge of performance vis-à-vis peer benchmarks.

22.3.4. Impact assessment

One of the most frequently asked questions from an applied perspective is to determine the loss of sales at existing stores to new entrants, or the competitive analysis of the diversion of existing dollars to the store of interest, either from one's own chain (cannibalism) or preferably from the competition.

Impacts of changed conditions are quite well accommodated by the gravity model, because the difference between two scenarios may be quite instructive. The impact of new store k on existing store j, from the point of view of zone i, is measured as:

I_{i,j,k} = \left( P'_{ik} \Big/ \sum_{k \in \text{new sites}} P'_{ik} \right) \left[ P_{ij} - P'_{ij} \right]

where I_{i,j,k} is the impact of new store k on existing store sales in zone i, P'_{ik} is the new allocation to center k from zone i, and P_{ij} is the allocation to center j from zone i (the prime denoting allocations after the change).
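A hedged sketch of the impact calculation, assuming a simple gravity allocation before and after a single new store opens; the demand, attraction and cost values are invented. With only one new site the weighting term in the formula equals one; it becomes informative when several sites open at once.

import numpy as np

O = np.array([300.0, 250.0, 400.0])           # illustrative zonal demand
beta = 0.5

W_old = np.array([50.0, 35.0])                # existing stores A and B
C_old = np.array([[3.0, 6.0],
                  [5.0, 2.0],
                  [4.0, 4.0]])
W_new = np.array([50.0, 35.0, 45.0])          # same stores plus proposed store C (last column)
C_new = np.array([[3.0, 6.0, 5.0],
                  [5.0, 2.0, 6.0],
                  [4.0, 4.0, 2.0]])

def allocate(O, W, C, beta):
    u = W * np.exp(-beta * C)
    return O[:, None] * u / u.sum(axis=1, keepdims=True)

P_before = allocate(O, W_old, C_old, beta)    # P_ij
P_after = allocate(O, W_new, C_new, beta)     # P'_ij (allocations after the opening)

new_cols = [2]                                # indices of the new site(s)
weight = P_after[:, 2] / P_after[:, new_cols].sum(axis=1)
impact_on_A = weight * (P_before[:, 0] - P_after[:, 0])
print(np.round(impact_on_A, 1))               # sales diverted from store A, zone by zone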
The types of scenario that can be handled using the methodology are as follows:

• analyze the trade areas of current stores (run with just fixed locations)
• pick sites from candidates (run with fixed and potential locations)
• re-consider current sites (make currently fixed sites flexible or optional)
• examine specific proposed sites (lock in particular new sites)
• analyze specific closings (lock out particular site and see what happens)
• analyze the opening of a known competitor (add fixed locations).

All of these versions of the problem have been deployed in practice with good empirical and quantitative results.

22.3.5. Temporal and seasonal variations in trade areas

Clearly, the volume of business is not simply related to the local demand, and the seasonal adjustment for external visitors is something that would have to be taken into account in developing accurate sales forecasts. Imagine a seaside resort such as Hilton Head, South Carolina: its sales would be quite variable over the seasons, in a cycle tied to the peak tourist demand in the northern winter. One way to do this is to examine sales records and develop a set of monthly seasonal adjustments. Whatever the base level of demand, the modeler could then devise factors to scale up or down the sales for specific months.

A simple time series model, with a set of monthly or seasonal dummy variables, can be used to make an empirically fitted set of correction factors. Another way that trade area models need to be corrected is for the excess in demand that often accompanies a new store opening, as the novelty of that location is added to the mix of existing stores and, at least initially, there may be large incentives or advertising efforts made to attract customers. Clearly, it would be advisable to temper these initial sales figures with some kind of decay or dilution effect that would bring the store's sales into alignment at moderate levels (see Kaufmann et al., 2000). Rules of thumb abound in this area, and equilibrium sales after opening may settle down to say 60% of the initial week's sales.
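A minimal sketch of the dummy-variable correction, assuming three years of synthetic monthly sales; the seasonal pattern is invented, and the fitted coefficients recover month-specific scale factors of the kind described above.

import numpy as np

rng = np.random.default_rng(1)
months = np.tile(np.arange(12), 3)                      # three years of monthly observations
true_factor = np.array([1.6, 1.5, 1.2, 1.0, 0.8, 0.7,   # invented seasonal pattern
                        0.7, 0.8, 0.9, 1.0, 1.2, 1.6])
sales = 100.0 * true_factor[months] * rng.lognormal(0.0, 0.05, months.size)

# Regress log(sales) on a full set of monthly dummies (no intercept), so each
# coefficient is the average log sales level for that month.
D = (months[:, None] == np.arange(12)[None, :]).astype(float)
coef, _, _, _ = np.linalg.lstsq(D, np.log(sales), rcond=None)

monthly_index = np.exp(coef)
correction = monthly_index / monthly_index.mean()       # scale factors relative to an average month
print(np.round(correction, 2))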
22.4. LOCATION ALLOCATION MODELS

22.4.1. Introduction to location allocation models

The use of the location allocation model in retail site selection has greatly advanced over the past 15 years. Examples include the use of interaction models to develop optimal site locations for stores in a variety of different types of retailing including supermarkets, department stores, big box retailers, and retail banking.

Successful use of these models led to their commercial acceptability and widespread adaptation in retail outlet location study (see the Thompson site selection book (Buckner, 1998)). Commercial examples in Britain include the G-MAP package (see Longley and Clarke, 1996). Specialized programs in business-GIS packages now provide routine access to methods that were previously only obtainable in customized software and research publications.7 This diffusion of the innovation of retail trade area analysis from specialized journals, such as Environment and Planning A, into many applied sectors has been a major success for analysts. These models serve as a critical underpinning of the site selection analysis that goes into many large format stores in almost every urbanized area in the U.S. and Europe. The reason that such models are widely used is that they are essential to the rapid pro-forma evaluation of numerous site proposals. The models provide the kinds of rapid computations that would ordinarily have taken a great deal of manual computation; and certainly when a chain is screening as many as 10 sites for every actual chosen location, the need for rapid analysis is obvious. For example the early studies by Applebaum (1966) directly predate the computation of trade area penetration models that may now be made using spatial interaction models.

One of the goals of this chapter is to provide the analytical background to the models that are now a commercial fact of life for retail analysis. The idea that a model of retail attraction could be deployed as a model for retail site location is an extension over the simple, earliest work in central place theory, where consumers were assumed to patronize closest centers (see also Ghosh, 1986). In turn the central place approach defined a region in close proximity to the store from which it would be reasonable to expect that the demand would be assigned to that particular store. Following a large amount of study of consumer behavior indicating dispersal of choices over many alternatives beyond just the most convenient (Clark, 1968; Hanson, 1980; O'Kelly, 1981), market researchers and others devised more precise means of estimating likely consumer behavior. The deterministic all-or-nothing allocation of demand to the nearest or most convenient branch is no longer a necessary or indeed acceptable simplifying hypothesis about spatial behavior. Instead, we now expect that consumer behavior may be examined with the same tools that econometricians have devised for the analysis of discrete choice. Databases in turn provide a wealth of data. Geographers have derived a representation of consumer behavior with a model that locates services; this involves a breakthrough in the use of spatial interaction models. The key idea was to replace the nearest center assignment of customers in central place theory with a more realistic gravitationally based estimate of likely destination choice (O'Kelly, 1987). Thus, the customer might have a certain probability of visiting a large center that is a bit further away than a small center close to the consumer. In gauging these trade-offs, the model makes a carefully calibrated estimate of the impact of size and distance on the consumer's willingness to travel to particular destinations. Once this calibrated model is available to us, the analyst can propose specific new site locations and gauge the expected level of consumer patronage at those sites. So-called turnover or retail sales volume is a critical first step in the analysis of any commercial property deal as the sales level helps to support the go/no-go decision on rental, lease, re-model, or closing.

Location-allocation models generally involve the simultaneous selection of locations and the assignment of demand to those locations in order to optimize some specified objective or goal (usually to maximize market share or profit; see, for example, Craig et al., 1984). These models have several advantages. They can determine the optimal (or near optimal) location of several stores simultaneously by systematically analyzing the system-wide interactions among all stores in the market area. They are capable of utilizing a wide range of objectives that could be used in siting stores. In addition, the models
are flexible in that they can incorporate the behavior of retailers, consumers and/or the retailing environment. Finally, heuristics are available for these models which provide good (optimal or near optimal) solutions and yet are easy to implement. The use of location-allocation models typically involves empirical research to determine the important store attributes for the population within the market area and a mathematical model to determine the optimal locations for retail outlets based on the pattern of market demand, store chains and existing competing outlets.8

Even though it is recognized that many consumers engage in multi-purpose, multi-stop shopping, models of multi-purpose shopping behavior have not been thoroughly integrated into facility location analysis, though early efforts by O'Kelly (1981, 1983a, b) have been recently reconsidered as the basis for new location models (Leszczyc et al., 2004). So the assumption of single-purpose trips is made in order to devise practical (usable) store-location models. Nevertheless, the fact that our analysis is primarily designed around shopping center destinations ensures that the attraction of a destination for a specific store is partly determined by the attraction of the cluster of stores as a whole.

There are several types of retail location models in the literature. Some representative examples include models which combine location-allocation with spatial interaction (for example, the MULTILOC model by Achabal et al., 1982); models which can deal with multiple objectives (for example, Min, 1987); models that consider the uncertainty inherent in the retailing environment (such as the scenario planning model by Ghosh and McLafferty, 1982); and models which involve the decision maker in the decision-making process (for example, the STORELOC model by Durvasula et al., 1992). No one model is capable of handling all the important aspects of retail site selection which must be addressed in order to provide the decision maker with the best set of locations for any particular market area in which the stores will be located.

Some aspects of these models are developed in more detail in the following section.

22.4.2. Retail location models and spatial interaction

MULTILOC (Achabal et al., 1982) was one of the first location-allocation models to simultaneously locate more than one store. The model optimizes the location of stores using the knowledge that consumers will choose among the alternatives according to a probabilistic interaction model (the MCI model). Such models maximize total profit for a retail chain (or a single store) after subtracting the fixed costs of establishing a store at the determined location (i.e., location-specific fixed costs). It has later been given a more mathematical treatment in O'Kelly (1987).

The major problem facing the manager of site selection is the large number of options from which to choose, although the conceptual bases for this model are very simple. A set of potential locations is defined and from this set P facilities are to be chosen. The so-called N choose P problem clearly involves a large number of combinatorial options. Not all of these choices need to be examined, however, in order for the model to make a reasonable estimate of the ideal subset of P facilities. Two major strategies are available. First, if the model can be posed as an optimization task, computer programs can use mathematical techniques such as mixed integer programming (MIP) or Lagrangian relaxation to select optimal locations (O'Kelly, 1987). Second, and in many ways more robustly, the modeler can set up the problem and employ heuristics in order to make a quick and reliable estimate of the core portion of the preferred site selections.
APPLIED RETAIL LOCATION MODELS 433

the problem and employ heuristics in order the context of the surrounding demographics
to make a quick and reliable estimate of the and competition. These models have become
core portion of the preferred site selections. very sophisticated because of the availability
An example may help to make this of detailed micro demographic profiles of
concept clear. Suppose a clothing retailer spatial areas that may be assigned to each
is considering siting stores in some of the potential location.
many available shopping centers in a large As the model explores the number of
metropolitan region such as Atlanta. It is locations, the analyst can keep track of
unlikely that the retailer would want to place the performance of those proposed sites.
a store in every available shopping center. For example a set of five stores distributed
Budget constraints would limit this option throughout the metropolitan region might
and simple common sense would indicate very well succeed in capturing the selected
that the market could not bear the saturation demographic submarkets that are sought and
coverage of too many stores. The question desired by this retailer. In contrast, some
of the optimal number of stores will be other combination of five stores could easily
addressed presently, for now assume that be eliminated from consideration because the
the retailer has a limited number of sites sites do not deliver the expected mix and den-
that are under consideration. Therefore the sity of demand to make this package feasible.
retailer seeks to prioritize a subset of all A great deal depends on a reasonable and
the available centers that might be expected accurate projection of the impact of each new
to perform well given their products and store and its performance both against exist-
customer profile. This latter point is a key ing competitors and any stores that the chain
one. In order for the retailer to prioritize the might already have located in the district.
store locations, the retailer needs to use an
accurate model of the underlying demand
for the service. Thus many geo-demographic
22.4.3. Combinatoric issues
case studies use profiles of existing customers
to create a measure that reflects the attraction A key to the efficient implementation of
of the store for particular populations. This interaction based location models is a data
in essence is a computerized version of the structure that enables the computerized eval-
classic idea by Applebaum (1966) of using uation of sites to be made relatively quickly.
analogs to project the trade area success of The following notes provide a guide to the
a proposed new store location. If the chain collection and organization of data in such a
already has a set of stores in a wide variety way as to make such computations feasible
of different spatial contexts, cross-sectional for quite a large study program. Assume that
comparison of the performance of those there are M origin zones. The N locations
stores can be used to produce a regression from which the model will select sites are
type model for store sales levels. Once these organized as the columns of the interaction
models are estimated, the retailer can then table with an extra column that will be used
seek new locations where the mix of factors to store any user demand that is under-served
leans heavily towards those variables that by the solution program. This modification
have proven to be successful predictors in is essential when dealing with site selection
other locations. The operational version of models. To see this, imagine that a retailer
this idea is to test each of the locational is planning to site three new outlets in a
scenarios by projecting the probable trade very large metropolitan area. If the maximum
area of each store, existing or proposed, in distance a customer would be willing to
434 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

travel to the store is set at say 10 miles the model to be chosen or not as the analysis
(equivalent to the concept of the range in progresses. Once again having an example
CPT) then in a large city, it is quite clear may help to fix these ideas. Suppose that a
that some consumers will be too far from city currently has a total of 35 supermarkets
any of the chosen sites to be able to use from a number of major chain stores. One
this retailers service. It is important that of these chains is considering a variety of
the model provide a means to calculate such expansion programs in this city. Among
unserved customers and we propose to do the locational options available to it are
this by placing those unserved consumers the acquisition of new sites, the acquisition
in a separate dummy destination category of existing sites from competitors, and the
as a holding bin for the under-provided origin expansion of some or all of the current stores
zones. In the absence of competition, the goal in its portfolio. In this case it is reasonable
would be to minimize unserved demand. In to think that the existing stores in the market
the presence of competitive alternatives, the are in a sense locked in and will occur in
goal would be to capture as much unserved all of the comparison scenarios: 35 columns
demand as possible for the clients chain. of the interaction matrix are therefore locked
With the exception of the concept of an in for the purposes of this initial run. Any
additional destination, the basic calculation additional locations are simply tacked on as
process is identical to that of a production say the 36th, 37th or 38th columns of this
constrained spatial interaction model. The interaction matrix. Depending on how many
device used to operationalize a particular candidates sites are available from which to
choice of actively considered facilities is to pick these three additional locations one can
simply keep a list of certain columns from the imagine that the model is exploring a finite
interaction matrix to which consumers might list of potential new store packages. Common
be allocated during that particular iteration. sense dictates that the store chain is unlikely
As the model proceeds from one locational to want all of its new site picks in the same
pattern to another the set of active columns area, as it would make a great deal more
is simply switched on and off to provide an sense to spread the chosen sites across a
indication of the currently available desti- variety of sectors of the city. If it so happened
nation choices. To make these calculations that a pool of presently underserved demand
efficiently the computer is provided with could be found, the model would place a
lists pointing to various types of columns facility in that area. More likely, the model
in the matrix. For example any sites which would be making a complex set of trade-offs,
are required to be provided in all cases trying to eke out a market share from among
may be indicated by placing their column and between the existing set of competitive
numbers in a vector of open facilities. Such centers, and indeed avoiding cannibalizing
a vector might be the noted by the letter the existing store already owned by the
R for required centers. A second set of chain. In this regard the strategy is essentially
pointers might be used to indicate that in similar to the well known gap in the
a particular analysis some potential facility map rubric for locating new services. The
locations are to be ignored completely. These, bulk of the program then would spend
for example might be sites which we wish time computing the benefits of specific
to lock out of the current set of optionally chosen alternatives in, for example, the
available sites. Yet another list could maintain north, east, and south suburbs. For those
a set of pointers to the available remaining with the obvious question of how is this
unexplored options that are freely available to done, it would be realistic to state that the
APPLIED RETAIL LOCATION MODELS 435

current practice involves a combination of represent the locations of competitor stores


GIS software to manage the spatial data, that we know are remaining in operation.
customized optimization algorithms coded These would be treated as fixed sites.
as executable computer programs, and a
report writer to digest the output from the
optimization run. While these capabilities Prohibited site
may be combined in various customized Areas or store locations that are prohibited
software environments by consultants, there from entering the model are equally impor-
is probably no prepackaged comprehensive tant to a realistic implementation. If we are
optimization environment for the applied sure that the chain does not wish to enter
tasks enumerated here, though this situation certain malls, or if the location in proximity to
will undoubtedly change. some existing stores is strongly discouraged,
then candidate locations in the no go zone
should be flagged to (a) save computer time;
and (b) and to enhance the chance that the
22.4.4. Heuristics and other
model will focus attention in areas that are
shortcuts9
worth investigating.
The position of the store relative to the pool
of demand and to other complementary and
competitive stores is critical in measuring Flexible sites
market area and size. If the objective is The set of locations from which the model
related to maximizing aggregate market share will pick are predefined by the user. These
for our entire chain, and if there is an accurate could be the result of selection set operations,
representation of maximum distance (reser- query based lists, or geographically delimited
vation distance) we can expect that the model regions on the screen. What is important is
will naturally space out our stores giving for an underlying comprehensive data base
them somewhat non-overlapping exclusive to be kept up to date in order for the analysts
market areas. Nevertheless, when two stores to have meaningful choices from which to
are close enough to contest a middle ground derive the set of active alternatives.
then the gravity model will do better than the
usual deterministic all-or-nothing location
models. The gravity model will in fact
partition the demand between the centers in
22.4.5. Computable Location
proportion to their attraction and weight.
Models10
If such a model is to be run in site selection Location models must be flexible to allow
mode, realize that the attraction/repulsion analysis of different scenarios. The model
score will have to be computed for prospec- takes as input the required and flexible sites.
tive as well existing sites in other words The existing literature contains several mod-
it has to be some calculated feature of sites els dealing with joint location and allocation
that are prospects: it cannot be simply some under spatial interaction: these however, need
observable feature of existing sites. to be modified to handle realistic selection
sets of required and prohibited sites.
The best practice at this time is to
Required site use a robust vertex substitution method
These are locations of our own chain that we appropriately modified to handle lists of
wish to keep. We can also, in some cases, required and prohibited sites, as well as
436 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

efficiently managing the introduction of new scenarios. The method takes as input the fixed
candidate locations. locations, the candidates, and the prohibited
The vertex substitution method also needs sites (if any). As output the model produces
to include the capability of a maximum the requested number of additional facility
service radius for the facilities, and for this sites, and reports on the area characteristics
radius to be flexible/variable between centers: of both the current and the new sites. The
this is essential if some notion of center candidates are either a comprehensive list
hierarchy is to be accommodated. It should be of all feasible shopping centers, stores are
clarified that the vertex substitution method generated from a list of picks and potential
is a local optimally solution in the sense that sites. The user may select the candidates as
there may be a better solution that was not those sites which meet some criteria, and the
reached during the course of the exploration; detail and realism of these selection criteria
this possibility can be reduced by trying are really only constrained by the imagination
the method with various starting values. of the user. All kinds of filters can be used,
Research experience has shown, however, including center size, or selections can be
that the good locations stand out very well based on attributes of the centers. Having
and the possibility that the vertex substitution selected the candidates, the user would have
method completely misses the best package to select the objective function: normally this
of locations is remote. One idea that is is driven on the basis of aggregate market
suggested to prevent mistakes due to local share, or demand, or minimizing competitors
optimality is to produce a list not only of share. This is potentially extended to include
the best locations but other close contenders acquisition, lease, closing and opening finan-
discovered in the course of the algorithms cial decisions.
progress. Vertex substitution has the great advantage
Research by Church has shown that the that as a general purpose optimization
introduction of maximum service radii into strategy (i.e., heuristic) it is robust to
a median type of problem (which is what changes of objective function, in a way,
we have) disrupts one normal property of for example, that would not be true of
the model, making it potentially possible that a specialized exact optimization code. In
the optimal locations occur at points other other words, the weakness of an exact
than the nodes of the network. However, the method is that it typically has to exploit
actual problem that we are concerned with some aspect of the problem structure and
realistically limits the feasible locations to any change in that structure would likely
the nodes of the network, as this is where undermine the mathematical formulation.
the shopping centers are. In other words Heuristics (and there are many of these
we ignore the theoretical possibility that the available for combinatorial problems) can
true optimal solution is at an intermediate frequently be set up to explore a solution
location along street segments, as in practice space effectively and this can be chosen
this kind of locational solution would not be to evaluate the users choice of objective
permissible. (and indeed multiple objectives) to achieve
What does experience tell us about the the desired goals. Indeed the final great
solution of location allocation models? The advantage of an exploratory heuristic is that
basic model is conceptually very simple by careful book-keeping many runner up or
and easy to understand. The idea is to close alternative solutions can be kept and
systematically explore alternative locational compared.
APPLIED RETAIL LOCATION MODELS 437

22.5. STRATEGIC PLANNING measured by using the size of the center


EXAMPLES as a proxy for its suitability. Suppose then
that the location allocation model algorithm,
22.5.1. Shopping centers such as the Interchange Heuristic, picks four
Store location siting is often made from locations as the close-to-optimal added sites.
among a predefined set of existing shopping (We are careful not to call them optimal in
view of the many simplifications and the use
centers, so in a sense the set from which the
of a heuristic which after all depends on some
strategic location is to be chosen is already
fixed. Thus, 1747 block groups in Atlanta short cuts to avoid complete enumeration of
represent the pools of available demand, the many thousands of combinations that are
available.)
which for the purposes of this simple example
are weighted by the population or disposable The impact of each new center on
income as a proxy for the demand. Assume the 12 existing sites is then operationally
a chain has 12 existing stores distributed measured using a formula such as the one
discussed above.
throughout the Atlanta region in specific
shopping centers. There are approximately
230 potential sites in shopping centers.
Assume that the reach or draw of the center 22.5.2. Chain combinations
candidates is a function of the size of the Sales of branches in two existing sets of chain
center in other words the decision to open stores can give a good clue as to the best ones
a new branch in a thriving center with a to keep in the combined operation, but that
super-regional draw might be appropriately still leaves a difficult problem to determine

Where did the impact on store come from?

NEW1 NEW2 NEW3 NEW4 Taken from


store number
0 0 0 0.04 1 0.04
0 0 0 0.47 2 0.47
0 1.02 0 0.01 3 1.03
0 0.65 0 0.41 4 1.06
0 0 0 0 5 0
0 0 0 0 6 0
0.04 0 0 0 7 0.04
0 0 0 0 8 0
0 0 1.11 0 9 1.11
0.22 0 0 0 10 0.22
0 0 0.51 0 11 0.5
0.02 0 0 0 12 0.02
0.28 1.67 1.62 0.93 From existing
0.82 2.35 2.64 2.78 Total
0.54 0.68 1.02 1.85 Net added

Before Total share 0.9434


After Total share 0.9842
4.08
438 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

which ones to close. Predicting retention of would be well worth while, would be the
customers from old stores to re-aligned new inclusion of the interaction based model in
branches is also difficult though the managers a multiobjective and multiattribute decision
of such operations may have good insight into framework. The difficulty would be to elicit
the likely levels of customer loyalty. from the decision maker a set of trade off
An interesting question is to determine the parameters that define the relative scales for
diversion of sales or the result of a store/chain the attributes of the alternative locational
closure. Such questions frequently are packages.
presented in practice to retailers as they The mechanism reviewed in this chapter
have the option to purchase competitors will operate to allocate the sales from the
sites. Which of these sites would make good origin zones to the destinations is called the
acquisitions (if the option to cherry pick the allocation model. It is driven by a gravity
best of the available store)? Which would be based spatial interaction model, and given
blended well and open under the new label if careful data and careful assessment of the
the acquiring chain gets the whole suite? foundation assumptions this is a robust model
If two chains merge, and there are for trade area delimitation.
regulatory concerns that the two chains have
to divest some of their branches, or wish
to streamline their combined operations, one
would have to analyze the closure of branches ACKNOWLEDGMENTS
one by one to determine the package that
makes the most sense from the point of view Parts of this chapter are based on materials
of the combined operations. developed over many years in my Retail
Location Seminar where comments from
Debbie Bryan, Tony Grubesic, and Tim
Matisziw are gratefully acknowledged (see
22.6. SUMMARY AND specific footnotes). In addition, a great deal
CONCLUSIONS of the common sense application flavor of
this paper derives from conversations with
The great strength of the gravity model is Jim Stone (GeoVue), Tony Lea (Environics
its simplicity and its allocation of demand Analytics), and Steve Wheelock. I thank these
to centers in proportion to their attraction individuals while taking full responsibility
and inversely proportional to distance. It can for the product here. Some material originally
incorporate center specific attraction and prepared as a discussion/research memo on
center specific maximum trade area radii. location models for Geonomics. See 22.1.2,
The strength of the SI based location model 22.4.5, 22.4.6, and examples in 5. Other
is that it provides assistance with all of the material derived from Retail Location.
following tasks: measuring saturation, impact Models and Spatial Interaction M.E. OKelly
of changes on current trade areas, assessment and D. Bryan. A Review of Modeling in
of the advantages of certain locations for Retail Location Unpublished working paper.
particular formats, and an estimation of the
forecast of sales. In addition the allocation
models allow a profile of the demographics
NOTES
of a trade area.
What would take a large amount of extra 1 Introduction is based on Geography 845
research effort, but which in my opinion Lecture Jan 2, 2001.
APPLIED RETAIL LOCATION MODELS 439

2 A major sector using the results from spatial Beaumont, J.R. (1981). Locationallocation models in
modeling capability is that of businesses with a plane, a review of some models. Socio-Economic
multi-store/branch locations. Home Depot for exam- Planning Sciences, 15(5): 217229.
ple has made extensive use of reports from what used
to be Thompson Associates, and is now a unit of Berry, B.J.L. (1967). The Geography of Market Centers
MapInfo in Ann Arbor, MI. Other well known users and Retail Distribution. Englewood Cliffs, NJ:
include McDonalds and Blockbuster. Prentice Hall.
3 Based on applications as discussed with
Jim Stone and Tony Lea.
Birkin, M., Clarke, G. and Clarke, M.P. (2002).
4 Some aspects of these following paragraphs Retail Geography and Intelligent Network Planning.
have beneted from discussion with Jim Stone. New York: Wiley.
5 The target level of goodness-of-t in conve- Black, W. (1984). Choice-set denition in patronage
nience store forecasting models is for high r -square
modeling. Journal of Retailing, 60(2): 6385.
values (about 0.8).
6 Section 22.2.1 is based on Retail Location Boots, B. and South, R. (1997). Modeling retail trade
Models and Spatial Interaction by M.E. OKelly and areas using higher-order, multiplicatively weighted
D. Bryan, A Review of Modeling in Retail Location. Voronoi diagrams. Journal of Retailing, 73(4):
Unpublished working paper.
519536.
7 GeoVue has a gravity based software package.
ESRI Business Analyst software has a Huff trade area Borgers, A. and Timmermans, H. (1986). A model
model. of pedestrian route choice and demand for
8 This material derived from Retail Location retail facilities within inner-city shopping areas.
Models and Spatial Interaction by M.E. OKelly and Geographical Analysis, 18(2): 115128.
D. Bryan, A Review of Modeling in Retail Location.
Unpublished working paper. Borgers, A. and Timmermans, H. (1991). A decision
9 Material in section 22.4.5 was originally dis- support and expert system for retail planning.
cussed in an explanatory memo from this author to Computers Environment and Urban Systems, 15(3):
Jim Stone at Geonomics (now GeoVue). Jims critique 179188.
was helpful in framing the discussion.
10 This section beneted from discussion with Brown, S. (1989). Retail location theory, the legacy
Jim Stone and Steve Wheelock. of Harold Hotelling. Journal of Retailing, 65(4):
450470.
Brown, S. (1992). The wheel of retail gravitation.
REFERENCES Environment and Planning A, 24(10): 14091429.
Brown, S. (1994). Retail location at the micro-sale
(Although not all these papers are cited directly, these inventory and prospect. Service Industries Journal,
are however all inuential papers in my analysis; they 14(4): 542576.
are retained as a general bibliographic resource.)
Buckner, R.W. (1998). Site Selection, New Advances
Achabal, D., Gorr, W. and Mahajan, V. (1982). in Methods and Technology, 2nd Edn. New York:
MULTILOC, A multiple store location decision model. Lebhar-Friedman Books.
Journal of Retailing, 58(2): 525.
Clark, W.A.V. (1968). Consumer travel patterns and the
Applebaum, W. (1966). Methods for determining store concept of range. Annals, Association of American
trade areas, market penetration, and potential sales. Geographers, 58: 386396.
Journal of Marketing Research, 3: 127141.
Congdon, P. (2000). A Bayesian approach to prediction
Baker, R.G.V. (2000). Towards a dynamic aggregate using the gravity model, with an application
shopping model and its application to retail trading to patient ow modeling. Geographical Analysis,
hour and market area analysis. Papers in Regional 32(3): 205224.
Science, 79(4): 413434.
Craig, C.S., Ghosh, A. and McLafferty, S. (1984).
Balakrishnan, P.V., Desai, A. and Storbeck, J.E. (1994). Models of the retail location process, a review.
Efciency evaluation of retail outlet networks. Journal of Retailing, 60(1): 536.
Environment and Planning B, 21(4): 477488.
Current, J.R. and Storbeck, J.E. (1994). A multiobjective
Beaumont, J.R. (1980). Spatial interaction models and approach to design franchise outlet networks.
the locationallocation problem. Journal of Regional Journal of the Operational Research Society, 45(1):
Science, 20(1): 3750. 7181.
440 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Ding, G. and OKelly, M.E. (2008). Choice-based Fotheringham, A.S., Brunsdon, C. and Charlton, M.
estimation of Alonsos Theory of Movement, (2002). Geographically Weighted Regression,
Methods and Experiments. in Environment and The Analysis of Spatially Varying Relationships.
Planning A, 40(5): 10761089. Chichester: Wiley.
Donthu, N. and Yoo, B. (1998). Retail productivity Ghosh, A. (1986). The value of a mall and other
assessment using data envelopment analysis. Journal insights from a revised central place model. Journal
of Retailing, 74: 89105. of Retailing, 62(1): 7997.
Drezner, T. (1994). Optimal continuous location of Ghosh, A. and Craig, C.S. (1983). Formulating retail
a retail facility, facility attractiveness, and market location strategy in a changing environment. Journal
share an interactive model. Journal of Retailing, of Marketing, 47(3): 5668.
70(1): 4964.
Ghosh, A. and McLafferty, S. (1982). Locating stores
Drezner, T. and Drezner, Z. (2002). Validating in uncertain environments, a scenario planning
the gravity-based competitive location model approach. Journal of Retailing, 58(Winter): 522.
using inferred attractiveness. Annals of Operations
Ghosh, A. and McLafferty, S.L. (1987). Location
Research, 111(14): 227237.
Strategies for Retail and Service Firms. Lexington,
Drezner, T., Drezner, Z. and Salhi, S. (2002). Solving MA: Lexington Books.
the multiple competitive facilities location problem.
Ghosh, A. and Craig, C.S. (1991). FRANSYS, a franchise
European Journal of Operational Research, 142(1):
distribution system location model. Journal of
138151.
Retailing, 67(4): 466495.
Durvasula, S., Sharma, S. and Andrews, J.C. (1992).
Ghosh, A. and Tibrewala, V. (1992). Optimal timing
Storeloc a retail store location model-based on
and location in competitive markets. Geographical
managerial judgments. Journal of Retailing, 68(4):
Analysis, 24(4): 317334.
420444.
Golledge, R. and Spector, A. (1978). Comprehending
Eagle, T.C. (1984). Parameter stability in disaggregate
the urban environment, theory and practice.
retail choice models experimental-evidence.
Geographical Analysis, 10(4): 403426.
Journal of Retailing, 60(1): 101123.
Goodchild, M.F. (1984). ILACS, A location-allocation
Eaton, B.C. and Lipsey, R.G. (1982). An eco-
model for retail site selection. Journal of Retailing,
nomic theory of central places. Economic Journal,
60(1): 84100.
92: 5672.
Goodchild, M.F. (1991). Geographic information
Erlenkotter, D. and Leonardi, G. (1985). Facility location
systems. Journal of Retailing, 67(1): 315.
with spatially-interactive behavior. Sistemi Urbani,
1: 2941. Guy, C.M. (1991). Spatial interaction modeling in
retail planning practice the need for robust
Fotheringham, A.S. (1983). A new set of spatial inter-
statistical-methods. Environment and Planning B,
action models, the theory of competing destinations.
18(2): 191203.
Environment and Planning A, 15: 1536.
Hallsworth, A.G. (1994). Decentralization of retailing in
Fotheringham, A.S. (1986). Modelling hierarchical
Britain the breaking of the 3rd wave. Professional
destination choice. Environment and Planning A, 18:
Geographer, 46(3): 296307.
401418.
Hanson, S. (1980). Spatial diversication and mul-
Fotheringham, A.S. and OKelly, M.E. (1989). Spatial
tipurpose travel, implications for choice theory.
Interaction Models, Formulations and Applications
Geographical Analysis, 12: 245257.
Studies in Operational Regional Science. Dordrecht,
Netherlands: Kluwer. Hodgson, M.J. (1978). Towards more realistic alloca-
tion in locationallocation models, an interaction
Fotheringham, A.S. and Knudsen, D.C. (1986).
approach. Environment and Planning A, 10:
Modeling discontinuous change in retailing
12731285.
systems extensions of the HarrisWilson
framework with results from a simulated urban Hodgson, M.J. (1981). A locationallocation model
retailing system. Geographical Analysis, 18(4): maximizing consumers welfare. Regional Studies,
295312. 15(6): 493506.
APPLIED RETAIL LOCATION MODELS 441

Hodgson, M.J. (1986). An hierarchical location Lakshmanan, T.R. and Hansen, W.A. (1965). A retail
allocation model with allocations based on facility market potential model. Journal of the American
size. Annals of Operational Research, 6: 273289. Institute of Planners, 31: 134143.
Houston, F.S. and Stanton, J.(1984). Evaluating retail Langston, P., Clarke, G.P. and Clarke, D.B. (1997).
trade areas for convenience stores. Journal of Retail saturation, retail location, and retail com-
Retailing, 60(1): 124136. petition, An analysis of British grocery retailing.
Environment and Planning A, 29(1): 77104.
Hubbard, R. (1978). A review of selected factors
conditioning consumer travel behavior. Journal of Leonardi, G. (1980). A unifying framework for
Consumer Research, 5: 121. public facility location problems. WP-80-79, IIASA,
Laxenburg, Austria.
Huff, D.L. (1962). Determination of Intra-Urban
Retail Trade Areas. Real Estate Research Program. Leonardi, G. (1983). The use of random-utility theory in
University of California at Los Angeles. building locationallocation models. In: Thisse, J.-F.
and Zoller, H. (eds), Locational Analysis of Public
Huff, D.L. (1963). A probabilistic analysis of shopping
Facilities, pp. 357383. Amsterdam: North Holland.
center trade areas. Land Economics, 39: 8190.
Leszczyc, P. and Timmermans, H.J.P. (1996). An
Huff, D.L. (1964). Dening and estimating a trade area.
unconditional competing risk hazard model of
Journal of Marketing, 28: 3438.
consumer store- choice dynamics. Environment and
Jain, A.K. and Mahajan, V. (1979). Evaluating Planning A, 28(2): 357368.
the competitive environment in retailing using
Leszczyc, P., Sinha, A. and Timmermans, H. (2000).
multiplicative competitive interaction models. In
Consumer store choice dynamics. An analysis of
Sheth, J. (ed.), Research in Marketing, pp. 217235.
the competitive market structure for grocery stores.
Greenwich, Conn: JAI Press.
Journal of Retailing, 76(3): 323345.
Kantorovich, Y.G. (1992). Equilibrium-Models of Spatial
Leszczyc, P., Sinha, A. and Sahgal, A. (2004). The effect
Interaction with Locational-Capacity Constraints.
of multi-purpose shopping on pricing and location
Environment and Planning A, 24(8): 10771095.
strategy for grocery stores. Journal of Retailing,
Kaufmann, P.J., Donthu, N.B. and Brooks, C.M. 80(2): 8599.
(2000). Multi-unit retail site selection processes,
Longley, P. and Clarke, G. (1996). GIS for Business and
incorporating opening delays and unidentied
Service Planning. New York: Wiley.
competition. Journal of Retailing, 76(1): 113127.
McLafferty, S.L. and Ghosh, A. (1986). Multipurpose
Kitamura, R. and Kermanshah, M. (1985). Sequential
shopping and the location of retail rms. Geograph-
model of interdependent activity and destination
ical Analysis, 18(3): 215226.
choices. Transportation Research Record, 987:
8189. Mercer, A. (1993). Developments in implementable
retailing research. European Journal of Operational
Kohsaka, H. (1989). A spatial search-location model
Research, 68(1): 18.
of retail centers. Geographical Analysis, 21(4):
338349. Miller, H. and OKelly, M.E. (1991). Properties and
estimation of a production-constrained Alonso
Kohsaka, H. (1992). Three-dimensional representation
model. Environment and Planning A, 23: 127138.
and estimation of retail store demand by bicubic
splines. Journal of Retailing, 68(2): 221241. Miller, H.J. (1993). Consumer Search and Retail
Analysis. Journal of Retailing, 69(2): 160192.
Kohsaka, H. (1993). A monitoring and locational deci-
sion support system for retail activity. Environment Min, H. (1987). A multiobjective retail service location
and Planning A, 25(2): 197211. model for fastfood restaurants. OMEGA, 15(5):
429441.
Krider, R.E. and Weinberg, C.B. (1997). Spatial
competition and bounded rationality, Retailing at the Munroe, S. (2001). Retail structural dynamics and the
edge of chaos. Geographical Analysis, 29(1): 1634. forces behind big-box retailing. Annals of Regional
Science, 35(3): 357373.
Kumar, V. and Karande, K. (2000). The effect of retail
store environment on retailer performance. Journal Nakanishi, M. and Cooper, L.G. (1974). Parameter
of Business Research, 49(2): 167181. estimation for a multiplicative competitive interaction
442 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

model least squares approach. Journal of Rust, R.T. and Brown, J.A.N. (1986). Estimation and
Marketing Research, 11: 303311. comparison of market area densities. Journal of
Retailing, 62(4): 410430.
OKelly, M.E. (1981). A model of the demand for retail
facilities incorporating multistop multipurpose trips. Rust, R.T. and Donthu, N. (1995). Capturing geo-
Geographical Analysis, 13(2): 134148. graphically localized misspecication error in retail
store choice models. Journal of Marketing Research,
OKelly, M.E. (1983a). Multipurpose shopping trips and
32(1): 103110.
the size of retail facilities. Annals of the Association
of American Geographers, 73(2): 231239. Seldin, M. (1995). The information revolution and real
estate analyses. Real Estate Issues, April 1995.
OKelly, M.E. (1983b). Impacts of multistop multipur-
pose trips on retail distributions. Urban Geography, Thill, J.C. (2000). Network competition and branch
4(2): 173190. differentiation with consumer heterogeneity. Annals
OKelly, M.E. and Storbeck, J.E. (1984). Hierarchi- of Regional Science, 34(3): 451468.
cal location models with probabilistic allocation. Timmermans, H., Arentze, T. and Joh, C.-H. (2002).
Regional Studies, 18(2): 121129. Analysing spacetime behaviour, new approaches to
OKelly, M.E. (1987). Spatial interaction based old problems. Progress in Human Geography, 26(2):
locationallocation models. In: Ghosh A. and 175190.
Rushton, G. (eds), Spatial Analysis and Location Timmermans, H., Vanderhagen, X. and Borgers, A.
Allocation Models, pp. 302326. New York: van (1992). Transportation systems, retail environments
Nostrand Reinhold. and pedestrian trip chaining behavior modeling
OKelly, M.E. and Miller, H.J. (1989). A synthesis of issues and applications. Transportation Research
some market area delimitation tools. Growth and Part B Methodological, 26(1): 4559.
Change, 20: 1433. Tobler, W.R. (1970). A computer movie simulating
OKelly, M.E. (1999). Trade-area models and choice- urban growth in the Detroit region. Economic
based samples, methods. Environment and Plan- Geography, 46: 234240.
ning A, 31(4): 613627. Wee, C.H. and Pearce, M.R. (1985). Patronage Behavior
OKelly, M.E. (2001). Retail market share and toward Shopping Areas a Proposed Model Based
saturation. Journal of Retailing and Consumer on Huffs Model of Retail Gravitation. Advances in
Services, 8(1): 3745. Consumer Research, 12: 592597.

Oppenheim, N. (1990). Discontinuous changes in equi- Weisbrod, G.E., Parcells, R.J. and Kern, C. (1984).
librium retail activity and travel structures. Papers of A disaggregate model for predicting shopping
the Regional Science Association, 68: 4356. area market attraction. Journal of Retailing, 60(1):
6583.
Pirkul, H., Narasimhan, S. and De, P. (1987). Firm
expansion through franchising, a model and solution Wilson, A.G. and Senior, M.L. (1974). Some rela-
procedure. Decision Science, 18: 631641. tionships between entropy maximizing models,
mathematical programming models and their duals.
Prendergast G., Marr, N. and Jarratt, B. (1998). Journal of Regional Science, 14: 207215.
Retailers views of shopping centres, a comparison
of tenants and non-tenants. International Journal Wilson, A.G., Coelho, J.D. Macgill, S.M. and
of Retail and Distribution Management, 26(4): Williams, H.C.W.L. (1981). Optimization in Loca-
162171. tional and Transport Analysis, London: Wiley.
Roy, J.R. and Thill, J.C. (2004). Spatial interaction Zeller, R.E., Achabal, D.D. and Brown, L.A. (1980).
modelling. Papers in Regional Science, 83(1): Market penetration and locational conict in
339361. franchise systems. Decision Sciences, 11: 5880.
23
Spatial Analysis on a Network
Atsuyuki Okabe and Toshiaki Satoh

23.1. INTRODUCTION These are examples of network spatial


phenomena that occur directly on a network.
In the real world, various types of phenomena As well as the above network spatial
occur on or alongside a network; these are phenomena, another broad class of
termed network spatial phenomena. A typical phenomena occurs alongside rather than
example is illustrated in Figure 23.1, where directly on a network. A typical example is
dots show traffic accidents in Chiba, Japan. illustrated in Figure 23.2, where dots indicate
As with this example, many types of network parking lots in Kyoto, Japan. There are many
spatial phenomena are reported in the related facilities in addition to parking lots that are
literature: traffic accidents on a road network located alongside street networks within
(e.g., Jones et al., 1996; Levine et al., densely inhabited areas. In fact, the entrances
1995; McGuigan, 1981; Nicholson, 1989; to almost all facilities in a city are adjacent
Yamada and Thill, 2004), road kills on a to a street and users access such facilities
road network (e.g., Bashore et al., 1985; through these entrances. Consequently, the
Clevenger et al., 2003; Mallick et al., location phenomena of almost all facilities
1998; Saeki and MacDonald, 2004), street within urbanized areas can be regarded as a
crimes on a street network (e.g., Anselin second class of network spatial phenomena.
et al., 2000; Bowers and Hirschfield, 1999; Network spatial phenomena are usually
Ratcliffe, 2002; Ratcliffe and McCullagh, analyzed by methods that assume a contin-
1999; Painter, 1994), the distribution of uous plane and Euclidean distance (expect
seabirds along a coastline (e.g., ODriscoll, for transportation studies). For referential
1998), and the distribution of trees along a purposes, these types of spatial methods
road network (e.g., Spooner et al., 2004). are referred to as planar spatial methods,
444 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Figure 23.1 Sites of trafc accidents in Chiba, Japan (the width of each line segment
represents trafc volume).

Figure 23.2 The distribution of parking lots in Kyoto, Japan.


SPATIAL ANALYSIS ON A NETWORK 445

and analyses via planar spatial methods are is clearly demonstrated in Figure 23.4.
termed planar spatial analyses. Planar spatial Having assessed the distribution of points
methods are generally used for the analysis of in Figure 23.4(a), nobody would consider
network spatial phenomena because: (1) it is that the points are randomly distributed. This
much easier to compute Euclidean distance view is true when points are distributed on
on a plane than by the shortest-path distance a plane; however, this view is false when
on a network, and (2) it is believed that the points are distributed on the network
the shortest-path distance is approximated by indicated by the lines in Figure 23.4(b). In
Euclidean distance. The first reason remains fact, the points in the figure are randomly
true, although the difficulty is reduced generated on the network.
these days because the use of Geographical Figure 23.4 provides the following warn-
Information Systems (GIS) makes it easy ing: analyzing network spatial phenomena
to calculate the shortest-path distance. The using a planar spatial method is likely to lead
second reason might be true over a large to false conclusions. To avoid such errors,
region, but its validity is questionable across this chapter considers a class of network spa-
a small area or within a city. For example, tial methods. The chapter consists of seven
Maki and Okabe (2005) demonstrated that, sections including this introductory section.
in Kokuryo, a Tokyo suburb, the dif- Section 23.2 describes a method, termed the
ference between the shortest-path distance uniform network transformation, that deals
and Euclidean distance is significant if the with a nonuniform distribution function on
Euclidean distance is less than 500 m (see a network. Section 23.3 considers a class of
Figure 23.3). Therefore, to analyze spatial network Voronoi diagrams, and section 23.4
phenomena in small areas such as the market discusses a class of network local and global
areas of convenience stores in a city, planar K function methods. Section 23.5 describes
spatial methods are inappropriate; instead, a class of network kernel methods, and
spatial methods that assume a network Section 23.6 outlines a GIS-based toolbox
space using the shortest-path distance, termed termed SANET, which is used for network
network spatial methods, should be used. spatial analysis. The chapter ends with
The danger in applying planar spatial Section 23.7, which considers network spatial
methods to network spatial phenomena methods that we have not discussed earlier.

Ratio
4.0
3.5
3.0
2.5
2.0
1.5
1.0
0.5
0.0
0 1000 2000 3000 4000
Euclidean distance (m)

Figure 23.3 Ratio of the shortest-path distance to its corresponding Euclidean distance for
the street network in Kokuryo, Tokyo (from Maki and Okabe, 2005).
446 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

(a) (b)

Figure 23.4 Non-randomly distributed points on a plane (a), and randomly distributed
points on a network (b). (Note that the two distributions of points are the same.)

23.2. UNIFORM NETWORK (the width represents traffic volume). It is


TRANSFORMATION likely that traffic accidents are related to
traffic volume, and this relation will be
Many spatial methods for analyzing spatial examined in Section 23.4. Applying network
point patterns are designed to test the spatial methods that assume a uniform
null hypothesis that points are randomly network (termed uniform network spatial
distributed on a space. In the case of a methods) to a nonuniform network is likely
network space, this null hypothesis means to result in false conclusions. Such errors can
that the probability of a point (say, a traffic be avoided by the use of uniform network
accident site) being generated on a unit line transformation (Okabe and Satoh, 2006),
segment on a network is the same regardless which is briefly introduced in this section.
of the location of the unit line segment; stated First, consider a network L (e.g., a road
differently, the density of points is uniform network) that consists of n line segments
over the network. Networks that possess (street segments), i.e., L = {l1 , . . ., ln }, and
this probabilistic property are referred to let ci l be the probability that a point is
as uniform networks. In the real world, located within a unit line segment l on li ,
however, uniform networks are unlikely to i = 1, . . ., n. Note that this assumption means
exist. Rather, the probability of a point being that the density ci of points is uniform over
generated on a unit line segment on a network li , but may vary (as in Figure 23.5(a)) or not
varies according to the location of the unit vary (as in Figure 23.5(b)) between different
line segment. Such networks are referred to line segments. If the density does not vary,
as nonuniform networks. For example, a road i.e., ci = cj for all i, j = 1, . . ., n, then the
network in which traffic accidents occur in network is a uniform network; if it does vary,
proportion to traffic volume is a nonuniform i.e., ci  = cj for at least one pair i, j, then the
network. Figure 23.1 shows the actual distri- network is a nonuniform network.
bution of traffic accidents (as dots) in relation Second, consider a new network L =
to traffic volume along each line segment
{l1 , . . ., ln } whose graph is isomorphic to that
SPATIAL ANALYSIS ON A NETWORK 447

c1=3
c4=1 * * c2*
c2=1 c4 c1
I4 I4* * I2*
c3=2 I1 s I1
e1 t
I2
* e*
1
c3
I3
*
I3

(a) (b)

Figure 23.5 A nonuniform network (a) and the equivalent uniform network transformed by
the uniform network transformation (b).

of the original network L (Figure 23.5(b)). analyzing a nonuniform network because


The location of a point at distance t they are designed for the analysis of a
from one end point ei of li along li is uniform network. However, they can be
mapped on the point at distance s from the used if the following simple preprocessing is
end point ei of li along li by s = ci t, performed. First, transform a given nonuni-
where satisfies ci > 1 for all i = 1, . . ., n form network to a uniform network by the
(Figure 23.5(a, b)). uniform network transformation described
Third, consider the transformation that above. Second, apply a uniform network
satisfies the condition that the probability spatial method to the transformed network
ci l of a point being placed in a unit (which is a uniform network). No special
line segment  l on li is the same as the development is necessary for dealing with a
probability ci l of a point being placed nonuniform network. Many existing uniform
in a unit line segment l on li , i.e., network spatial methods can be utilized for
ci l = ci l. analyzing a nonuniform network without
Okabe and Satoh (2006) proved that the modification through the uniform network
above transformation transforms a nonuni- transformation. This transformation has the
form network into a uniform network, advantage of network spatial analysis, which
and is thus termed the uniform network is not enjoyed by planar spatial analysis.
transformation. Note that this transformation
is an extension of the probability integral
transformation that transforms a univariate
nonuniform distribution function to a uniform
distribution function. The probability integral 23.3. NETWORK VORONOI
transformation is commonly used in statistics DIAGRAMS
to generate nonuniform random variables
23.3.1. Ordinary network Voronoi
(Freund, 1998).
diagram
The uniform network transformation
provides a powerful tool for analyzing As reviewed by Okabe et al. (2000),
nonuniform networks that are commonly the ordinary Voronoi diagram, i.e., the
found in the real world. Obviously, uniform Voronoi diagram defined on a plane with
network spatial methods cannot be used for Euclidean distance (the ordinary planar
448 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Voronoi diagram), is used in many ways in The set of the resulting subnetworks, V =
spatial analysis (Figure 23.6). In particular, {V1 , . . ., Vm }, is termed the (ordinary) net-
the ordinary planar Voronoi diagram is com- work Voronoi diagram (Okabe et al., 2000);
monly used in retail marketing and facilities an example is provided in Figure 23.7.
management as a first approximation of the It is instructive to compare this network
service areas of stores or facilities. Voronoi diagram with its corresponding pla-
This approximation, however, is problem- nar Voronoi diagram shown in Figure 23.6.
atic when service areas are small. Table 23.1
shows the average radii of circular market
areas in Shinjuku Ward, Tokyo, with respect
23.3.2. Directed network
to store type. In all cases, the distance to the
Voronoi diagrams
nearest store is less than five hundred meters.
Recalling the difference between Euclidean In a downtown area, streets are commonly
distance and the shortest-path distance shown one-way. Pizza delivery stores should
in Figure 23.3, the data in Table 23.1 suggest consider this fact when dispatching delivery
that the ordinary planar Voronoi diagram is bikes. To take one-way regulations into
not appropriate as a first approximation of account, consider a directed network L and
the service areas. let d (pi , p) be the directed shortest-path
Instead, a Voronoi diagram defined on a distance from pi (e.g., a pizza delivery store)
network with shortest-path distance, termed to p (e.g., a house). Let Vi be a set of
the network Voronoi diagram, should be used. points on L (a subnetwork) that satisfies
To show this clearly, let d(p, pi ) be the equation (23.1), where d(p, pi ) is replaced
shortest-path distance between a point p and with d (pi , p). The set of the resulting
a point pi on a network L, where m generator subnetworks, V = {V1 , . . ., Vm }, is
points (e.g., stores) are located at p1 , . . ., pm . termed a directed network Voronoi diagram
Let Vi be a set of points on L (a subnetwork) (Okabe et al., 2008); an example is shown
that satisfies in Figure 23.8, where one-way streets are
indicated by arrows.
Note that the directed shortest-path
Vi = p|d(p, pi ) d(p, pj ), p  L, distance is not symmetric, i.e., d (pi , p) =

d (p, pi ) does not always hold. Suppose that
j  = i, j = 1, . . ., m . (23.1)
p1 , . . ., pm are parking lots, and a driver at p
wants to use the nearest parking lot among
p1 , . . ., pm . The service area of the parking
lot at pi is then defined by the set Vi of
points on L (a subnetwork) that satisfies
Table 23.1 Average radii of circular market equation (23.1), where d(p, pi ) is replaced
areas in Shinjuku ward, Tokyo (m)
with d (p, pi ). The set of the resulting
Store types Average radius
subnetworks, V = {V1 , . . ., Vm }, is
Bakery 320 also a directed network Voronoi diagram.
Shoe store 255
To distinguish V = {V1 , . . ., Vm }
Fruit shop 213
Book store 177 and V = {V1 , . . ., Vm }, the former is
Chinese noodle shop 153 termed the outward directed network Voronoi
Convenience store 150 diagram and the latter the inward directed
Beauty parlor 114 Voronoi diagram (Okabe et al., 2008).
Clinic 113
Both are directed Voronoi diagrams that
SPATIAL ANALYSIS ON A NETWORK 449

Figure 23.6 The ordinary planar Voronoi diagram generated from parking lots in
Kyoto, Japan.

Figure 23.7 The ordinary network Voronoi diagram generated from parking lots in
Kyoto, Japan.
450 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Figure 23.8 The outward directed Voronoi diagram generated from parking lots in
Kyoto, Japan.

should be distinguished from the ordinary subnetworks, VAW = {VAW 1 , . . . , VAWm }, is


network Voronoi diagram V = {V1 , . . ., Vm } termed the additively weighted network
(alternatively, the ordinary network Voronoi Voronoi diagram (Okano and Okabe, 2004);
diagram is termed a nondirected network an example is shown in Figure 23.10, where
Voronoi diagram). An example of an inward the dots indicate convenience stores in Kyoto,
directed Voronoi diagram is shown in Japan, and the radius of each circle indicates
Figure 23.9. its weight i .
Suppose that the delivery speed i of
goods is different from store to store. In this
case, the multiplicatively weighted distance,
23.3.3. Weighted network Voronoi
dMW (p, pi ) = i d(p, pi ), is appropriate for
diagrams
estimating market areas. To be explicit, let
Consider, for instance, that consumers choose VMWi be the set of points on L (a subnetwork)
a store by considering prices i at alternative that satisfies equation (23.1), where d(p, pi )
stores and the transportation cost d(p, pi ) is replaced with dMW (p, pi ) = i d(p, pi ).
between their house p and the store pi , The set of the resulting subnetworks, VMW =
where is the unit transportation cost. In {VMW 1 , . . ., VMWm }, is termed the multiplica-
this case, the market area of a store is tively weighted network Voronoi diagram
defined by the set VAWi of points on L (Okano and Okabe, 2004). An example is
(a subnetwork) that satisfies equation (23.1), shown in Figure 23.11, where the dots
where d(p, pi ) is replaced with dAW (p, pi ) = indicate convenience stores in Kyoto, Japan,
d(p, pi )+i , termed the additively weighted and the radius of each circle indicates its
network distance. The set of the resulting weight i .
SPATIAL ANALYSIS ON A NETWORK 451

Figure 23.9 The inward directed Voronoi diagram generated from parking lots in
Kyoto, Japan.

20
40
10

20
40

20

10

5
20 17

Figure 23.10 The additively weighted network Voronoi diagram generated from
convenience stores in Kyoto, Japan (each circle indicates its weight i ).
452 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

23.3.4. Other network Voronoi function method was developed for points
diagrams on a plane, and was termed the planar K
function method (Ripley, 1976, 1977). Okabe
In addition to the above network Voronoi
and Yamada (2001) extended the planar K
diagrams, the kth nearest network Voronoi
function method to the K function method for
diagram, the network Voronoi diagram for
points on a network to develop the network
line segments, and the network Voronoi
K function method. To state this method
diagram for polygons have also been
explicitly, consider a network L on which
proposed in the literature. The reader should
points p1 , . . ., pm are placed, and let Di (t)
consult Furuata et al. (2005) and Okabe et al.,
be a subnetwork of L in which the shortest-
(2008) for information on these diagrams.
path distance between any point on Di (t)
and pi is less than or equal to t (the heavy
lines in Figure 23.12; in the planar case,
Di (t) corresponds to the disk centered at pi
23.4. LOCAL AND GLOBAL
with radius t truncated by a bounded global
NETWORK K FUNCTION
space). Let Ki (t) be the number of points of
METHODS
p1 , . . ., pm that are included in Di (t). In this
23.4.1. Global network auto K term, a network K function is defined by
function
One of the most commonly used tech-
m
niques in statistical spatial analysis is K(t) = Ki (t). (23.2)
the K function method. Originally, the K i=1

200
300 100

150
300

200
100

250 150
50

Figure 23.11 The multiplicatively weighted network Voronoi diagram generated from
convenience stores in Kyoto, Japan (each circle indicates its weight i ).
SPATIAL ANALYSIS ON A NETWORK 453

Note that, in contrast to the cross K burglaries occur uniformly and randomly
function, which is defined below, the above distributed on the street network. Because
function is referred to as the network auto the observed curve is always above the
K function (as with spatial auto correlation); expected curve in Figure 23.13, it is
also note that constants (the density and concluded that burglaries tend to cluster
number of points) are omitted here for themselves.
simplicity. The difference between the network
To show an actual example, the K function and the planar K function is
distribution of street burglaries in Kyoto distinct. Actually, Yamada and Thill (2004)
is depicted in Figure 23.12, where the applied both the planar K function method
triangle marks indicate sits of incidence. and the network K function method to
For this distribution, the network auto the same traffic accident data and found
K function is calculated, and the result that the planar K function method overesti-
is shown in Figure 23.13. The black mates clustering tendency. The authors con-
line indicates the expected value and the cluded that the network K function method
gray line indicates the observed value should be used for the analysis of traffic
obtained under the null hypothesis that accidents.

Figure 23.12 Street burglaries (the triangle marks), railway stations (the circles), and the
Voronoi sub-network (the heavy lines) of the station (the large circle) in Kyoto, Japan.
454 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

160
140
120
100
Expected
80
Observed
60
40
20
0
0 2000 4000 6000 8000

Figure 23.13 The global network auto K function for street burglaries on the street
network in Kyoto, Japan (Figure 23.12).

23.4.2. Global network cross (Note that a constant is omitted from the
K function above equation for simplicity.) Because this
function considers all points of P across the
Another type of network K function
entire network space L (the global space),
method is the network cross K function
the function can be regarded as a global
method (Okabe and Yamada, 2001). Consider
network cross K function. An actual example
two sets of points, P = {p1 , . . ., pm } and
of the global network cross K function
Q = {q1 , . . ., qk }, on a network L. Points of P
is shown in Figure 23.14, where points
are stochastically distributed on L, but points
of P are street burglaries and points of
of Q are fixed (note that the configuration of
Q are railway stations in Kyoto, Japan as
the points is arbitrary). For instance, points
shown in Figure 23.12. Because the observed
of P may be crime spots and points of
curve is always above the expected curve
Q may be railway stations. The network
in Figure 23.14, it is concluded that street
cross K function is used for testing whether
burglaries tend to occur around railway
points p1 , . . ., pm (crime spots) tend to cluster
stations.
around (or apart from) q1 , . . ., qk (railway
stations) as a whole.
To state the network cross K function
explicitly, let Dqi (t) be a subnetwork of L in
which the shortest-path distance between any
23.4.3. Local network cross K
point in Dqi (t) and qi is less than or equal to t,
function
and let Kqi (t) be the number of points of P
that are included in Dqi (t). Then, the network The global network cross K function method
cross K function, KQP (t), is defined by: deals with the average tendency of a point
pattern around all fixed points Q; therefore, it
cannot detect local tendencies. For example,
the global network cross K function cannot

k
KQP (t) = Kqi (t). (23.3) detect the specific railway stations around
i=1 which crime spots tend to cluster. To detect
SPATIAL ANALYSIS ON A NETWORK 455

160
140
120
100
80 Expected
Observed
60
40
20
0
0 2000 4000 6000 8000

Figure 23.14 The global network cross K function for street burglaries in relation to
railway stations in Kyoto, Japan (Figure 23.12).

local tendencies, a local network cross Voronoi diagram generated by Q. Then, a


K function should be developed. local space of point qi (the neighborhood
A simple way of defining a local cross of the ith railway station) is given by
K function is to decompose the global cross the Voronoi subnetwork Vi (a Voronoi
K function given by equation (23.3) into subnetwork is indicated by the heavy lines
each term; that is, a local cross K function in Figure 23.12).
defined by Kqi (t), i = 1, . . ., k. This local In terms of this natural local space, an
cross K function, however, is not always alternative network cross K function KVqi (t)
natural, because the local space Dqi (t) of can be defined as the number of points of
Kqi (t) becomes large as t increases, and P (e.g., crime spots) that are included in
eventually the local space includes the global Vi Dqi (t), i.e., the number of crime spots
space (the entire space L). in a local space Vi Dqi (t) whose shortest-
path distance to the railway station qi is
less than or equal to t. Because Vi Dqi (t)
is bounded by a local space Vi , the local
space of KVqi (t) remains a local space of
23.4.4. Local network Voronoi
the global space even for a large t (this
cross K function
contrasts with the network cross K function
In the context of crime spots and railway Kqi (t) in Section 23.4.3). The function KVqi (t)
stations referred to above, it is natural is termed the local network Voronoi cross
to examine whether crime spots cluster K function, and should be distinguished from
in the neighborhood of specific railway the function Kqi (t) in Section 23.4.3, which
stations. If commuters use their near- is referred to as the local network ordinary
est railway stations, the neighborhoods of cross K function. Figure 23.15 shows an
railway stations are given by the ordi- actual example of the local network Voronoi
nary network Voronoi diagram generated cross K function for street burglaries in the
from railway stations. To be explicit, let local space indicated by the heavy lines in
V = {V1 , . . ., Vk } be the ordinary network Figure 23.12.
456 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

23.4.5. Global network Voronoi This function deals with all points P in
cross K function the entire network L (the global space).
Therefore, this function can be regarded as
In sections 23.3.2 and 23.3.3, a local K
a global network cross K function, which
function is obtained from a global K function.
is referred to as the global network Voronoi
Conversely, a global function can also be
cross K function. An example is illustrated
obtained from a local function. For instance,
in Figure 23.16. Comparison between the
let KVQP (t) be a function defined in terms of
local network Voronoi cross K function in
the local network Voronoi K functions as
Figure 23.15 and the global network Voronoi
cross K function in Figure 23.16 reveals local

k
variety.
KVQP (t) = KVqi (t). (23.4)
i=1

12

10

8
Expected
6
Observed
4

0
0 200 400 600 800

Figure 23.15 The local network Voronoi cross K function for street burglaries in the local
space (the heavy lines in Figure 23.12) in Kyoto, Japan.

160
140
120
100
Expected
80
Observed
60
40
20
0
0 500 1000 1500

Figure 23.16 The global network Voronoi cross K function for street burglaries in relation
to railway stations in Kyoto, Japan (Figure 23.12).
SPATIAL ANALYSIS ON A NETWORK 457

Having obtained two global network 23.5. NETWORK KERNEL METHOD


K functions, namely the global network
ordinary cross K function in Section 23.4.3 The kernel method is a nonparametric method
and the global network Voronoi cross for estimating a density function from a given
K function in Section 23.4.4, one might set of observed values (Silverman, 1986).
question the difference between them. Kernel functions are usually defined for
Figure 23.17 provides an illustration of this univariate or bivariate density functions. In
difference. Figure 23.17(a) shows Dq1 (2) the context of spatial analysis, the kernel
(heavy gray lines) and Dq2 (2) (heavy black method is applied to the density of points on
lines). Points p1 , p2 , p3 , p4 , p5 , p6 , p7 a plane and is commonly employed to detect
are included in Kq1 (2), whereas p3 , p4 , hot spots (or cold spots), e.g., highly
p6 , p7 , p8 are included in Kq2 (2); con- concentrated areas of crime occurrences
sequently, the global network ordinary within a city. If the crime of interest is street
cross K function KQP (2) = Kq1 (2) + Kq2 (2) burglaries, the density of street burglaries
counts points p1 , p2 , p8 once and points on streets is to be estimated. This section
p3 , p4 , p6 , p7 twice. In contrast, as shown demonstrates a kernel method for estimating
in Figure 23.17(b) where Dq1 (2) V1 (heavy a density function on a network (Okabe et al.,
gray lines) and Dq2 (2) V2 (heavy black 2009).
lines) are depicted (broken lines indicate the A simple way of estimating a density
boundary points of the Voronoi subnetworks function from given observed points is to
V1 and V2 ), the global network Voronoi use kernel functions defined on a plane,
cross K function KVQP (2) = KVq1 (2) + referred to as planar kernel functions.
KVq2 (2) counts every point of P only once. Suppose that the coordinates (represented by
These two global network cross K functions vectors) of observed points on a network
focus on different aspects of a point pattern. L embedded on a plane are x1 , . . ., xm ,
Actually, Figures 23.14 and 23.16 show this and let k(x) be a two-dimensional kernel
difference. function defined on a plane. Then, the

Dq1(2) Dq1(2) V1
p8
p1
p6

q1 p5 p7
Dq2(2) V2
p2
Dq2(2)
p3 p4 q2

(b)
(a)
0 1 2

Figure 23.17 Comparison between the global network ordinary cross K function (a) and the
global network Voronoi cross K function (b).
458 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

estimated density function, f (x), on a plane is for such points (one million). Therefore, this
given by: method is inappropriate.
An alternative method is to use a net-

m
work kernel function, kL (t) = kL (x), x L,
f (x) = k(xi ). (23.5)
defined on L. An example is shown in
i=1
Figure 23.19, where one million points are
An example is presented in Figure 23.18, uniformly and randomly generated and the
where one million points are uniformly and density function is estimated from those
randomly generated and the kernel function is points using the one dimensional bi-weight
given by the bi-weight function (Silverman, function.
1986). This appears to be a natural extension of
One might estimate the density function, the planar kernel method, but Figure 23.19
fL (x), of points on L from the intersection proves that this method is inappropriate. As
of f (x) with L, i.e., fL (x) = f (x), x L. the points in Figure 23.19 are uniformly
This method would be fine if the estimated and randomly generated on L, the estimated
density function could produce a uniform density should be uniform; however, the
distribution function for the points that are density in Figure 23.19 is not uniform
uniformly and randomly distributed on the on L, which suggests that this method is
network. However, Figure 23.18 shows that inappropriate. One reason that the natural
fL (x) does not show a uniform distribution extension of the planar kernel method does

Figure 23.18 The density function for uniformly random points (one million) on the street
network in Kyoto estimated by the two-dimensional bi-weight kernel function.
SPATIAL ANALYSIS ON A NETWORK 459

Figure 23.19 The density function for uniformly random points (one million) on the street
network in Kyoto estimated by the one-dimensional bi-weight kernel function.

not work is that a plane is isotropic whereas Figure 23.21 shows hot spots of traffic
a network is not isotropic in the sense that accidents in Chiba, Japan, determined by
directions are restricted and is bounded. using the above method and assuming that
Okabe et al., (2009) provide two kernal the given network is a uniform network, i.e.,
functions Ki (t) that produces a uniform the probability of an accident occurring in a
density function for uniformly and randomly unit line segment is constant regardless of the
distributed points. location of the unit line segment. As noted in
Once a density function has been esti- section 23.2, however, it is more likely that
mated, it is easy to find hot spots. Let traffic accidents tend to occur in proportion
L(u) be a subnetwork of L that satisfies to traffic volume, as shown in Figure 23.1
fL (t) u, and let L be the subnetwork L(u) (a nonuniform network). To examine this
that satisfies: tendency of accident hot spots, the uniform
network transformation in section 23.2 is
  applied to this nonuniform network, and
;
fL (t) dt fL (t) dt = 100. (23.6) the network kernel method is applied to
tL(u) tL the resulting uniform network. Figure 23.22
shows hot spots of traffic accidents for
the transformed network. These hot spots
The subnetwork L is the area of hot spots; indicate the places where traffic accidents
the probability of points occurring on the tend to occur more frequently than would
subnetworks L is high at the significance be expected from the measured traffic
level . volumes.
460 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Figure 23.20 The density function for uniformly random points (one million) on the street
network in Kyoto estimated by a one-dimensional modied bi-weight kernel function.

Figure 23.21 Hot spots of trafc accidents on the uniform road network in Chiba, Japan.
SPATIAL ANALYSIS ON A NETWORK 461

Figure 23.22 Hot spots of trafc accidents on the nonuniform road network in Chiba, Japan
that takes account of trafc volume.

23.6. GIS-BASED TOOLS FOR 23.7. CONCLUSIONS


NETWORK SPATIAL
ANALYSIS, SANET In the real world, there are many network
spatial phenomena. Planar spatial methods
To analyze network spatial phenomena in are inappropriate for analyzing these phe-
theory, network spatial methods are more nomena because they commonly lead to false
appropriate than planar spatial methods. In conclusions. To avoid such false conclusions,
practice, however, developing computer pro- network spatial methods should be used.
grams for network spatial methods is more This chapter considered three classes of
difficult than that for planar spatial methods. network spatial methods: (1) a class of
Fortunately, this difficulty is overcome by network Voronoi diagrams that includes the
a free software package termed SANET nondirected Voronoi diagram, the inward
(Spatial Analysis on a NETwork), which can directed Voronoi diagram, the outward
be downloaded from http://okabe.t.u-tokyo. directed Voronoi diagram, the additively
ac.jp/okabelab/atsu/sanet/sanet-index.html. weighted Voronoi diagram, and the multi-
The functions are outlined in Okabe et al. plicatively weighted Voronoi diagram; (2) a
(2006a, b), and the manual is available from class of network K functions that includes
the above site (Okabe et al., 2004). Note that the global auto K function, the global
Version 3 is the current release but functions ordinary cross K function, the local ordinary
are updated from time to time. cross K function, the local Voronoi cross
462 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

K function, and the global Voronoi cross K England: an analysis using geographical information
function; and (3) a class of network kernel systems. International Journal of Geographical
methods that includes a method for detecting Information Science, 13: 159184.
hot spots. Clevenger, A.P., Chruszcz, B. and Gunson, K.E.
In addition to the above network spatial (2003). Spatial patterns and factors inuencing small
methods, Okabe et al. (1995) formulated vertebrate fauna road-kill aggregations. Biological
Conservation, 109: 1526.
the network of the (conditional) nearest
neighbor distance method; Miller (1994, Freund, J.E. (1998). Mathematical Statistics, (6th edn),
1999), Okabe and Kitamura (1996), Okabe Englewood Cliff: Prentice-Hall.
and Okunuki (2001), and Morita et al. (2001) Furuta, T., Suzuki, A. and Inakawa, K. (2005). The
formulated the network Huff model; Shiode k th nearest network Voronoi diagram and its
and Okabe (2004a) formulated the network application to districting problem of ambulance
clumping method; Shiode and Okabe (2004b) systems, Discussion Paper No. 0501, Center for
formulated the network cell count method; Management Studies, Nanzan University.
and Okabe et al. (2006b) proposed the Jones, A.P., Langford, I.H. and Betham, G. (1996).
network spatial interpolation method. There The application of K -function analysis to the
are many other planar spatial methods that geographical distribution of road trafc accident
outcomes in Norfolk, England. Social Science and
have not yet been extended to network
Medicine, 42(6): 879885.
spatial methods. Hopefully, the readers of
this chapter will extend these methods Levine, N., Kim, K.E., Nitz, L.H. (1995). Spatial Analysis
and enrich the field of network spatial of Honolulu Motor Vehicle Crashes: I. Spatial
Patterns. Accident Analysis and Prevention, 27(5):
analysis. 663674.

Maki, N. and Okabe, A. (2005). Spatio-


temporal analysis of aged members of a
tness club in a suburbs. Proceedings of the
ACKNOWLEDGMENTS Geographical Information Systems Association,
14: 2934.
We express our thanks to Barry Boots,
Mallick, S.A., Hocking, G.J. and Driessen, M.M.
Kei-ich Okunuki, Shino Shiode, Kyoko
(1998). Road-kills of the eastern barred bandicoot
Okano, Ikuho Yamada and Takashi Maki for (Perameles gunnii ) in Tasmania:an index of
their comments on an earlier draft. abundance. Wildlife Research, 25: 139145.

McGuigan, D.R.D. (1981). The use of relationships


between road accidents and trafc ow in
black-spot identication. Trafc Engineering and
REFERENCES Control, 22: 448453.

Anselin, L., Cohen, J., Cook, D., Gorr, W. and Tita, G. Miller, H.J. (1994). Market area delimitation within
(2000). Spatial analyses of crime. In: David Duffee networks using geographic information systems.
(ed.), Criminal Justice 2000: Volume 4. Measurement Geographical Systems, 1: 157173.
and Analysis of Crime and Justice, pp. 213262. Miller, H.J. (1999). Measuring space-time acces-
Washington, DC: National Institute of Justice. sibility benets within transportation networks.
Bashore, T., Tzilkowski, W. and Bellis, E. (1985). Geographical Analysis, 31(2): 187212.
Analysis of deervehicle collision sites in
Morita, M., Okunuki, K. and Okabe, A. (2001).
Pennsylvania. Journal of Wildlife Management,
A market area analysis on a network using GIS
49(3): 769774.
A case study of retail stores in Nisshin city. Papers
Bowers, K. and Hirscheld, A. (1999). Exploring links and Proceedings of the Geographic Information
between crime and disadvantage in north-west Systems Association, 10: 4550.
SPATIAL ANALYSIS ON A NETWORK 463

Nicholson, A.J. (1989). Accident clustering: Some Okabe, A., Satoh. T. and Sugihara, K. (2009). A
Simple Measures. Trafc Engineering and Control, kernel density estimation method for networks,
30: 241246. its computational method, and a GIS-based tool.
International Journal of Geographical Information
ODriscoll, R.L. (1998). Descriptions of spatial pattern Science (to appear).
in seabird distributions along line transects using
neighbor K statistics. Marine Ecology Progress Okabe, A. and Yamada, I. (2001). The K -function
Series, 165: 8194. method on a network and its computational imple-
mentation. Geographical Analysis, 33: 271290.
Okabe, A., Boots, B. and Satoh, T. (2006). A class of
local and global K -functions and cross K -functions, Okabe, A., Yomono, H. and Kitamura, M. (1995).
The 2006 Annual Meeting of the AAG, March 711, Statistical analysis of the distribution of points
2006, Chicago, IL. on a network. Geographical Analysis, 27(2):
152175.
Okabe, A., Boots, B., Sugihara, K. and Chiu, S.N.
(2000). Spatial Tessellations: Concepts and Appli- Okano, K. and Okabe, A. (2004). Algorithms for com-
cations of Voronoi Diagrams (2nd edn). Chichester: puting weighted network Voronoi diagrams. Papers
John Wiley. and Proceedings of the Geographic Information
Systems Association, 13: 311314.
Okabe, A. and Kitamura, M. (1996). A computational
method for market area analysis on a network. Painter, K. (1994). The impact of lighting on crime,
Geographical Analysis, 28: 330349. fear, and pedestrian street use. Security Journal, 5:
Okabe, A. and Okunuki, K. (2001). A Computational 116124.
method for estimating the demand of retail stores
Ratcliffe, H.J. (2002). Aoristic signatures and the
on a street network using GIS. Transactions in GIS,
spatio-temporal analysis of high volume crime
5(3): 209220.
patterns. Journal of Quantitative Criminology,
Okabe, A., Okunuki, K. and Shiode, S. (2004). SANET: 18: 2343.
A toolbox for spatial analysis on a network
Ratcliffe, J.H. and McCullagh, M.J. (1999). Hotbeds of
Version 2.0 040102. Center for Spatial Information
crime and the search for spatial accuracy. Journal of
Science, University of Tokyo, Tokyo.
Geographical Systems, 1: 385398.
Okabe, A., Okunuki, K. and Shiode, S. (2006a).
SANET: a toolbox for spatial analysis on a network. Ripley, B.D. (1976). The second-order analysis of
Geographical Analysis, 38(1): 5766. stationary point processes. Journal of Applied
Probability, 13: 255266.
Okabe, A., Okunuki, K. and Shiode, S. (2006b). The
SANET toolbox: new methods for network spatial Ripley, B.D. (1977). Modeling spatial patterns. Journal
analysis. Transactions in GIS, 10: 535550. of the Royal Statistical Society, Series B, 39:
172192.
Okabe, A. and Satoh, T. (2006). Uniform network
transformation for points pattern analysis on a non- Saeki, M. and MacDonald, D.W. (2004). The effects of
uniform network. Journal of Geographical Systems, trafc on the raccoon dog (Nyctereutes procyonoides
8(1): 2537. viverrinus) and other mammals in Japan. Biological
Conservation, 118: 559571.
Okabe, A., Satoh, T., Furuta, T., Suzuki, A., Okano, A.
(2008). Generalized network Voronoi diagrams: Shiode, S. and Okabe, A. (2004a). Network variable
Concepts, computational methods, and applications. clumping method for analyzing point patterns on a
International Journal of Geographical Information network. The 2004 Annual Meeting of the AAG,
Science, 130. Philadelphia, PA.

Okabe, A., Satoh, T and Sugihara, K. (2009) A Shiode, S. and Okabe, A. (2004b). Cell count
kernel density estimation method for networks, method on a network with SANET and its
Its computational method and a GIS-based tool, application. International Conference on Geoinfo-
International Journal of Geographical Information matics and Geographical Systems Modelling, Beijng,
Science (to appear). China.
464 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Silverman, B.W. (1986). Density Estimation for network K -function. Landscape Ecology, 19(5):
Statistics and Data Analysis. London: Chapman 491499.
and Hall.
Yamada, Y. and Thill, J.-C. (2004). Comparison of
Spooner, P.G., Lunt, I.D., Okabe, A. and Shiode, S. planar and network K -functions in trafc accident
(2004). Spatial analysis of roadside Acacia analysis. Journal of Transport Geography, 12:
populations on a road network using the 149158.
24
Challenges in Spatial Analysis
Michael F. Goodchild1

24.1. INTRODUCTION as a paradigm for primary and secondary


education, and projects have been funded
This is a time of unprecedented opportunity around the world to advance spatial literacy
for spatial analysis. More people than (e.g., www.spatial-literacy.org).
ever have access to the Global Position- At the same time the field faces substantial
ing System for direct measurement of challenges, as it attempts to take advantage
location on the Earths surface; to the of these new opportunities. This chapter
products of high-resolution remote-sensing addresses four: the challenge raised by the
satellites; and to the manipulative power of continuing rapid advance in computing and
geographic information systems (GIS). Some networking technology; the challenge of
of these technologies are encountered in addressing the temporal dimension through
everyday life, through sites such as Google the analysis of dynamic phenomena; the
Earth (earth.google.com), Google Maps challenge posed by the immense popularity
(maps.google.com), and Microsoft Windows of Web sites that offer rudimentary forms of
Live Local (live.local.com), and through spatial analysis to a user community that has
the widespread use of in-vehicle navigation little or no formal educational background in
systems. Several academic disciplines are this area; and the challenge of formulating
recording a spatial turn, a new and in some a new philosophy of science that reflects the
cases renewed interest in space and location actual conditions under which spatial analysis
as a framework for analysis, understanding, is used in todays research and problem-
and presentation of results. A recent pub- solving environments.
lication (National Research Council, 2006) The four topics by no means exhaust
has defined and explored spatial thinking the full set of issues facing the field.
466 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

Many readers will have their own ideas, and for analyzing and visualizing spatial data.
the chapter on the future of spatial analysis The results of each stage of analysis could
that follows include discussions of additional be fed into further stages, and data could
issues. Meanwhile, the four considered in be managed within a single environment
this chapter are very much a personal list, that recognized a range of data formats.
and reflect the authors own interests and Comparisons were frequently drawn with
concerns at this point in the long history of the statistical packages (e.g., Goodchild,
spatial analytic methods. 1987), which similarly offered easy access
to a multitude of statistical techniques,
along with the necessary housekeeping
functions.
24.2. COMPUTING AND At the time, each GIS software product was
NETWORKING TECHNOLOGY organized into a single, monolithic package.
In the 1980s such packages were typically
In the early 1990s a substantial literature installed on minicomputers such as the VAX
accumulated on the opportunities offered by or Prime, but in the late 1980s the transition
GIS. In 1988 the U.S. National Science to personal Unix workstations and later to the
Foundation had established the National PC and Mac had opened the possibility of
Center for Geographic Information and Anal- an entirely individualized toolbox installed
ysis (NCGIA) at three sites: the University on the researchers desk. GIS was likened
of California, Santa Barbara; the State to a butler an intelligent assistant working
University of New York at Buffalo; and with the user to solve problems, knowing
the University of Maine. One of NCGIAs the foibles and preferences of the user, and
objectives was to advance the use of GIS taking on those tasks that the user found too
across the sciences, as a platform for spatial complex, tedious, time-consuming, or inac-
analysis, so it was considered important to curate if performed by hand. Abler (1987)
assess progress to date, and to identify and hailed GIS as geographys equivalent of the
remove impediments to the greater use of microscope or the telescope, a powerful tool
spatial analysis. NCGIA organized a spe- that allowed researchers to gain insights that
cialist meeting on the topic that eventually were simply impossible with the normal
led to a book (Fotheringham and Rogerson, senses and intuition.
1994), and several additional papers appeared From this perspective, the power of GIS
(Anselin and Getis, 1992; Burrough, 1990; would be judged simply by the proportion
Ding and Fotheringham, 1992; Goodchild, of known techniques of spatial analysis that
1987; Goodchild et al., 1992; Openshaw, it supported, by the accuracy with which
1990; and for a later perspective see it implemented each method, and by the
Goodchild and Longley, 1999). extent to which it prevented misuse and
Underlying this spate of funding and misinterpretation of results. There were many
writing was the simple premise that GIS complaints about this time regarding the
provided an ideal means of implementing success of GIS against these objectives.
the known techniques of spatial analysis, Commercial software developers were seen
as well as techniques that might be devel- as insufficiently interested in supporting
oped in the future. A single package, if advanced spatial analysis, being content
sufficiently sophisticated, could offer easy instead to direct their efforts at satisfying
and largely painless access to an abundance the needs of their more wealthy corpo-
of robust, scientifically sound techniques rate and agency customers, whose interests
CHALLENGES IN SPATIAL ANALYSIS 467

tended to be more in data management and simple or thin, and most actual computation
inventory. GIS designers failed to ground occurs remotely on a more powerful server.
their products in sound theory, preferring In the extreme, the user needs only a Web
intuitive terms and explanations over formal browser such as Microsoft Explorer or
and mathematical ones. Because of this Netscape. Instead of installing a thick piece
lack of formal grounding, each vendor of software, such as a GIS package, the user
tended to adopt its own terms, formats, and obtains many if not all of its services from
structures, leading to endless proliferation a remote server. For example, the task of
and an apparently insurmountable lack of finding the optimum route from an origin
interoperability. to a destination through a street network,
It was in this context that the Web appeared the task performed by many Web sites
on the scene, and the Internet emerged such as mapquest.com, no longer requires
as the dominant and indeed quickly the the user to obtain a powerful GIS and
only network for computer communication. the necessary database representing roads
Since 1993 and the release of Mosaic and streets, and to mount both on his
the impact of communications technology or her desktop machine, since the same
has been so profound as to change the service can be obtained free from the server.
entire landscape of GIS and spatial analysis. The user need only specify the origin and
Sui and Goodchild (2001) have argued destination to the server using a Web browser;
that the metaphor of the butler is no the results are then sent back from the
longer appropriate instead, GIS technology server and displayed locally using the same
now constitutes a medium through which Web browser.
people communicate what they know about In principle all GIS functions and all types
the Earths surface that is comparable to of spatial analysis could be organized in this
traditional media such as print, radio, and way. Instead of installing and operating their
television. As such, its issues are dramatically own software, researchers could send data
different from those of earlier decades. to sites where sophisticated forms of spatial
Bandwidth, interoperability, and metadata analysis were performed. Researchers devel-
have largely replaced computing speed, oping new forms of spatial analysis would
storage capacity, and the sophistication of find it far easier to offer their techniques as
desktop software as major concerns of GIS Web services than to engage in the time-
users. Even the most sophisticated of users consuming distribution of software, and users
no longer program, relying instead on the would benefit by not having to spend time
incredibly abundant resources of the Web, obtaining, installing, and maintaining their
easy mechanisms for sharing code, and own copies.
new forms of software architecture. The Server GIS is now common among public
following three sections explore some of agencies interested in providing public access
these issues, and their implications for spatial to their spatial data, along with simple
analysis. capabilities for query and visual display.
Many local governments provide access to
their land-ownership and property taxation
databases in this way, allowing users to query
24.2.1. Server GIS
details of their own and other properties,
In the clientserver computing paradigm using a map interface.
that underlies the Web, the user or clients In practice, however, server GIS has had
hardware and software are comparatively a limited impact to date, particularly for
468 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

more sophisticated analysis, for a number of and implementers of spatial-analytic routines


reasons: will need to consider the options carefully.
However, it is clear that the nature of
computing is changing, as many services
There is no consensus on the appropriate business
move to a central, server-based model.
model for server GIS. Desktop GIS software
generates income for its developers through
sales and licensing, providing a healthy income
stream, and developers of new methods of 24.2.2. Process scripts
spatial analysis have sometimes used this same
approach. Users of server GIS typically expect Research tends to proceed in stages, as
services to be provided free, leaving the providers problems are formulated, data are collected
of such services to generate income through and checked, analysis is performed, and
advertising and the licensing of services to third results are scrutinized. Each stage feeds
parties. Routing services, for example, can be forward to the next, and also back to the
found embedded in the Web sites of on-line travel previous stages, as projects are rethought and
agencies and real-estate companies, presumably as hypotheses are tested and modified. By
at some cost to them. Moreover, the software
the time the project is finally completed, the
for server GIS tends to be more expensive per
investigator may well have lost track of some
copy than conventional desktop GIS software
of the stages, and may find it difficult to
(although open-source packages are available,
e.g., mapserver.gis.umn.edu). provide the necessary details in publications
and reports. Somewhat paradoxically, the
Server GIS is most effective when the volume research community has invested heavily in
of data that needs to be input by the user is the infrastructure to create and share data,
limited, and when the data needed are common and in the software to process them, but
to a large number of users and applications. has not made similar investments in the
A routing service, for example, requires only techniques for management of the research
an origin and destination, and uses a generic process. The problem grows more severe as
database of streets and roads stored at the server.
research becomes more collaborative, with
Moreover, such databases change frequently, and
many participants who may or may not
there are enormous economies of scale if all users
communicate in person, and as the tools of
can rely on a single version. Geocoding or address
matching, the task of converting street addresses research become more complex.
to coordinates, has become a popular function for Against this background it is not surprising
server GIS for the same reason. that many vendors of GIS and spatial-
analytic packages have created macro- or
Lack of interoperability continues to be an issue scripting languages that allow researchers
for server GIS. There are no standards for to express complex analyses as sequences
the description of services, though several geo- of operations, and to store, manage, and
portals now provide limited directories (Maguire execute such sequences as simple commands.
and Longley, 2005; Goodchild, et al., in press).
A script in digital form is immediately more
Extensive reformatting may be needed to make
easily shared, managed, and documented than
data readable by remote services, and the results
its equivalent in the jotted and invariably
returned may similarly need to be reformatted to
be useful locally. incomplete hand-notes of the researcher.
Modern scripting languages allow complex
hierarchical structures, since a single line in a
The choice between local and server-based script can invoke other scripts and programs,
computing is a complex one, and developers and allow sequences of operations to be
CHALLENGES IN SPATIAL ANALYSIS 469

repeated many times in such applications as comprehensive languages that can be used to
Monte Carlo simulation. describe and share computational methods. In
However, the design of an appropriate the past, mathematics provided an adequate
scripting language is a very sophisticated language, and models were effectively shared
task, requiring a high level of knowledge using algebraic representation, through the
of the needs of the research community, pages of learned journals and books. But
across many disciplines and domains. Simple todays computational environments present
scripting languages merely allow the user to a somewhat different problem, since the
invoke any of the commands of the package, language of mathematics lies too far from
but more sophisticated languages imply actual implementation, and cannot readily be
a recognition of the fundamental elements used to express the entire algorithmic basis
from which complex spatial analyses are of spatial analysis.
built. If the granularity of the scripting
language is too coarse, researchers will find
it too difficult to express the full range of
24.2.3. Interchangeable software
applications and if it is too detailed, the
components
script will be unnecessarily long.
The work of Tomlin (1990) provided the Early computer software was comprised
first successful effort at a generic scripting of programs, integrated pieces of software
language for GIS, albeit only for congruent that performed well-defined functions. Early
layers of raster data. The language was GIS developed in this context, and by
adopted by several packages, and several the early 1990s a fully featured GIS such
extensions were made. Van Deursen (1995) as ESRIs ARC/INFO included millions of
analyzed the operations required to support lines of code, all designed to be compiled
dynamic modeling in a raster environment, and executed together to provide a single,
including the implementation of finite- integrated computing environment.
difference models, in what became the This approach to software was both
scripting language for PCRaster (pcraster. redundant, in the sense that large amounts
geo.uu.nl), a raster-based package heavily of code might never be executed by a given
oriented towards environmental modeling. user, whose interests might focus only on
Takeyama and Couclelis (1997) described a small number of functions; and costly, in
a sophisticated language for the manipulation the sense that it was difficult for programmers
of pairs of raster cells, providing support to pull pieces of code out of one package to
for the analysis of spatial interactions. More be reused in another. Even today, the average
broadly, all of these approaches are strongly user of a package such as Microsoft Word
related to the languages developed in image will likely never have invoked many of the
processing, or image algebras. functions in this very large and complex
To date, however, there have been no com- package.
parably ambitious efforts to devise languages Several attempts to break out of this mold
for vector data, or for the broader framework were made in the 1980s and 1990s. One
that spans both discrete objects and continu- of the more successful was the concept
ous fields. Dynamic GIS that addresses both of a subroutine library, a collection of
space and time also lacks comprehensive standard routines that could be called by
scripting languages. The effectiveness of programs, avoiding the need for repetitive
future spatial analysis clearly depends on reprogramming. Subroutine libraries became
the communitys ability to devise simple yet common in areas such as statistics, since they
470 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

allowed comparatively sophisticated users common to more than one form? Perhaps
to develop new programs quickly, relying they will lead eventually to a new approach
on standard subroutines for many of the to teaching in spatial analysis, in which these
programs functions. The idea was diffi- fundamental building blocks are the elements
cult to implement for less sophisticated of a course, rather than the analytic methods
users, however, since it required each themselves.
to possess a substantial knowledge of
programming.
Contemporary approaches to software
emphasize a rather different approach, 24.3. TIME AND DYNAMICS
in which sections of reusable code, or
components, can be freely combined during Many authors have commented on the gener-
the execution of a program. Standards have ally static nature of GIS, and the difficulty of
been developed by vendors such as Microsoft representing time and dynamic phenomena.
that allow compliant components to be freely Most attribute this to the legacy of the
linked and executed. Ungerer and Goodchild paper map, which inevitably emphasizes
(2002) describe one such application, in those aspects of the Earths surface that
which ESRIs ArcGIS and Microsofts Excel remain relatively static, over such dynamic
have been combined to solve a standard phenomena as events, transactions, and
problem in areal interpolation (Goodchild, flows. Several comprehensive reviews have
et al., 1993). Functions that are native appeared, and much progress has been made
to the GIS, such as polygon overlay, are in building spatial databases that include time
obtained from ArcGIS, while operations on (Langran, 1993; Peuquet, 1999, 2001, 2002).
tables, such as matrix multiplication, are This same emphasis on the static is evident
obtained from Excel. The entire analysis in the toolkit of spatial analysis, with its
is invoked through commands written in focus on cross-sectional data. In part this
Visual Basic, a form of scripting language, is due to the difficulty of creating and
though other general scripting languages acquiring longitudinal data; to the administra-
such as Python might also be used. Both tive difficulties that statistical agencies face
packages are compliant with the Microsoft in funding and maintaining data-collection
COM standard, allowing the components that programs through time; to the changing
form the building blocks of each to be freely nature of the Earths surface, and the impact
combined and executed. that this has on data-collection procedures
Approaches such as these are breaking and the definitions of reporting zones; and
down the barriers that previously existed to the changing nature of human society, and
between different types of software in this its notoriously short attention span. Efforts
case, ArcGIS and Excel and allowing much such as the National Historic GIS project
more flexible forms of analysis. They invite (www.nhgis.org) have attempted to overcome
an entirely new approach to software design, these difficulties, building systems that allow
in which fundamental components with users to construct longitudinal series from
widespread application are combined to meet the census for example, but they remain
the needs of specific applications. They also comparatively few and far between.
call for answers to a fundamental question: While much progress has been made,
what are the basic building blocks of spatial the analysis of spatio-temporal data remains
analytic software, and to what extent are the a comparatively underexplored area, and
operations invoked by each form of analysis a source of substantial challenges for the
CHALLENGES IN SPATIAL ANALYSIS 471

community. The next two subsections address then one might reasonably ask whether
two of these in greater detail. similar principles exist for spatio-temporal
data, and whether such principles might
usefully inform the development of a more
dynamic approach to GIS and spatial analy-
24.3.1. Fundamental laws
sis. What is the spatio-temporal equivalent of
Much of the nature of GIS and many of Toblers First Law, for example? Does spatial
the architectural choices that have been heterogeneity apply also in time? What
made over the past several decades are relationships exist between the parameters of
ultimately attributable to the nature of the spatio-temporal and spatial dependence and
data themselves the ways in which spatial heterogeneity? Are other general principles
data are special. Anselin (1989) has identified of spatio-temporal phenomena waiting to be
two general characteristics, and Goodchild discovered?
(2003) has discussed several more.
Spatial dependence describes the widely
observed tendency for the variance of spatial
24.3.2. Dynamic form
data to increase with distance. To paraphrase
Tobler (1970), nearby things are more Spatial dependence and spatial heterogeneity
similar than distant things, a principle that are both properties of how the Earths surface
has become known as the First Law of looks, capturing aspects of its form. Studies
Geography (Sui, 2004). All of the methods of form have a long history in science, but
used to represent geographic phenomena in have given way in the long term to a desire
GIS are to some extent reliant on the validity to understand process to understand how
of this principle. For example, there would systems work, and the effects of human
be no value in representing topography with intervention. In geomorphology, for example,
isolines if elevation did not vary smoothly, many scientists of the 19th and early 20th
and there would be no value in aggregating centuries were content to describe landforms,
areas into contiguous regions if the latter devising elaborate systems of morphological
could not be designed with relatively low classification, and only later did interest
within-region variance. develop in understanding how landforms
Anselins second principle is spatial het- came to be, and the processes that left
erogeneity, the tendency for the Earths such characteristic footprints on the surface.
surface to exhibit spatial non-stationarity. Today, of course, such studies of form are
All of the various techniques developed largely discredited, as they are in many other
over the past two decades for local spatial disciplines.
analysis are based on this principle, since Because of its essentially static legacy,
they attempt to summarize what is true much GIS analysis has focused on form,
locally, rather than what is true globally. and has been criticized for doing so. It is
The Geographically Weighted Regression of comparatively difficult to tease insights into
Fotheringham, et al. (2002) falls into this process from cross-sectional form, though it
category, as do the LISA technique of Anselin is perhaps sometimes possible to eliminate
(1995) and the local statistics of Getis and false hypotheses about process. GIS has been
Ord (1992). accused of being the last manifestation of
If such principles are generally true of the quantitative revolution that occurred in
spatial data, and are useful in guiding geography in the 1960s, when Bunge (1966)
the development of computational systems, and others attempted to draw insights from
472 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

the similarity of forms found on the human citizen with a computer and a high-speed
and physical landscapes (see, for example, Internet connection access to many of
the critique of Taylor, 1990). the data sets and computational functions
Very little is known, however, about the of GIS, and in some cases have even
characteristic forms that may exist in spatio- exposed the more sophisticated functions
temporal phenomena. Hagerstrand (1970) of spatial analysis. For example, anyone
and others have examined the movements requesting driving directions from one of
of individuals in space and time using these sites receives answers that result from
three-dimensional displays, in which the two the execution of a complex algorithm that
spatial dimensions form the horizontal plane was previously the reserve of operations
and time forms the vertical axis. Much of this researchers and specialists in spatial opti-
work focuses on similarities that may exist in mization.
the forms of such tracks, and the implications The methods of cartography and related
they may have for process. We know from disciplines are complex, and it is no surprise
the work of many researchers (e.g., Janelle therefore that sophisticated tools in nave
and Goodchild, 1983) that different social hands can produce mistakes. A suitable
conditions lead to dramatically distinct track example concerns the Greenwich meridian,
forms, as for example in the differences and its position when displayed in Google
between the daily tracks of single mothers, Earth. Many users of this site have noted that
with their orientations to both workplace the zero of longitude misses the Greenwich
and daycare, and the tracks of workers Observatory by approximately 100 m, and
in families in which only one of two have posted comments, some of which
adults works. conclude that a serious mistake has been
The development of greater support for made by Google, and by extension that
time in GIS may lead to many other the georegistration of imagery on the site
recognizable patterns in spatio-temporal data, is poor. In reality, the WGS84 (World
and to a rebirth of interest in the study of Geodetic System of 1984) datum, now widely
spatio-temporal form. A new generation of adopted around the world, does not place
analytic techniques is needed that extracts the Greenwich Observatory at exactly zero
meaningful pattern from the mass of tracks longitude, despite the international treaty that
displayed in the visualizations of Kwan and established it there in 1884 and the position
Lee (2004) and others, and links such patterns shown in Google Earth appears to be correct
to hypotheses about process. to within a few meters.
Although their support for spatial anal-
ysis is extremely limited, these sites have
clearly provided the general public with
24.4. SPATIAL LITERACY access to a rich resource, and thousands of
people have been empowered to create their
In the past few years a remarkable series own applications. The recent publication
of Web sites have brought the sophisti- Mapping Hacks (Erle, et al., 2005) describes
cated functions of GIS and spatial analysis many fascinating examples, but contains
much closer to the general public. While not a single reference to the cartographic
effective use of GIS requires extensive literature. At the same time students who
training, and in many cases advanced work have endured many hours of lectures and lab
at the undergraduate level, technologies exercises to become competent in GIS may
such as Google Earth have given every be frustrated to realize that a child of ten
CHALLENGES IN SPATIAL ANALYSIS 473

can create a computationally complex fly-by material might fit in the already stove-piped
using Google Earth in a few minutes. curriculum.
It seems clear that in part as a result
of these developments the demand for
basic knowledge of the principles of spatial
analysis, GIS, geography, cartography, and 24.5. BEYOND TRADITIONAL
related fields for basic spatial literacy PRACTICE IN SCIENCE
is perhaps two or more orders of magnitude
out of alignment with the supply. Education When Harvey wrote his well-known and
in these topics cannot be confined to a few highly influential Explanation in Geography
advanced undergraduates, and to campuses (Harvey, 1969) the dominant form of
lucky enough to have faculty interest, if it scientific practice centered on the individual
is to be accessible to the numbers of people investigator, whose methods followed a set of
now exposed to and enthusiastically adopting well-defined principles. For example, every
these tools. In this respect, spatial analysis experiment was to be reported in sufficient
faces an unprecedented challenge, to make detail to allow its replication by another
itself known to a much larger community independent investigator. Every numerical
than previously. result was to be reported with a level of
There are several ways in which such precision that matched its accuracy. Every
a challenge might be met, by concerted search of the literature was to be complete
effort on the part of the spatial-analysis and comprehensive, so that the investigator
community. One is to bring spatial literacy could demonstrate knowledge of all previous
into the general-education or core curriculum and relevant work and prove the new
of institutions of higher education, making works originality. The principle of Occams
its material accessible and eligible for credit Razor a willingness to adopt the simplest
for the vast majority of undergraduates. of several competing explanations was
Courses in other kinds of literacy are already universally accepted, as was the notion
available in this form; the argument needs that all conclusions could be subject to
to be made that familiarity with spatial empirical test and possible rejection. The
analysis and GIS represents another, and goal of science was complete explanation,
arguably a more powerful form of literacy or in statistical terms an R2 of 1. When
that should be part of the education of every sample data were analyzed, all numerical
citizen. Another strategy would be to develop results were to be subject to tests of
a larger and more visible set of courses statistical significance, to prove that they
in the informal education sector, making were not likely to be simply artifacts of the
spatial literacy part of on-line and certificate particular sample chosen, but properties of
programs, and exposing its contents through the population from which the sample was
libraries, museums, and other institutions. presumed to be drawn. All terms were to be
A third is to work to introduce spatial rigorously defined, and vague terms were to
literacy earlier in the educational hierarchy, be replaced by ones that met the standard of
in high school and even elementary school. objectivity rigorous and shared definition,
Valiant efforts have been made in this such that two investigators would always
direction in the past, but they remain agree on the outcome when the definition
minimal in comparison with the size of was applied.
the primary and secondary sectors, and These standards are of course collectively
there is much confusion about where such unattainable in all circumstances. They may
474 THE SAGE HANDBOOK OF SPATIAL ANALYSIS

be more attainable in some disciplines than when it was no longer possible to believe
others, and certainly it is possible to imagine that every aspect of a computational analysis
a physicist having no difficulty adhering could be replicated by hand, given enough
to them, and being fiercely critical of any time. Operating systems were perhaps the
study that appeared to relax them. But first such area of computing by 1990 they
researchers in the general domain of this had advanced to the point where it was no
book clearly encounter situations in which longer possible to believe they were the work
one or more of them is distinctly problematic. of one person, or that any one individual fully
This is not to say that one should therefore understood every aspect of their operation.
reject them outright, and follow the lead of Today these failures are commonplace. The
those who have looked for alternatives to documentation of our more sophisticated
scientific principles rather, they constitute software, including GIS, is often not suffi-
goals to which research should attempt cient to detail every aspect of an analysis, and
always to aspire, while admitting that it may it may be impossible to discover exactly how
sometimes fall short. This section explores a given system computes a standard property,
three of these issues in some detail, and such as slope, from a given input (Burrough
then argues for a renewed approach to and McDonnell (1998) detail some of the
scientific methodology that better reflects the options, but many more can be hidden in
real conditions under which spatial analysts the details of a given implementation). In
currently work. effect the developers of software, many
of them operating in for-profit commercial
environments, have become authorities that
must be trusted, and it is difficult to submit
24.5.1. Collaboration, replicability,
their products to rigorous and exhaustive test.
and the black box
Moreover, researchers now find it
Before the widespread adoption of com- increasingly effective to work in teams,
puting, it was customary for instructors in each team member providing some specific
statistics courses to insist that each student expertise. Funding agencies often express
be able to carry out a test by hand, before a willingness to fund research that brings
using any computational aids. Only then, together teams from many disciplines, in the
it was argued, would the student fully interests of greater collaboration and cross-
understand the process involved, and be able fertilization of ideas. But such arrangements
to replicate it later. In this simple world it inevitably lead to situations in which no
was possible to assume that every researcher one individual knows everything about an
knew every detail of every analysis, and analysis, and members of the team have
that the published version of the research little alternative but to trust each other, just
would include sufficient detail to allow others as researchers often have little alternative
to repeat the experiment and replicate the but to trust software.
results.
This principle has come under fire in
recent decades, for a number of reasons.
24.5.2. Keeping the stakeholders
Computational aids have advanced to the
happy
point where it is not possible for any
one individual to comprehend fully all of Tools such as GIS invite researchers to
the algorithms involved. The author recalls become involved in the processes of policy
passing a threshold, some time around 1990, formulation and decision making. The very
CHALLENGES IN SPATIAL ANALYSIS 475

architecture of GIS, with its database of local details and its procedures representing general principles, invites engagement with the ultimate users of research, since it allows decision makers to investigate the effects of manipulating outcomes in local contexts, and gives them many useful tools for implementing the results of analysis. A new subdiscipline, public-participation GIS, has grown up to study these issues, and to improve the use of GIS and spatial analysis in public decision making.

Many of the arguments for the use of technology in support of decision making, for spatial decision support systems (Densham, 1991), center on the benefits of these tools in settings that involve the potentially conflicting views of multiple stakeholders. Much has been written about spatial-analytic techniques that support multiple views, and address multiple criteria (Voogd, 1983; Eastman, 1999; Thill, 1999; Malczewski, 1999). GIS may allow stakeholders to express their own views as sets of weights to be given to relevant factors. Saaty's Analytic Hierarchy Process (Saaty, 1980) is a widely used technique for eliciting such weights from stakeholders, and for deriving consensus weights and measures of agreement. Stakeholders benefit from the visualization capabilities of these systems, which allow them to see the effects of decisions in readily understood ways. They gain the impression that decisions are made scientifically, with abundant use of mathematics and computation, and are led to believe that these approaches represent a more objective, more desirable approach to debate and conflict resolution.
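To make the weighting step concrete, the sketch below derives AHP priority weights as the normalized principal eigenvector of a pairwise comparison matrix and checks Saaty's consistency ratio. It is a minimal illustration only, written in Python with numpy, and the three-factor comparison matrix is hypothetical; it is not drawn from any particular GIS or decision-support implementation.

import numpy as np

# Hypothetical pairwise comparisons of three factors on Saaty's 1-9 scale:
# A[i, j] records how much more important factor i is judged to be than factor j.
A = np.array([[1.0,   3.0, 5.0],
              [1/3.0, 1.0, 2.0],
              [1/5.0, 0.5, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = int(np.argmax(eigvals.real))              # index of the principal eigenvalue
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()                               # priority weights, normalized to sum to 1

n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)          # consistency index
ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]           # Saaty's random index for small matrices
cr = ci / ri                                  # consistency ratio; below about 0.1 is usually taken as acceptable

print(w.round(3), round(cr, 3))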
It is all too easy in such circumstances to see stakeholder satisfaction as the primary goal of the exercise. If stakeholders leave the room believing that a rigorous, scientific process has been conducted then everyone can feel that a useful exercise has come to an acceptable conclusion. None of this guarantees, however, that the results presented to the stakeholders are in fact based on good science. It is easy, with a little thought, to manipulate the outcomes of such processes to achieve hidden objectives. For example, when stakeholders are presented with five alternatives and asked to choose one, it is easy to see how the outcome might be manipulated by presenting a set that includes the desired outcome, plus four obviously unacceptable red herrings. Experience suggests that stakeholders will find no difficulty in assigning relative measures of importance to factors, irrespective of whether the factors are or are not commensurate, and whether or not any definition of importance has been advanced and agreed.

24.5.3. Accuracy, uncertainty, and cost

All measurements are subject to error, and science has developed sophisticated techniques for measuring instrument accuracy, and for determining how accuracy impacts the results of analysis. The basic principles of error analysis have been adapted to the specific needs of geographic data by Heuvelink (1998) and others, and statistical models have been developed for most of the standard geographic data types.
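One widely used way of operationalizing such error analysis is Monte Carlo error propagation, in which the measured input is perturbed repeatedly and the derived product is recomputed each time. The sketch below, assuming Python with numpy, a synthetic elevation grid, and independent Gaussian measurement error of 1.5 units, shows the idea for mean slope; real elevation errors are usually spatially correlated, a complication that treatments such as Heuvelink's address.

import numpy as np

rng = np.random.default_rng(0)
x, y = np.meshgrid(np.arange(50.0), np.arange(50.0))
dem = 0.5 * x + 10.0 * np.sin(y / 8.0)        # synthetic elevation surface, cell size 1 unit
sigma = 1.5                                   # assumed standard error of each elevation value

def mean_slope_degrees(z):
    gy, gx = np.gradient(z)                   # finite-difference gradients along rows and columns
    return float(np.degrees(np.arctan(np.hypot(gx, gy))).mean())

# Monte Carlo error propagation: perturb the input, recompute the product, summarize.
draws = [mean_slope_degrees(dem + rng.normal(0.0, sigma, dem.shape)) for _ in range(500)]
print(mean_slope_degrees(dem), float(np.mean(draws)), float(np.std(draws)))

The gap between the first two numbers shows how measurement error can bias a derived product, and the third summarizes the uncertainty that the error induces in it.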
Uncertainty is often defined as the degree to which data leaves the user uncertain about the true nature of the real world. As such it presents a greater problem, because it derives not from errors in measurement, but from vagueness in definitions, lack of detail, and numerous other sources. When definitions are vague, there can be no objective definition of truth, but only the less satisfactory concept of consensus. A scientist steeped in traditional methodology would react by rejecting vague terms entirely, replacing them with terms that have rigorous definition, and are
therefore capable of supporting replicability. Subjective terms such as 'warm', 'cold', 'near', and 'far' would be replaced by well-defined scales of temperature measurement and distance.

Nevertheless, GIS and to a lesser extent spatial analysis clearly exist at the interface between the rigorous, scientific world of well-defined terms and replicable experiments, and the vague, intuitive world of human discourse. Many users of GIS appear happy to work with vaguely defined classes of vegetation or land use, and there has been much interest in building user interfaces to GIS that come closer to emulating human ways of reasoning and discovering. Naïve geography has been defined as a field that studies the simplifications humans often impose on the world around them, and writers have speculated about the potential for systems that also simplify, that think more like humans do.

In the past decade or so there has been much interest in the application of fuzzy sets, rough sets, and related ideas in spatial analysis. There seems to be some degree of intuitive appeal in the idea of assigning degrees of membership to a class, even when the class is not itself well defined. Methods have been devised for eliciting fuzzy membership values from professionals, from remotely sensed data, and from other sources, and for displaying these values in the form of maps. All of these methods stretch the norms of science, by arguing that it is possible to observe and measure useful properties despite a lack of agreement on the definitions of those properties. As such, they demand a re-examination of the basic tenets of scientific method.
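As a simple illustration of the idea, the sketch below (Python with numpy; the break points of 200 and 1000 metres are arbitrary choices made for the example) assigns a degree of membership in the vague class 'near' as a function of distance. The membership function itself is a design decision rather than an observable fact, which is exactly the methodological issue raised above.

import numpy as np

def near(distance_m, full_until=200.0, zero_after=1000.0):
    # Membership in the vague class 'near': 1 up to 200 m, falling linearly to 0 at 1000 m.
    return float(np.clip((zero_after - distance_m) / (zero_after - full_until), 0.0, 1.0))

for d in (50.0, 400.0, 800.0, 1500.0):
    print(d, near(d))                         # memberships 1.0, 0.75, 0.25 and 0.0 respectively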
Finally, spatial analysts find themselves today in a world overflowing with data. Satellite images, digital topographic maps, and a host of other sources provide an unprecedented opportunity for new and interesting research. Massive investments have been made over the past decade in data warehouses, spatial data centers, and geo-portals, with a view to facilitating the discovery and sharing of spatial data. Metadata standards have been devised that support search, by allowing researchers to hunt through catalogs looking for data that might meet their needs.

Yet almost certainly data discovered in this way will fail to meet the exact needs of the researcher. The data set will be too generalized, not sufficiently current, too inaccurate, or inadequate in another of a myriad of possible ways. In these circumstances it is inevitable that research objectives become modified to fit the properties of the available data, if the alternative is an exercise in field data collection that may be impossibly expensive. But the prevailing methodology of science says nothing about such compromises, maintaining instead that data must be exactly fit for purpose, and providing no basis on which users can find compromises between cost on the one hand, and accuracy or fitness for use on the other.

24.5.4. Summary

The previous three sections have presented examples of the ways in which spatial analysts increasingly find the traditional principles of scientific methodology inadequate as a guide to practice. While much of science is concerned with the nomothetic goal of discovering general principles that apply everywhere in space and time, spatial analysis is increasingly concerned also with the variations that exist in such principles from place to place, and in the ways in which such principles are placed in local context to solve problems and make decisions. As Laudan (1996) has argued, there is no longer an effective methodological distinction between science and
problem-solving, since the same principles apply to both. In summary, spatial analysts face an important challenge, to develop a new methodological understanding that is consistent both with the traditional tenets of the scientific method, and with the realities of current practice.

24.6. CONCLUSIONS

The four major sections of this chapter have argued that spatial analysis faces many challenges at this time, but it also faces unprecedented opportunity. More people than ever are aware of its potential, and the tools to implement it are more sophisticated and powerful than ever.

Discussions of the importance of spatial analysis often focus on one or two particularly compelling application domains, and it may well be that by making the case for spatial analysis in support of improved public health, for example, or better response to emergencies, it will be possible at the same time to promote the entire field. On the other hand, one might argue that identifying spatial analysis too clearly with one application domain tends to render the case for other applications more difficult. Essentially, it can be very difficult to promote a set of techniques that are applicable to almost everything: the case for spatial analysis is everywhere, and yet at the same time it is nowhere.

The argument for spatial literacy made in section 24.4 seems especially relevant in this context. Many skill areas are important across a vast array of human activities, including skill in language, in mathematics, and in logic. Spatial analysis should not be a highly specialized area of technique that is only accessible to experts, but should be part of every citizen's basic set of skills, and used every day in such basic activities as wayfinding and activity planning.

How the field responds to these challenges remains to be seen, of course. Undoubtedly new and better techniques will be discovered and published in the next few years, new code will be written, and new application areas will be described. But the challenges described in this chapter seem to go beyond such business-as-usual, and to require discussion across the entire community. Such community-wide debate has occurred very rarely in the past, yet is more feasible than ever with today's communications technologies.

ACKNOWLEDGMENTS

Support of the U.S. National Science Foundation through award BCS 0417131 is gratefully acknowledged. The author also benefited from an E.T.S. Walton Fellowship which allowed him to spend much of the 2005–6 academic year at the National Centre for Geocomputation, National University of Ireland, Maynooth.

NOTE

1 National Center for Geographic Information and Analysis, and Department of Geography, University of California, Santa Barbara, CA 93106-4060, USA. Phone +1 805 893 8049, Fax +1 805 893 3146, E-mail good@geog.ucsb.edu

REFERENCES

Abler, R.F. (1987). The National Science Foundation National Center for Geographic Information and Analysis. International Journal of Geographical Information Systems, 1: 303–326.
Anselin, L. (1989). What is Special About Spatial Data? Alternative Perspectives on Spatial Data Analysis. Technical Report 89-4. Santa Barbara, CA: National Center for Geographic Information and Analysis.

Anselin, L. (1995). Local indicators of spatial association – LISA. Geographical Analysis, 27: 93–115.

Anselin, L. and Getis, A. (1992). Spatial statistical analysis and geographic information systems. Annals of Regional Science, 26: 19–33.

Bunge, W. (1966). Theoretical Geography. 2nd edn. Lund Studies in Geography Series C: General and Mathematical Geography, No. 1. Lund, Sweden: Gleerup.

Burrough, P.A. (1990). Methods of spatial analysis and GIS. International Journal of Geographical Information Systems, 4: 221–223.

Burrough, P.A. and McDonnell, R.A. (1998). Principles of Geographical Information Systems. New York: Oxford University Press.

Densham, P.J. (1991). Spatial decision support systems. In: Maguire, D.J., Goodchild, M.F. and Rhind, D.W. (eds), Geographical Information Systems: Principles and Applications. pp. 403–412. Harlow, UK: Longman Scientific and Technical.

Ding, Y. and Fotheringham, A.S. (1992). The integration of spatial analysis and GIS. Computers in Environmental and Urban Systems, 16: 3–19.

Eastman, J.R. (1999). Multi-criteria evaluation and GIS. In: Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems: Principles, Techniques, Management and Applications. pp. 225–234. New York: Wiley.

Erle, S., Gibson, R. and Walsh, J. (2005). Mapping Hacks: Tips and Tools for Electronic Cartography. Sebastopol, CA: O'Reilly Media.

Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Hoboken, NJ: Wiley.

Fotheringham, A.S. and Rogerson, P. (eds), (1994). Spatial Analysis and GIS. London: Taylor and Francis.

Getis, A. and Ord, J.K. (1992). The analysis of spatial association by distance statistics. Geographical Analysis, 24: 189–206.

Goodchild, M.F. (1987). A spatial analytical perspective on geographical information systems. International Journal of Geographical Information Systems, 1: 327–334.

Goodchild, M.F. (2003). The fundamental laws of GIScience. Paper presented at the Summer Assembly of the University Consortium for Geographic Information Science, Pacific Grove, CA, June. Available: http://www.csiss.org/aboutus/presentations/files/goodchild_ucgis_jun03.pdf

Goodchild, M.F., Anselin, L. and Deichmann, U. (1993). A framework for the areal interpolation of socioeconomic data. Environment and Planning A, 25: 383–397.

Goodchild, M.F., Fu, P. and Rich, P. (in press). Sharing geographic information: an assessment of the geospatial one-stop. Annals of the Association of American Geographers.

Goodchild, M.F., Haining, R.P. and Wise, S. (1992). Integrating GIS and spatial analysis: problems and possibilities. International Journal of Geographical Information Systems, 6: 407–423.

Goodchild, M.F. and Longley, P.A. (1999). The future of GIS and spatial analysis. In: Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems: Principles, Techniques, Management and Applications. pp. 235–248. New York: Wiley.

Hägerstrand, T. (1970). What about people in regional science? Papers of the Regional Science Association, 24: 7–21.

Harvey, D. (1969). Explanation in Geography. New York: St Martin's Press.

Heuvelink, G.B.M. (1998). Error Propagation in Environmental Modelling with GIS. Bristol, PA: Taylor and Francis.

Janelle, D.G. and Goodchild, M.F. (1983). Transportation indicators of space-time autonomy. Urban Geography, 4: 317–337.

Kwan, M.-P. and Lee, J. (2004). Geovisualization of human activity patterns using 3D GIS: A time-geographic approach. In: Goodchild, M.F. and Janelle, D.G. (eds), Spatially Integrated Social Science, pp. 48–66. New York: Oxford University Press.

Langran, G. (1993). Time in Geographic Information Systems. London: Taylor and Francis.

Laudan, L. (1996). Beyond Positivism and Relativism: Theory, Method, and Evidence. Boulder, CO: Westview Press.

Maguire, D.J. and Longley, P.A. (2005). The emergence of geoportals and their role in spatial data infrastructures. Computers, Environment and Urban Systems, 29(1): 3–14.

Malczewski, J. (1999). GIS and Multicriteria Decision Analysis. New York: Wiley.

National Research Council (2006). Learning to Think Spatially: GIS as a Support System in the K-12 Curriculum. Washington, DC: National Academies Press.

Openshaw, S. (1990). Spatial analysis and geographical information systems: a review of progress and possibilities. In: Scholten, H.J. and Stillwell, J.C.H. (eds), Geographical Information Systems for Urban and Regional Planning. pp. 153–163. Dordrecht: Kluwer.

Peuquet, D.J. (1999). Time in GIS and geographical databases. In: Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems: Principles, Techniques, Management and Applications. New York: Wiley.

Peuquet, D.J. (2001). Making space for time: issues in space-time representation. Geoinformatica, 5(1): 11–32.

Peuquet, D.J. (2002). Representations of Space and Time. New York: Guilford.

Saaty, T.L. (1980). The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. New York: McGraw-Hill.

Sui, D.Z. and Goodchild, M.F. (2001). Guest Editorial: GIS as media? International Journal of Geographical Information Science, 15(5): 387–389.

Sui, D.Z. (ed.), (2004). Forum: on Tobler's First Law of Geography. Annals of the Association of American Geographers, 94(2): 269–310.

Takeyama, M. and Couclelis, H. (1997). Map dynamics: integrating cellular automata and GIS through Geo-Algebra. International Journal of Geographical Information Science, 11(1): 73–91.

Taylor, P.J. (1990). GKS. Political Geography Quarterly, 9(3): 211–212.

Thill, J.-C. (1999). Spatial Multicriteria Decision Making and Analysis: A Geographic Information Sciences Approach. Brookfield, VT: Ashgate.

Tobler, W.R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46: 234–240.

Tomlin, C.D. (1990). Geographic Information Systems and Cartographic Modeling. Englewood Cliffs, NJ: Prentice Hall.

Ungerer, M.J. and Goodchild, M.F. (2002). Integrating spatial data analysis and GIS: a new implementation using the Component Object Model (COM). International Journal of Geographical Information Science, 16(1): 41–54.

van Deursen, W.P.A. (1995). Geographical Information Systems and Dynamic Models: Development and Application of a Prototype Spatial Modelling Language. Nederlandse Geografische Studies 190. Utrecht: Koninklijk Nederlands Aardrijkskundig Genootschap/Faculteit Ruimtelijke Wetenschappen Universiteit Utrecht.

Voogd, H. (1983). Multi-Criteria Evaluation for Urban and Regional Planning. London: Pion.
25
The Future for Spatial Analysis
Reginald G. Golledge
25.1. SPATIAL ANALYSIS PAST AND PRESENT

The future of geography is inextricably bound to the future of spatial analysis. Why? Simply because spatial analysis captures the essence of a support system for the science and technology involved in geospatial thinking and reasoning. The latter are the distinct and unique contributions of geography to the universe of academe, government, and business.

For about 50 years, geographers have been slowly but surely building a structure of theories, models, methods, technologies, and vocabulary that anchor the discipline's claim to being a science. This effort has occurred both in the physical and human components of the discipline. A common theme in both efforts has been the search for valid and reliable conclusions from active and innovative research. A variety of exploratory and confirmatory, qualitative and quantitative procedures have been developed or explored for relevance, and relevant procedures and methodologies have been globally termed spatial analysis.

While some parts of the discipline are content to imitate the theories, methods, and technologies of other physical or human sciences, or to copy the research designs and practices of the various humanities, parts of geography have vigorously explored the development of unique means of thinking, reasoning, analyzing, and representing geospatial information. Spatial analysis has been perhaps the most vigorous of these throughout the years. Lately, it has been complemented by the enthusiasm for technology, particularly Geographic
Information Systems (GIS). However, most academic practitioners realized quickly that GIS needed a wider base: a base of analysis as well as its forte in representation. To provide this base, Geographic Information Science (GISc) developed. Spatial analysis proved to be a primary support system for GISc, and the two themes have converged to give geographic researchers and teachers powerful new ways to explore the massive data banks of the new digital world.

Many geographers would not agree with my opening statement. I would challenge them to disprove it or to make valid claims for other dimensions of the discipline. One could not support a contrary argument based on geography's traditional role of collecting facts about the earth's physical or human environment. While other aspects of the discipline continue to have much to offer in terms of understanding the relations between people and places, it is not always possible to differentiate the geographic/geospatial component from the more general humanities, political sciences, or social sciences thinking and reasoning that drives much of this work. Thus, it has the potential to contribute to the accumulation of general social and cultural knowledge more than to geospatial knowledge. This can be viewed as a positive result if one accepts that integrated disciplinary thinking is likely to be of future importance, but does little to support or enhance the image and practice of geography in the real world.

So, why does spatial analysis hold the key (in my opinion) to the future of geography? To reflect on this, I offer the following thoughts (see also Goodchild, 2001):

• Spatial analysis is a unique and special contribution by geographers to the ongoing trend of integrated science. Here, science is interpreted in both a qualitative and quantitative manner, and covers both physical science (natural science) and human science (the science involved in comprehending human-human and human-environment relations). It provides a menu for ensuring valid and reliable reasoning in the forum of knowledge accumulation.

• Spatially referenced data, either in relative (qualitative) or absolute (quantitative) form, has become the currency of today's information processing society. Spatial analysis is exclusively developed for analyzing place-based digital information. It includes the use of topologies, geometry, fuzzy logic, and multidimensional reasoning capabilities, all directed towards the spatial domain. Thus, it is useful at all scales from the nano and micro levels to the gigantic scale of universe-wide exploration, and is being diffused through areas as different as neurological experimentation, archeological reconstructions of past civilizations, and the search for extraterrestrial understanding.

• It is generally agreed that geographers have a unique way of examining problems (Beck, 1967; Uttal, 2000), and that diagrammatic (including map-based) reasoning provides insight into many problems that is unattainable using conventional reasoning such as verbal, text-based, and mathematical procedures. This uniqueness begins with the accepted significance of the spatial domain (something that has been rather neglected by other disciplines and by many parts of the human side of geography), and then expresses itself via its emphasis on visualization and spatialization processes. Data is collected with some form of spatial coding (Klatzky et al., 1990; Fujita et al., 1993), and is represented by spatializations such as flat (2D) paper maps, 3D models, and on-screen image-based representation (graphs and graphics), all of which require a particular form of interpretation. Faithful representation is one of the prerequisites for spatial analysis.

• During the second half of the 20th century, geography matured by borrowing (sometimes wholesale, sometimes modified) theories from other disciplines. As the profession gained more
confidence in its ability to offer innovative, exploratory and confirmatory investigations of spatial and geospatial concerns, there finally emerged a series of spatially explicit theories of the relations that were being uncovered by research in the spatial (and specifically geographical) domains. These theories tended to be investigated and validated using spatial analysis. They included time-space associations, spatial decision making, spatial choice, location theory, location-allocation processes, population density gradients, the form and structure of built environments, geospatial learning, movement behavior at different scales, and other areas that are explicitly spatial (see earlier chapters). And, as the profession learned to think and reason spatially (rather than socially, politically, or economically), the processes involved in spatial analysis continued to grow in importance.

• The majority of geographers (not just those engaging in spatial analysis) use place-based reasoning. In many cases, this provides the only link to the spatial domain in that it spatializes non-spatial phenomena such as social class, political ideology, and financial perspectives. Often, the tie to place is loose and general but still provides the wherewithal to discuss place-to-place differences. But the latter is frequently incidental to the reasoning process and is used largely for illustrative or representational purposes. But one of the strengths of spatial analysis is its explicit focus on place-to-place variation across all scales of investigation; i.e., a principal purpose of spatial analysis is to record and to help explain the existence of such differences and why they occur. In this way, spatial analysis provides a support system that makes spatial thinking paramount, and not incidental.

• Spatial analysis procedures have become part and parcel of GISc software. Part of GISc has been tied to understanding what spatial analysis can do to clarify and validate spatial thinking across all scales of investigation. The interdependence of GISc and spatial analysis has been forged. As the use of GIS has expanded through academe, government, and business, many disciplines have laid claim to being the principal originator and purveyor of GIS technology. But none have been able to dispute geography's claims to the special confluence of GISc's search for relevant spatial theory, its representational capability, and the many procedures of spatial analysis that add meaning and usefulness, validity and reliability to GISc's problem solving activities. The integration of GIS and spatial analysis has been influential in moving GISc-related research beyond mere technology to scientific status. Via this link, spatial analysis has been forming the basis for new theories that incorporate human-environment relations, e.g., spatial knowledge acquisition (Golledge, 1978; Montello, 1998) and new theories of data and data manipulation (Goodchild, 2004; Couclelis, 2003).

25.2. THE ROLE OF SPATIAL ANALYSIS

In the process of re-establishing itself as a viable academic discipline (i.e., after its role in examining 'what was where' on the earth's surface, and pursuing the description of the results of human-environment interactions, was made somewhat redundant by remotely sensed image processing procedures), geographers have had to justify their continual existence or go out of business. Some leading departments such as Chicago and Michigan have, in effect, gone out of business, while many others have been merged with geology, geological science, environmental science, sociology, urban planning/architecture/design, or other groups. Despite these dire warnings, much of the discipline has gone its pedestrian way, virtually ignoring the global change from a partly known and image-based world to a group of information societies and a digital world.
But these later trends have provided a rationale and need for specific spatially-based means of examining, processing, and representing the data that is becoming increasingly available in digital form. The need for such procedures is not confined to geography. Other social, behavioral, political, economic, and health sciences, for example, have discovered that their data banks are being spatialized by geocoding of occurrences and attributes, and that traditional measures of statistical analysis do not account for the effects of spatial coincidence or variation. Hence, the demand for spatial analysis is growing in these disciplinary areas. I predict it will continue to grow. It is the goal of every spatial theorist to see various methods for spatial analysis of data incorporated into every standard statistical package, thus imprinting this contribution by geographers on the domain of every spatially oriented discipline. One recent example of this recognition is the inclusion of a chapter on GIS in a recent Handbook of Environmental Psychology (Bechtel and Churchman, 2002) and a decision by the American Psychological Association (APA) to support an advanced institute on GIS and spatial analysis (probably in 2007).
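As a minimal illustration of the kind of spatially explicit measure that such packages would need to carry, the sketch below computes Moran's I, the index of spatial autocorrelation discussed in earlier chapters, using only Python and numpy; the 10 by 10 lattice, the simulated attribute values, and the rook-contiguity weights are all hypothetical.

import numpy as np

rng = np.random.default_rng(1)
side = 10
values = rng.normal(size=(side, side))        # attribute observed on a 10 x 10 lattice
n = side * side

# Binary rook-contiguity weights: W[i, j] = 1 when cells i and j share an edge.
W = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        i = r * side + c
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < side and 0 <= cc < side:
                W[i, rr * side + cc] = 1.0

z = values.ravel() - values.mean()
moran_i = (n / W.sum()) * (z @ W @ z) / (z @ z)
print(round(float(moran_i), 3))               # expected value under no autocorrelation is -1/(n-1)

Values well above that expectation signal the spatial clustering that conventional, aspatial statistics cannot detect.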
25.3. NEW DIRECTIONS FOR SPATIAL ANALYSIS

The interweaving of GISc and spatial analysis has given to geography a justifiable scientific base that, for most of geography's history, has been lacking. This new basis has:

• increased the public and academic image of geography as a serious scientific discipline;

• improved the standing and reputation of geography as a useful contributor to the examination and solution of problems such as comprehending global climate change or understanding human spatial abilities;

• made geographic training and expertise a valuable commodity in the job market;

• brought the realization that, as globalization of societies and their essential activities occur, geographers have a unique contribution to make in the form of geo-education, spatial concept recognition, and spatial thinking and reasoning;

• encouraged exploring the possibility of enhancing geography in the K-12 system of education.

I anticipate that each of these contributions will become more important in both the near and distant futures.

To speculate about the what and where of spatial analysis' contribution to the future of geography, consider the following:

• Recognition that spatial analysis applies and can be used at all scales from the nano scale to the universal. We already have evidence that researchers in microbiology, neurology, DNA, and stem cell research (as well as other research areas not traditionally identified with geography) are facing questions concerning representation and analysis of their spatially-based findings. Both GISc and spatial analysis potentially have an important contribution to make in these areas (e.g., via spatialization, representation, and analysis).

• One of the most important frontiers for future research is to investigate how the mind works. Great advances already have been made in discovering how the brain works. Indeed, one of the most intriguing investigations from a geographer's viewpoint is the extent to which place cells exist (O'Keefe and Nadel, 1978) and form a basis for internal data manipulations that constitutes the mind's contribution to solving spatial problems. The question arises then, if place cells do exist, what light is shed on how data is sensed and coded and stored in
the brain? What happens when we start to think spatially? Is there a particular pattern of neural excitation when we think spatially? Can spatial analysis help both to investigate this and add a newly emerging area for geospatial investigation?

• The world is digitizing. We already have more data from satellites than can conceivably be analyzed in the present or the near future. The question arises as to whether the existing form of spatial analysis can contribute to performing data mining and, as necessary, add new and valuable components to existing search engines. A question for the future may be: are there yet other levels of spatial analysis we have not yet thought about but which could be an essential part of recovering the spatial relations contained in these massive archival structures?

As disciplines such as psychology and cognitive science experiment more in the real world (in addition to ongoing research in laboratories and virtual systems), and as the importance of scale effects and the significant role of place-to-place variation in forming attitudes and behavior is realized, so too has the demand for spatial analysis started to emerge. There is much room for geographers to both teach about and disseminate spatial analysis procedures within and beyond the profession of geography. For decades, we have been borrowing from measurement theory, from maths' symbolic thinking strategies, from mathematical models developed in economics, and analytic procedures from psychology and mathematical statistics; it is time to return this favor by encouraging the use of spatial analytic techniques for processing relevant geospatial data and drawing attention to the very specific contributions of space in the construction of knowledge. At the very least, psychologists and cognitive scientists should become aware of both the advantages and disadvantages of spatializing data for graphic, map, image-based, or symbolic representation. For example, a glance at the psychology literature on spatial perception and cognition reveals little comprehension of the role space plays in information gathering and information processing in the large uncontrolled spaces of the real, inhabited world, and various graphic and image-based representations of this world.

There also appears to be a growing demand for applied geography, particularly in government and business domains. We have already seen such a demand within the business community as with the use of location-allocation models and use of location-based services. Spatial analysis is a key to expanding this demand. The result should be a more widespread acceptance of the contributions that geography can make to everyday life and practice throughout local and global societies.

In my opinion, therefore, spatial analysis, perhaps in conjunction with the use of GIS technology and a GISc search for reliable and valid bases for knowledge accumulation, will provide an avenue for maintaining and expanding the image and acceptance of geography as an integrated science that has a positive capacity to assist the search for new knowledge, and improve our general quality of life.

As a final statement, allow me to raise a question that is critical to the future of geography itself. Are we producing graduates who can compete for jobs in academic, government, or business marketplaces? Sadly, the answer for most of the profession is NO! But spatial analysts and GIS programs are doing this, very successfully. To return to my opening statement: the future of geography as a viable discipline is inextricably tied to the continued development and use of spatial analysis. We've already seen the first indicators of this in terms of which students are getting jobs today outside of academe.
As a discipline, we must become more aware of this need and do our best to ensure that those areas contributing most to this pattern are well supported in the near and more distant future.

ACKNOWLEDGMENTS

The research for this chapter was partly supported by NSF Grant # BCS0239883 (Spatial Thinking) and by UCTC Grant # SA4655 (Assessing Route Accessibility for Wheelchair Users).

REFERENCES

Beck, R. (1967). Spatial meaning and the properties of the environment. In: D. Lowenthal (ed.), Environmental Perception and Behavior (Research Paper No. 109, pp. 18–29). Chicago: Department of Geography, University of Chicago.

Bechtel, R.B. and Churchman, A. (eds). (2002). Handbook of Environmental Psychology. New York: John Wiley & Sons.

Couclelis, H. (2003). The certainty of uncertainty. Transactions in GIS, 7(2): 165–175.

Fujita, N., Klatzky, R.L., Loomis, J.M. and Golledge, R.G. (1993). The encoding-error model of pathway completion without vision. Geographical Analysis, 25(4): 295–314.

Golledge, R.G. (1978). Learning about urban environments. In: Carlstein, T., Parkes, D. and Thrift, N. (eds), Timing Space and Spacing Time, Volume I: Making Sense of Time, pp. 76–98. London: Edward Arnold.

Goodchild, M.F. (2001). A geographer looks at spatial information theory. In: Montello, D.R. (ed.), Spatial Information Theory: Foundations of Geographic Information Science. Proceedings, International Conference, COSIT 2001, Morro Bay, CA, September, pp. 1–13. New York: Springer.

Goodchild, M.F. (2004). GIScience: geography, form, and process. Annals of the Association of American Geographers, 94(4): 709–714.

Klatzky, R.L., Loomis, J.M., Golledge, R.G., Cicinelli, J.G., Doherty, S. and Pellegrino, J.W. (1990). Acquisition of route and survey knowledge in the absence of vision. Journal of Motor Behavior, 22(1): 19–43.

Montello, D.R. (1998). A new framework for understanding the acquisition of spatial knowledge in large-scale environments. In: M.J. Egenhofer and Golledge, R.G. (eds), Spatial and Temporal Reasoning in Geographic Information Systems, pp. 143–154. New York: Oxford University Press.

O'Keefe, J. and Nadel, L. (1978). The Hippocampus as a Cognitive Map. Oxford: Clarendon Press.

Uttal, D.H. (2000). Seeing the big picture: Map use and the development of spatial cognition. Developmental Science, 3: 247–264.
Index

Page numbers in italics indicate figures and tables
Abler, R.F. 466 application domain 678, 68


Abrahart, R.J. 402, 412, 413 Arbia, G. 11, 116, 117, 194
accessibility 477 ARC/INFO 469
Accession software 32 Arc Macro Language (AML) 34
accuracy 475 ArcGIS 470
Achabal, D. 432 ArcGIS software 33, 36
adaptation 400 ArcInfo software 33
adaptive neuro-fuzzy inference system areal frameworks 8
(ANFIS) 233 areal interpolation 11
adaptive sampling 1978 areally referenced data 321
affline anisotropy 164 Arlinghaus, S.L. 406
Agarwal, T. 79 Arlinghaus, W.C. 406
age dependent model of exposure window and Armstrong, M. 163, 167, 202, 413
latency 363 artificial neural network (ANN) 228, 2334,
agent-based models (ABM) 2889, 41011 41113
aggregation 7, 9, 15, 89, 110, 112, 277 application modes 41213
aggregation bias 20 Arvanitis, L.G. 227
aggregation operators 2345 Aschengrau, A. 357
Agrawal, R. 64, 78, 84 Aspie, D. 203
Ahamed, T.R.N. 227 association 234
Ahlqvist, O. 228 association rule-based approaches, spatial data
Ahn, C.-W. 236 mining 79
Akaike Information Criteria (AIC) 165, association rule mining problem 78
221, 247 association rules 78
Akyurek, Z. 227, 231 Assuno, R. 313
Amrhein, C.G. 116, 185 asymmetric mapping 119
Anderson, D.R. 221 asymptotic t-test 2623
Andrews, D.W. 259, 260 asymptotically optimal algorithm 226
Andrienko, G. 47, 56, 57 at risk background 335
Andrienko, N. 47 Atkinson, P.M. 92, 94, 117, 118, 119, 159, 161,
Anile, M.A. 227 163, 177, 178, 402
anisotropic dependency structures 8 atomistic fallacy 21
anisotropy 162, 164, 193, 196 attractors 407
Anselin, L. 17, 21, 31, 32, 69, 75, 76, 91, attribute errors, as independent 10
96, 118, 255, 257, 260, 261, 262, 263, attributes, spatial 75
264, 265, 266, 267, 270, 301, 343, Aubry, P. 185
466, 471 Augusteijn, M.F. 67
anti-monoticity 80 Auto-Regressive Moving Average (ARMA)
Applebaum, W. 424, 425, 431 model 98
Appleby, S. 404, 405 autocovariance 18
automated zoning procedure 32 posterior sampling methods 3246


autonomy 410 software 3401
average radii of circular market areas, Shinjulu, univariate spatial process models 3303
Tokyo 448 Beck, R. 482
average run length (ARL) 3457, 34951 Beguin, H. 401
Avery, K.L. 116 Bellander, T. 357
Axtell, R. 411 Bellhouse, D.R. 190
Ayeni, O. 197 Ben-Shlomo, Y. 360
AZM software 32 Benenson, I. 409, 410, 411
Benguigui, L. 405
Bailey, T.C. 27, 36, 193, 313, 343 Bera, A. 263, 264, 266
Baker, A.M. 183 Berberoglu, S. 177
Baker, R.G.V. 419 Berger, J.O. 323
balanced loss function 330 Berry, B.J.L. 183, 420
Baldi, P. 388 Berry, J.K. 26
Ballas, D. 279, 283, 284, 285, 287, 288, 291 Bertin, J. 44
Balmer, M. 411 Besag, J.E. 19, 255, 301, 310, 311, 344, 358
Baltagi, B.H. 263 Besag L-function 72
Banerjee, S. 332, 338 best linear unbiased estimator (BLUE) 18
Bardossy, A. 229 best linear unbiased predictor (BLUP) 332
Barker, 360 Best, N. 338
Barnard, G.A. 3045 Beyea, J. 357
Barnes, R.J. 203 Bezdek, J.C. 229, 232
Barnett, V. 73 Bhattacharjee, A. 156
Barnsley, M. 405 Bian, L. 117
Barry, R.P. 267 Biehl, K. 112, 115
Bartlett, M.S. 304 bioterrorism 3389, 344
basic areal units 106 Birkin, M. 279, 2823, 285, 293
Basile, R. 269 Bishop, C.M. 376, 379, 386, 387, 388
Batagelj, V. 54, 55 bishop contiguity 1278
Batey, P.W.J. 288 correlation matrices 131, 1315, 133
Batty, J.M. 25, 36, 116, 277 correlogram 137
Batty, M. 288, 404, 405, 406, 408, 409, 411 neighbors in 128, 134
Baxter, R.S. 118 relationship paths 134
Bayes, T. 208 subset of unstandardized weight matrix 129
Bayes theorem 213 Bithell, J. 316
Bayesian inference 208, 21317, 3236, 391 bivariate correlation 15
Bayesian spatial analysis Bjrnstad, O.N. 93
Bayesian spatial regression and kriging 3313 black box 474
case event data 3346 Black Mesa, Arizona 306, 307
contour lines of estimated random spatial Blackman, G.E. 343
effects 334 Blair, P. 116
Gaussian process 3301 Boarnet, M.G. 268
Gibbs sampler 332 Bocher, P.K. 177
health data notation 322 Bodum, L. 50
hierarchical models 3267 Bogaert, P. 193
likelihood 3223 Boman, M. 288, 410
Markov Chain Monte Carlo method (MCMC) Bone, C. 229, 234
3279 Bonferronis correction 97, 219, 252, 350
model GOF measures 32930 Bong, C.W. 118
models for disease mapping 33340 Bonner, M.R. 357
notation 322 Boolean set theory 226
parameter estimates 333 Boolean spatial features 778
point-referenced spatial data notation 322 Boots, B. 96, 100, 101, 113, 117, 262
bootstrapping 389 Chawla, S. 66


Borcard, D. 100 Chen, M.S. 399
Bordogna, G. 230, 235 Chen, X. 268
Bossomaier, T. 229 Cheng, T. 232
Botia, J.A. 235 Cheung, D.W. 84
boundaries 20, 106, 225 chi-squared tests 2078
bounded rationality 411 Chils, J.-P. 159, 168, 179
Boyd, D.S. 233 Chiou, A. 235
Bradley, D. 3567 choropleth maps 4950, 52
Bragato, G. 233 and area cartograms 49
Braimoh, A.K. 230 Christakos, G. 165, 200, 203
Breslow, N. 349 Church, 436
Brindley, P. 6 Civilis, S.P.A. 83
British Household Panel Survey (BHPS) Clark, P.J. 343
279, 288 Clark, W.A.V. 116, 431
Brock, W. 257 Clarke, G. 277, 430
Brody, J.G. 357 Clarke, G.P. 279, 2823, 284, 287
Brown, D.G. 232, 236 Clarke, K.C. 404, 405, 409, 413
Brown, L.A. 34 Clarke, M. 292, 293
Brueckner, J.K. 257 classical data mining 82
Brunsdon, C. 21, 31, 97, 208, 217 interest measures of patterns 83
Buckeridge, D.L. 344 classical inference 208, 20913
Buckner, R.W. 430 location of study area and samples 213
Bunge, W. 471 classification 73
Burnhan, K.P. 221 different approaches 73
Burridge, P.A. 262, 263 supervised and semi-supervised approach 74
Burrough, P.A. 159, 229, 230, 233, 405, Clayton, D.G. 349
466, 474 Cliff, A.D. 8, 91, 93, 155, 255, 261, 262, 266,
307, 356
Caldwell, S.B. 283 Clifford, P. 15
Canada GIS (CGIS) 27 close-coupled component object model (COM)
Cao, H. 84 software 334
CAR see conditional autoregression (CAR) cloud cover 12
Card, S.K. 42 cluster analysis 412
Carlin, 323, 325 clustered sampling 190
Carr, J.R. 50 clustered spatial association rule 79
cartograms 15, 4850 clustering 21920 see also clusters
cartographic modeling 26 analytic tools 3026
case-control epidemiological study 371 and associated p-values 309
Casella, B.P. 324 defining 301
Castro, M.C. 350 detecting 30610, 344
categorical data 1001 detection in residential histories 3635
Cavailhs, J. 406 estimate of standardized K function 311
Celik, J.A. 82 focus on static distributions 355
cells 106 looking for 301
cellular automata (CA) 40810 nearest neighbor analysis 3078
census 11, 29, 108, 11012 questions answered by different methods 318
census geography 11012 second order measures and spatial
central place theory 420 scale 3089
centric systematic sampling 185 clustering-based map overlay approach, spatial
chaos 406, 4078 data mining 79
Charlton, M. 14, 31 clustering point process 712
Charnpratheep, K. 227, 231, 235 clusters see also clustering
Chauvin, Y. 388 analytic tools 3026
background information 302 conditional autoregression (CAR) 16, 218, 260


contouring relative risk 31518 conditional probability measures 80
data for detection 302 conditional simulation 168, 178
defining 301 confidence intervals 211
detecting 300, 31018 congressional redistricting 10610, 107
detection 344 conjugate forms 324
estimating spatial intensity 31315 Conley, T.G. 260, 268
illustrative data set 306, 307 constant error variance, violation of
kernel estimation 31315, 314, 31617 assumption 19
log relative risk 317 constant risk 302
looking for 301 contiguity matrices 667, 67
Monte Carlo simulation 3045 continuity 7
potential 311 continuous variables, continuous function 160
questions answered by different methods 313, continuous weighting schemes 139
31617, 318 Cook, D. 259
SaTScan results 314 Cooper, L.G. 422, 425
scan statistics 31113 coregionalization 168
useful texts 299, 310 correlated heterogeneity 337
co-location rule approaches, spatial data mining correlation-based queries, spatial data
7881 mining 81
co-location rule discovery 78 correlation, bivariate 15
co-locations 78, 80 correlation matrices 112, 1306
discovering patterns 79 bishop contiguity 131, 1315, 133
interest measures 80 inverse distance 1523, 153
Cobb, M.A. 235 limit models 1501, 151
Cochran, W.G. 183 nearest neighbor 140, 141
Cockings, S. 11 negative exponential model 155
cokriging 168, 177 Pace and Gilleys continuous version of nearest
cold spots 96, 457 neighbors 148
collaboration 56, 474 queen contiguity 133, 1356, 136
Collia, D.V. 357 rook contiguity 132, 135, 135
Commission on geoVisualization of the three nearest neighbors 144
International Cartographic Association two nearest neighbors 143
(ICA) 44 correlograms 1368
common shocks framework 260 inverse distance 153, 154
CommonGIS 48 irregularly located point data 1456
comparability, and population size 10 limit models 1501, 151
competitive learning networks 391 nearest neighbor 146
complete spatial randomness (CSR) 71, 303, negative exponential model 154, 156
3034, 310 Pace and Gilleys continuous version of nearest
spatial clustering 71 neighbors 1489, 149
complexity 399401 regular lattice data 137
complexity term 3867 CORSIM 283
computational approach 21820 Corsten, L.C.A. 190, 193
computational data mining 45 cost 4756
computational efficiency, improving 84 Couclelis, H. 29, 4023, 408, 409, 469, 483
computational exploration 45 count errors 11
computational process, spatial data coupling, GIS and spatial analysis 314
mining 812 coupling strategies, GIS and spatial analysis 30
computational science (CS) 3979 covariogram, choice of fitting model 196
computer intensive tests 98 covariogram estimation, optimal geometric
computing and networking technology 46670 designs 1913
concepts, inexact 226 Cox, L.A. 197
conceptualization 6 Cox-Poisson 93
Craig, C.S. 422, 431 Deane, G. 267


Cressie, N. 7, 8, 14, 65, 67, 71, 75, 78, 92, 93, 97, decision-making, and microsimulation 2912
98, 117, 165, 183, 185, 188, 190, 191, 200, decision tables (DT) 227
218, 255, 304, 332 DeGenst, A. 230
crime rate distribution 221 Delfiner, P. 159, 168, 179
crisp set theory 226 Delmelle, E.M. 89, 200, 203
critical value 210 DeMers, M.N. 26
cross K-function 78 Demar, U. 50, 51, 57
cross product statistic 8 Denison, D. 306
Cross, V. 230 Densham, P. 288, 4745
cross-validation, feedforward neural density function for uniformly random points
networks 3889 estimated by one-dimensional bi-weight
cross-validation score 247 kernel function 459
Csillag, F. 97, 100, 101 density function for uniformly random points
cumulative sum (CUSUM) charts see cusums estimated by one-dimensional modified
Curran, P.J. 117, 159 bi-weight kernel function 460
cusums 3458 density function for uniformly random points
Cuzick, J. 307, 308, 358 estimated by two-dimensional bi-weight
Cybenko, G. 378 kernel function 458
dependence, mean-variance 910
D-matrix 53 Depending Areal Units Sequential Technique
Dacey, M. 155 (DUST) 1945
daily mobility 357 Derudder, B. 227, 228, 233
Dale, M.R.T. 90, 91, 92, 93, 94, 95, 96, 97, 98 detail, loss of 7
Dalenius, T. 190 Deutsch, C.V. 162, 164, 168, 200
Dalton, R. 183 Deviance Information Criteria (DIC) 329
Daoud, M. 405 DiBiase, D. 42
data diffusion process 13
availability 277, 300, 476 Diggle, P.J. 93, 178, 219, 303, 305, 314, 316
choice-based 4278 Digital Elevation Model (DEM) 197
collection and storage 399, 400 Ding, G. 4278
graphic representation of 43 Ding, Y. 34, 466
incompleteness 1112 direct assignment, fussy set membership 2301
interaction with 43, 45 directed network Voronoi diagrams 44850
means of collection 1 disaggregation 277
quality 10 discrete choice modeling 232
relationships among non-spatial and spatial 64 discrete weighting schemes 139
remotely sensed 11 DISCUSS system 228
sources 6 disease clustering
spatio-temporal 4701 adjusting for covariates and other risk
data errors 9 factors 365
data generation, for spatial variation 16 bladder cancer example 36871
data transformations 19 consequence of static world view 3589
data visualization 42 exposure traces 3678
datasets, availability 161 identifying focus 372
Davidson, R. 263 logistic model and probability of being
Davidsson, P. 288 a case 3656
Davies, H. 2834 randomization accounting for risk factors and
Davies, R. 356 covariates 3656
de Almeida, C.M. 409 unrealistic assumptions 3578
De Cola, L. 404, 405 disease latency models 35962
de Graaff, T. 262, 265 disease mapping 33340
De Keersmaecker, M.-L. 405 case event data 3346
Deadman, P.J. 411 congenital anomalies deaths 339
count data 3367 Ellner, S.P. 262


disease map reconstruction 338 Engelen, G. 406, 408, 409
example 33940 epochs 382
larynx cancer incident locations 335 Epperson, B.K. 93
parametric forms 336 Epstein, J.M. 410, 411
posterior expected relative risk estimates for Erle, S. 472
congenital abnormalities data 340 error component models 260
posterior probability of exceedance for error propagation 11
congenital abnormalities data 340 errors
putative health hazard assessment 3378 adjacent pixel values 11
at risk background 335 spatially dependent 91
surveillance 3389 ESDA see exploratory spatial data analysis
disease surveillance 348 (ESDA)
dispersal process 13 Esser, I. 409
distance-based approach, spatial data estimation, spatial regression 2657
mining 7980 Evandrou, M. 286
distance classes 96 Evans, A. 292
distributions 93 Evans, F.C. 343
Dobson, A.J. 19 Evans, S. 34
Dolk, H. 357 event-centric model, spatial data mining 80
Donthu, N. 421, 429 Excel 470
Dorling, D. 14, 15 exchange and transfer process 13
dose-response relationships 201 Expectation-Maximization algorithm 72
double length artificial regressions (DLR) 263 explicit space 411
Dougherty, D.E. 412 exploratory data analysis (EDA) 14, 42, 221
Dowers, S. 413 exploratory spatial data analysis (ESDA) 1415,
Dubes, R. 64 31, 42
Dubin, R.A. 94, 259 exponentially weighted moving average (EWMA)
Dungan, J.L. 95, 168 chart 3478
Dunn, J.C. 232 exposure assessment 357
Durbin-Watson test 262 exposure traces 3678
Durlauf, S. 257 exposure windows 3603
Durvasula, S. 432 extent, effects on global spatial statistics 956
Dykes, J.A. 56
dynamic form 4712 Fabrikant, S.A. 53, 54
dynamical systems 406 Fairbairn, D. 233
DYNASIM 283 Falck, W. 93
DYNASIM2 283 Falkingham, J. 282, 285, 286
Fayyad, U. 42
Eagle, T.C. 429 feasible generalized least squares (FGLS) 266
Eastman, J.R. 228, 235, 475 feedforward neural networks 372 see also neural
Eaton, B.C. 427 networks
Eck, J.E. 314, 316 activation functions 3779
ecological fallacy 201, 118 architecture 376
EDA see exploratory data analysis (EDA) batch mode of training 383
edge correction algorithms 96 bootstrapping 389
edge effects 96, 134 conjugate gradient methods 383
education 4723, 4856 cross-validation 3889
Edwards, R. 307, 308, 358 description 3769
effective sample size 15, 97 early stopping 3878
Efron, B. 389 error backpropogation 3846
eigenvalues 267 generalization performance 38891
Eli, R.N. 413 gradient descent optimization 3823
Elliott, P. 357 information processing 378
network complexity 3868 Funahashi, K. 378


network diagram for single hidden layer neural fundamental laws 471
network 377 Furuta, T. 452
network diagrams 3767 fuzziness 225, 476
network training 37982 fuzzy adaptive sampler 227
Newtons method 383 fuzzy analytical hierarchical process
parameter optimization 3824 (AHP) 228, 2312
pattern mode of training 3823 fuzzy c-means 229
potential developments 391 fuzzy clustering 2323
quasi-Newton methods 383 fuzzy decision tables (FDT) 227
real-time learning 384 fuzzy geodemographics 290
regularization 3867 fuzzy hypercube 229
stochastic global search procedures 384 fuzzy ISODATA 232
test sets 388 fuzzy k-means 232
topology 376 fuzzy kappa 228, 229
Feng, C.M. 290 fuzzy kriging 229
Ferreyra, R.A. 1967 fuzzy set theory
Ferri, M. 203 accomplishments in spatial analysis 22630
field view 67 accuracy 227
Fienberg, S. 338 applications 226
Fingleton, B. 255 assigning membership 2304, 235
Finnoff, W. 388 assignment by transformation 2324
Firat, A. 230 association 234
first law of geography 7, 66, 208, 422, 471 challenges and research issues 2356
Fischer, M.M. 376, 381, 384, 386, 3878, 389, combining memberships 2345
400, 402, 412, 413 direct assignment 2301
Fisher, P.F. 50, 119, 229 and GIS 230
Fisher, R.A. 51 indirect assignment 2312
Fishers information measure 18 and mainstream spatial analysis 236
fixed-row matrix 52 map comparison 228
fixed spatial weighting function 245, 246 sampling 227
Flake, G.W. 399, 404, 405, 406, 407, 408, 410 simulation models 22930
Flexer, A. 412 and spatial analysis 2256
Florax, R. 257, 262, 265 statistical data analysis 234
Flowerdew, R. 116, 119, 290 underlying idea 226
fluctuations 9 use of questionnaires 232
Fogel, D. 203, 3824 useful texts 226
Folmer, H. 257, 265 fuzzy spatial disaggregation 228
Foody, G.M. 233, 412
formal inferential frameworks 20917 Gahegan, M. 46, 52
Forsberg, L. 349 Gale, S. 225
Forster, B.C. 11 Gan, F.F. 347
Fortin, M.-J. 89, 90, 91, 93, 94, 95, 96, 97, 98, Gastner, M.T. 50
99, 101 Gatrell, A.C. 27, 193, 313, 343
Fotheringham, A.S. 2, 14, 21, 26, 27, 31, 34, 96, Gaussian process 3301
97, 108, 112, 116, 118, 119, 120, 208, 217, Gaussian random field 3301, 344, 3512
221, 232, 277, 401, 402, 406, 407, 421, Gaydos, L.J. 409
427, 466 Gearys c 92
f ( ) 211, 214 Gedeon, T.D. 227
fractals 11819, 4036 Gehlke, C.E. 111, 115
Frisen, M. 346, 348 Gelfand, A. 330
Fritz, S. 228, 232 Gelman, A. 10, 323, 324, 325, 326, 339
Fuhrmann, S. 46, 47 Geman, D. 324
Fujita, N. 482 Geman, S. 324
generalization, feedforward neural networks technical barriers 356


38891 user requirements 29
generalized least squares, fitting models to geographically weighted regression (GWR) 31,
semi-variograms 165 119, 177, 217, 471
generalized Lotka-Volterra systems 4067 experiment: parameters spatially invariant
generative geographic science 41011 2489
geo-space 401 experiment: parameters spatially varying
geo-spatial data capture technologies 400 24950
GeoBUGS 341 geographical weighting models 252
geocomputation 27, 21820 mechanics of 2447
agent-based models (ABM) 41011 mixed models 252
artificial neural network (ANN) 41113 output 247
cellular automata 40810 prediction 252
chaotic behavior and strange attractors 406 research topics 2502
computational science (CS) 3979 simulation experiments 24750
data collection and storage 399, 400 software 250
description and basis 397 and spatial regression 252
distinctive features 401 statistical inference 252
dynamic systems and chaotic behavior 4068 usefulness 253
fractals 4036 variable selection 252
and geographical information systems geography
(GIS) 29 development of 4824
growth in computing power 399 and spatial analysis 481
hard and soft 403 geometric anisotropy 164
motivation 4001 George, R. 233
multi-agent systems (MAS) 410 Georgia
nature and complexity 399400 percent blacks according to Congressional
potential developments 41314 Districts 109
relationship to spatial analysis and GIS 4012 selected statistics for variable percent
spatial chaos 4078 black 110
theory of 4023 geospatial lifelines 358
geocomputational and non-geocomputational geostatistical data 321
techniques 403 geostatistics 656
geocomputational techniques 40313 automatic fitting of variogram models 174
GeoDa 312, 34, 48 background and description 15961
geodemographics 290 characterizing spatial variation 1625
Geographic Information Science (GISc) 5, 482 cokriging 168
Geographical Analysis Machine (GAM) 219, conditional simulation 168
31113 estimating experimental semi-variogram
Geographical Information Science (GISc) 483 1623
geographical information systems (GIS) fitting a semi-variogram model 1635
coupling with spatial analysis 304 future trends 178
definition 26 generative geographic science 41011
development of 256, 367 model-based 178
early development 278 modifiable areal units problem (MAUP) 119
and fuzzy set theory 230 non-stationary mean 1703
and geocomputation 29, 4012 non-stationary models 16874
influence of spatial analysis 28 non-stationary semi-variogram 1734
integration with spatial analysis 347 non-stationary semi-variograms and
as limiting spatial analysis 28 kriging 1745
relationship to spatial analysis 2930 objective of non-stationary modeling 177
seen as inadequate 25 polynomial trend models 170
software 47 random function (RF) model 15960
software development 4667 sampling random fields 1907
sequential Gaussian simulation (SGS) 168 Goodchild, M.F. 6, 25, 26, 29, 30, 33, 34, 36,
spatial prediction and simulation 1658 116, 117, 256, 358, 404, 405, 406, 466, 467,
stochastic imaging 168 470, 471, 472, 482, 483
using secondary variables 1713 Google Earth 34, 4723
GeoSurveillance 352 Goovaerts, P. 93, 99, 159, 160, 167, 168, 173,
GeoTools 291 178, 191
GeoVISTA Studio 478, 50, 52 Gopal, S. 3878, 412
Geovisual Analytics 567 Goreaud, F. 96
geovisualization see also visual data exploration; Gottsegen, J. 116
visualization Gotway, C.A. 119, 173, 178, 302, 304, 305, 343
3D 502 gradient descent optimization 3823
bedrock-fractures-radon visualization as Graniero, P.A. 227, 229
a 2.5D surface. 51 graphic representation, of data 43
definition and description 435 graphical tests 75
developing tools 467 Green, J.L. 90
examples 4855 Green, M. 119
fixed row matrix of bivariate visualizations 53 Greenland, S. 360
GeoVISTA-based system displaying a Gress, B. 269
synthetic spatial dataset 51 grid computing environments software 413
mobile 57 grid-enabled computing 36
research topics 57 grids 8
software 478 Griffith, D.A. 113, 185, 255, 267
and spatial data exploration 437 Grtschel, M. 203
spatialization of a non-spatial phenomenon 55 group work 56
Visually discovering relationships between the Gstat 165
spatio-temporal attributes from the SOM Gumerman, G.J. 306
component planes visualization 54 Guneralp, B. 228
Getis, A. 21, 92, 96, 219, 343, 466, 471 Guptill, S.C. 6, 10
Ghosh, A. 422, 431 Gustafson, E.J. 100
Ghosh, S. 330 Guy, C.M. 422
Gibbons, S. 269 GWR software 31, 250
Gibbs sampler 324, 328 model editor 251
Gilley, O. 1468
Gimblett, H.R. 411 Haas, T.C. 173, 174
GISc see Geographical Information Haase, P. 96
Science (GISc) Hagen, A. 228
GIScience 26 Hagen-Zanker, A. 228
GISystems 26 Hägerstrand, T. 278, 358, 411, 472
Glennerster, H. 282, 286 Haggett, P. 356
global network auto K function 4524 Haining, R.P. 11, 13, 14, 15, 16, 17, 18, 19, 20,
street burglaries, Tokyo 454 32, 33, 91, 92, 93, 183, 184, 188, 190, 202,
global network cross K function 454, 455 203, 255
global network cross K function, comparison Hall, P. 260
between ordinary and Voronoi 457 Han, D. 358, 360
global network Voronoi cross K function 4567 Han, J. 41, 43, 71, 72
global spatial statistics 3434 see also local Hancock, R. 283, 286
spatial statistics; spatial statistics Hanna, A.S. 234
effects of extent 956 Hansen, W.A. 424
sampling issues 96 Hanson, S. 431
Godfrey, L. 263 Hare, M. 411
Goldberg, D.E. 203, 382–4 Harvey, D. 473
Golledge, R.G. 483 Hastie, T. 389
Gong, P. 412 Hastings, 324
Good, P. 98 Hatch, M. 357
Hausdorff, Felix 404 residential mobility in environmental health
Hawkins, D. 74, 346, 347 studies 357
Haykin, S. 383 Hunt, L. 113, 117
Healey, R. 413 Hunter, J.S. 347
Healy charts 3501 Hwang, S. 227
Healy, J.D. 3501 hypothesis testing 1520, 21011
Hedayat, A.S. 183
Heikkila, E.J. 229 Iachan, R. 190
Hengl, T. 202 Illingworth, V. 404
Henn, V. 232 impact analysis 422
Hepner, G.F. 412 imprecision 227
Heppenstall, A.J. 288 index formulations 117
Hertz, J. 376 individual fallacy 21
Herzfeld, U.C. 159 inferences, drawing 201
Hessian matrix 383 information, borrowing 13
heterogeneity 9, 19, 243, 410 information visualization 42
heterogeneous Poisson process 304, 334 Inselberg, A. 52
heteroskedastic and spatial autocorrelation integration, GIS and spatial analysis 347
consistent (HAC) estimator 269 intensity function 304
heteroskedasticity 9, 260, 261, 262 interaction process 13
heuristics, use in sampling optimization 203 interest measures 80
Heuvelink, G.B.M. 475 co-location approaches 80
hierarchical sampling 190 internal homogeneity 119
Higgs, G. 32 Internet 467
Hilbert curve 404 intra-area heterogeneity 19
Hills, J. 282, 286 intra-unit heterogeneity 9
Hodgson, M.J. 116 intrinsic complexity 4001
Hoff, M.E. 372 Intrinsic Random Functions of Order
Holm, E. 288, 410 k kriging 171
Holmlund, P. 159 inverse distance 1523
Holt, D. 117, 119 correlation matrices 1523, 153
homogeneity 9 correlograms 153, 154
homoscedasticity, violation of assumption 19 weight matrices 152
Hooimeijer, P. 2845 irregularly located point data 13848
Horner, M.W. 116 coordinates for example data 138
Hornik, K. 378 correlation matrices 141, 143, 144
Horowitz, J.L. 268 correlation matrices for Pace and Gilley
Hossain, M. 338 model 148
hot spot analysis 70 correlations between points 141, 143, 145
hot spots 96, 457 correlograms for nearest neighbors 1456, 146
hot spots of traffic accidents on nonuniform road correlograms for Pace and Gilley model
network, Chiba, Japan 461 1489, 149
hot spots of traffic accidents on uniform road distance matrix 139
network, Chiba, Japan 460 example data 138
Huang, Y. 79, 80 nearest neighbor weight matrix 1402
Huang, Z. 279 Pace and Gilley's continuous version of nearest
Hudson, G. 173 neighbors 1468
Huff, D.L. 4245 standardized weight matrices 147
Huijbregts, C. 94, 159, 164, 165, 167 two nearest neighbors weight matrix 142
Human-Computer Interaction (HCI) 46 weight matrices 140, 142, 1445
human mobility weighting schemes 13940
historical perspective 3567 Isaaks, E.H. 8, 13, 160, 164, 167, 193
increase in lifetime distances traveled 356, isotropy 196
3567 iterated functional systems (IFS) 405
iteration 400 Klir, J.G. 234
iterative algorithms, spatial outliers 77 knowledge construction 42
iterative proportional fitting (IPF) 279 Knox, G. 344, 359
Knox test 359
Jacobian 265, 267 Kohonen maps 412
Jacquez, G.M. 91, 93, 99, 303, 3589, 364, 365, Kohonen, T. 53
369, 370 Koussoulaku, A. 50
Jain, A. 64 Koutsopoulos, H.N. 232
Janelle, D.G. 472 Kraak, M.J. 43, 50, 56
Jang, J.-S.R. 233 Kreuseler, M. 50
Jankowski, P. 227 kriging 13, 161, 164, 178, 191
Jelinski, D.E. 117 and Bayesian spatial regression 3313
Jensen-Butler, C. 156 with external drift model (KED) 173
Jensen, C.S. 83 fuzzy 229
Jiang, H. 228, 235 Intrinsic Random Functions of Order k 171
Jin, J. 288 and non stationary semi-variograms 1745
Johnson, G.A. 332 optimal designs to minimize variance 1936
Johnston, R. 118 ordinary 1658
join-less approach, spatial data mining 801 shortcomings of use of variance 200, 202
joint-count statistics 8, 92 simple 1658
Joshi, H. 2834 simple kriging with locally varying means
Journel, A.G. 94, 159, 162, 164, 165, 167, (SKlm) 1712, 173
168, 202 with a trend model (KT) 1701
Judd, K.L. 411 with a trend model (KT) derived map of
Justice, C.O. 117 precipitation 172
variance 167, 195, 199200, 201
K-fuzzy 228, 229 weighting variance 200, 203
k-neighboring class sets 7980
Kuh, D. 360
Kabos, S. 96
Kulldorff, M. 11, 312, 3489, 358
Kahraman, C. 228, 234
Kuo, R.J. 228, 231, 233
Kainz, W. 233
Kůrková, V. 377
kappa measure 228
Kurtzweil, R. 399, 413
Kaspar, B. 357
Kwan, M.P. 50
Katz, A. 228
Kyriakidis, P.C. 178
Kaufmann, P.J. 430
Kauth, R.J. 332
Keim, D.A. 43, 44, 46 L-systems 4056
Keister, L.A. 283 Lacayo, M. 47
Keitt, T.H. 100 Lagrange multiplier (LM) test 167, 262, 2634
Kelejian, H.H. 260, 262, 265, 268, 269 Lagrangian relaxation 432
Kelley, K. 399 Lajaunie, C. 204
Kelsall, J. 316 Lakshmanan, T.R. 424
kernel density 343 Lam, N.S.-N. 117, 119, 404, 405
kernel estimation 31315, 31617 Land, K. 267
kernels 2456 Langford, M. 119
King, G. 13, 118 Langholz, B. 35960
King, L.J. 190 Langlois, A. 409
King, M. 262 Langran, G. 36, 470
Kingston, R. 292 lasso 387
Klaassen, L. 255 lattice data 321
Klatzky, R.L. 482 lattices 65
Kleinman, K. 338, 344, 349, 352 Laudan, L. 476
Klepeis, 357 Law, J. 20
Klinkenberg, B. 406 laws, fundamental 471
Lawson, A.B. 306, 316, 338, 339, 340, 344, local statistics 343, 344, 471
349, 352 local variance 177
learning data 68 locally equivalent alternatives 263
Lee, J. 108 location errors 1011
Lee, L.-F. 265, 266, 268 locations 76
Lee, P. 323 actual and predicted 83
Lee, P.M. 214 Lodwick, W.A. 227
Lee, S. 268 logistic models 365
Leenders, R.T.A.I. 257 Long, L. 357
Legendre, P. 95, 100 long-term mobility 357
Leong, T. 317 Longley, P.A. 6, 25, 27, 36, 277, 402, 404, 405,
LeSage, J. 218, 267 430, 466
Lessof, C. 282, 285 Lorenz attractor 4067
Leszczyc, P. 432 Louis, T. 323, 325
Leung, Y. 225, 384, 391, 400, 402 Lovász, L. 203
Lewis, T. 73, 177 Lucas, J.M. 3467
Li, D. 263 Lundberg, C.G. 225
Li, S. 69
Li, X. 409 MA see moving average (MA)
Lichstein, J.W. 91 MacEachren, A.M. 43, 44, 52, 56
Liew, A.W.C. 233 Machin, S. 269
LIFEMOD 285, 286 Mackay, D.S. 230, 234, 236
light scattering 11 MacKinnon, J.G. 263
likelihood function 82 MacMillan, R.A. 227
likelihood ratio 262 macros, programming 34
likelihood ratio test statistic 263 Madow, W.G. 190
limit models Maes, P. 410
correlation matrices 1501, 151 Magnus, J. 266
correlograms 1501, 151 Makarovic, B. 197, 203
weight matrices 14850 Maki, N. 445
Lin, J.-J. 232, 234 Malczewski, J. 475
Lin, X. 268 Mamdani-type inference 235
linear errors 11 Mandelbrot, Benoit 404
linear regression 2434, 256 Manly, B. 211
linear separability 52 Mantel, N. 344
Lipsey, R.G. 427 map comparison 2278
literacy, spatial 4723, 477 mapping, modifiable areal units problem
Liu, K. 404 (MAUP) 112
Liu, Z. 233 maps, hand drawn 229
Lloyd, C.D. 92, 159, 165, 173, 177 Marble, D. 28, 36
local Getis statistic 96 Marceau, D.J. 410
local indicators of spatial autocorrelation Mardia, K. 259
(LISA) 967, 471 mark connection functions 101
local interactions 411 Mark, D. 358, 404, 405
local-ness 1778 marked spatial point process, spatial clustering
local network auto K function 4545 72, 72
local network Voronoi cross K function 455, 456 market basket datasets 78
local Ord statistic 96 Markov Chain Monte Carlo method (MCMC)
local range parameter 177 306, 321, 324, 325, 3279
local sill 177 Gibbs updates 328
local spatial statistics 967 see also global spatial Metropolis and Metropolis-Hastings
statistics; spatial statistics algorithms 327
monitoring many 3512 Metropolis and Metropolis-Hastings
monitoring single 3501 updates 3278
Metropolis-Hastings versus Gibbs comprehensive urban system models 2878
algorithms 3289 CORSIM 283
special methods 329 credibility 287
useful texts 327 database attributes for linkage to remote
Markov property 1617 sensing 289
Markov random field-based Bayesian classifiers, and decision-making 2912
spatial data mining 69 definition and description 27881
Marshall, R. 259 dynamic models 278, 283
Martin, D.J. 7, 28, 29, 32, 36, 402 DYNASIM 283
Martin, R. 267 DYNASIM2 283
Matérn, B. 191 household activity patterns 284–6
MATLAB 341 importance 2778
Matsakis, P. 229 improving model calibration 2923
MAUP see modifiable areal units problem IPF based approach 2823
(MAUP) labor and housing markets 2845
maximum likelihood (ML) 82 LIFEMOD 285, 286
fitting models to semi-variograms 165 Microsimulation Modelling and
maximum likelihood (ML) based tests, spatial Predictive Policy Analysis System
regression 262 (Micro-MaPPAS) 291
maximum likelihood (ML) estimation 323 NEDYMAS (Netherlands Dynamic
McBratney, A.B. 163, 165, 193, 196, 225, 232 Micro-Analytic Simulation Model) 286
McCloy, K.R. 177 new applications 292
McCormick, B.H. 42 output validation 293
McCulloch, W.S. 372 PENSIM 283
McDonnell, R.A. 159, 230, 474 procedure for allocation of economic activity
McHarg, I.L. 402 status 280
McLafferty, 314 and remote sensing 28991
McNamee, R. 357 research agenda 28794
mean-variance dependence 910 retail 2856
memberships reweighting 27980
combining 2345 SIMBRITAIN 287
fuzzy set theory 2304 SimLeeds 291
Ménard, A. 410 social policy change 286–7
Meng, L. 57 static models 278
Mennis, J. 119 SYNTHESIS 2823
meso-scales 277 TAX 282
methodologies, development of 212 tax and income modeling 2824
methodology, choice 14 transport and land-use 285
Metropolis, N. 211 Microsimulation Modelling and Predictive Policy
Metropolis algorithm 216, 332 Analysis System (Micro-MaPPAS) 291
Metropolis and Metropolis-Hastings Microsoft VBA 34, 36
algorithms 327 middleware 413
updates 3278 Miller, H.J. 41, 43, 400, 401
Metropolis-Hastings algorithm 324 Miller, H.Z. 28, 32
Michalewicz, Z. 203 Miller, R.E. 116
micro-models 277 Min, H. 432
MicroMaPPAS 291 Mineter, M.J. 413
microsimulation 411 Minimization of the Mean of the Shortest
academic studies 282 Distances 1945
advantages 2934 Minkowski metric 358
and agent-based models 2889 misaligned data problem (MIDP) 338
applications 2817 missing data 1112
attributes of individual micro-unit 284 Mitton, L. 286
combining with remote sensing 290 mixed integer programming 432
mixing process 13 Moustakides, G.V. 346
mobile communications 57 moving average (MA) 16
mobile geovisualization, and location-based Mozolin, M. 412
visual exploration 57 Mrvar, A. 54, 55
model fitting 1520 Muller, W. 183
modifiable areal units problem (MAUP) 20, 95, multi-agent systems (MAS) 410
338 multi-criteria decision making 228
from conceptualization to problem solving multiform bivariate matrix 52
11618 MULTILOC 432
configurations applied to variables 114 Multimap 34
description 1068 multiple-class cross-entropy error function 381
discovery and impact assessment 11516 multiple hypothesis testing 219
effect of spatial aggregation mechanism 11415 Multiplicative Competitive Interaction Models
effects accounting framework 120 (MCI) 422, 425, 432
fractals 11819 Munroe, S. 419, 420
fundamental impacts 10812 Murray, A. 116
Geographical Systems 117 Myers, D.E. 192
geostatistics 118, 119
looking for solutions 117 n adjustment 19
mapping 112 Nadel, L. 484
optimal zoning systems 118 Nagarwalla, N. 349, 358
origin of term 1056 naive geography 476
potential solutions 11820 Nakanishi, M. 422, 425
processes 11215 Nakaya, T. 285
research history 11518 National Academy, USA 293
scale dependency 117 National Center for Geographic Information and
scale effect 11213, 117 Analysis (NCGIA) 116, 466
scale-insensitive tools 118 nature, and complexity 399400
selected statistics for hypothetical nearest neighbor analysis, clustering 3078
configurations 115 nearest neighbor metrics 3645
weighting methods 118 NEDYMAS (Netherlands Dynamic
zoning effect 113, 11315 Micro-Analytic Simulation Model) 286
Moellering, H. 116 negative exponential distance decay function 259
Møller, J. 306 negative exponential model 153–5
Mollie, A. 13 correlation matrices 155
Moloney, K.A. 93, 96 correlograms 154, 156
Monte Carlo approach 211, 212 weight matrices 154
Monte Carlo hypothesis testing 312 Neighbourhood Statistics Service 34
Monte Carlo simulation 78, 84, 98, 216, 252, Nelissen, J.H.M. 282, 286
279, 3045, 308, 310 Nelson, L.S. 345
Montello, D.R. 483 nested designs 193
Moody, J.E. 38891 nested sampling 190
Moon, F.C. 405 network K function methods 4527
Moore, Gordon 399 network spatial methods 445
Moore's Law of Integrated Circuits 398, 398–9, network spatial phenomena 443–5
413 distribution of parking lots 444
Moran, P.A. 261 sites of traffic accidents 444
Moran scatterplots 756, 76 network training 37982
Moran's I 67, 92, 96, 190, 261–2, 263, 343 network Voronoi diagrams 447–52
Morfield, P. 360 networks, spatial analysis on
Morimoto, Y. 7980 directed network Voronoi diagrams 44850
Morris, A. 227, 230 GIS-based tools 461
Morrison, J.L. 6, 10 global network auto K function 4524
Mosaic 467 global network cross K function 454
global network Voronoi cross K function 456–7 Okabe, A. 117, 445, 446, 447, 448, 450, 452, 455,
local network auto K function 4545 457, 461
local network Voronoi cross K function 455 Okano, K. 450
network K function methods 452–7 O'Keefe, J. 484
network kernel method 457–61 O'Kelly, M.E. 421, 427–8, 431, 432
network Voronoi diagrams 44752 Olea, R.A. 196, 200, 203
ordinary network Voronoi diagram 447–8 O'Leary, E.S. 357
types of methods 462 Oliver, M.A. 159, 162, 164, 168, 192
uniform network transformation 446–7 O'Loughlin, J. 118
weighted network Voronoi diagrams 4501 Olwell, D.H. 346, 347
networks, uniform and nonuniform 447 Open GIS (OGIS) consortium 645
neural networks see also feedforward neural open source software 34
networks Openshaw, S. 30, 32, 105, 106, 108, 112, 115,
and Bayesian approaches 391 116, 118, 209, 219, 278, 290, 311, 312, 313,
limitations 391 398, 401, 402, 466
origin and use of term 372 optimal bandwidth selection 247
potential developments 391 Orcutt, G.H. 278, 283
neutral models 99 Ord, J.K. 8, 17, 21, 91, 92, 93, 96, 155, 219, 255,
Newell, J. 301, 311, 344, 358 261, 262, 266, 267, 307, 343, 471
Newey, W.K. 268 ordinary kriging (OK)
Newman, M.E.J. 50, 54 block 167
Nielsen, J. 46 derived map of precipitation 169
Nijkamp, P. 407, 412 predictions 165
Nikitenko, D. 229 punctual 167
Nocedal, J. 383 weights 165, 167
nomothesis 476 ordinary least squares (OLS) 17, 261, 265
non-linear dynamical systems 406 fitting models to semi-variograms 165
non-parametric statistics 211 ordinary network Voronoi diagram 4478
O'Sullivan, D. 408
non-spatial processes 14
outcomes, manipulating 475
non-spherical error covariance matrix 259
outliers 734
non-stationarity 2, 16874
dataset for detection 75
two-dimensional 202
output patterns, spatial data mining 6781
non-stationary mean 1703
overdispersion, in generalized linear
non-stationary mean parameter 160
modeling 1920
nonfuzzy approach, risk of ignoring
Overton, W.S. 183
information 227
nonuniform networks 446
p-value 210
normal distribution 220
Paas, G. 289
normality, assumed 93
Pace and Gilleys continuous version of nearest
Nuckols, J.R. 357
neighbors 1468
nugget effect 167, 193, 196, 331 correlation matrices 148
nuisance parameter approach 268 correlograms 1489, 149
null hypothesis 210 Pace, K. 1468
Nyberg, F. 357 Pace, R.K. 267
Paelinck, J. 255
Oberthur, T. 227, 228, 235 Paez, D. 228
object view 67 Page, E.S. 346, 350
observational data 6 Pang, M.Y.C. 408
Occam's Razor 473 parallel coordinates plot 50
Odeh, I.O.A. 225, 227, 232 parallelism 399400
Oden, N.L. 94 parameters 812, 211, 226
Odland, J. 113 parametric significance testing, implications of
Office for National Statistics 29 spatial autocorrelation 979
Pardo-Igúzquiza, E. 165, 173, 174, 178, 203 point referenced data 321
Parker, D.C. 409, 411 modeling 331
partial-join based approach, spatial data point spread function 11
mining 80 points on a plane, randomly and non-randomly
participation index interest measure 84 distributed 446
Patil, G.P. 313 Poisson cusum 3467
Patil, P. 260 Poisson distribution 10, 19, 93
pattern recognition 42 Poisson-Poisson 93
patterns 43, 89, 299 Poisson processes, homogeneous 71
Pattie, C. 118 policy evaluation, area-based 2778
Peano curve 404 polygons, representing spatial data 810
Pearson, D.M. 97 polynomial trend models 170
Pearson's product moment correlation popularization, of spatial analysis 34
coefficient (r) 15 population, defining 20
Pebesma, E.J. 165 population inferences 201
Peitgen, H.-O. 405, 406 population microdata example 281
Pélissier, R. 96 population size, and comparability 10
Penninga, F. 84 populations 2201
PENSIM 283 Posterior distribution for = 12. 216
pensions, estimation 2834 posterior predictive loss approach 330
Penttinen, A. 101 Powell, S. 306
Perle, E.D. 113, 116 Power, C. 228, 235
permutation 98 Pownall, C.E. 2789, 284
permutation test 21112 PPGIS 291
Pesaran, M.H. 260 Preece, J. 46
Peschel, J.M. 233 Prendergast, G. 423
Peterson, C. 413 Press, W.H. 383
Peterson, G.D. 92 prevalence measures 80
Pettitt, A.N. 193 Price, P.N. 10
Peuquet, D. 36, 470 principal coordinates of neighbor matrices
Pham, D.L. 233 (PCNM) 100
Phillips, J.D. 405, 407, 408 process inference 220
Phipps, M. 409 process models 218
Piachaud, D. 286 process scripts 4689
Piccioni, M. 203 processes, stochastic 20
pilot studies 95 progressive sampling 197
Pinkse, J. 269 properties, fundamental 78
Pipkin, J.S. 225 Propper, C. 282, 286
Pitts, W. 372 Prucha, I.R. 260, 262, 265, 268, 269
pixels 7, 9, 106 public awareness 34
place cells 484 Public Participation GIS (PPGIS) 291
Plaisant, C. 41, 57 Pyle, I. 404
planar kernel functions 457
planar spatial methods 4435 Q-statistics 355, 3589, 361, 3636, 370
Plante, M. 96 quadrats 71
Plog, S. 306 qualitative data 100
Plumlee, M. 50 Quattrochi, D.A. 117, 119
Pocock, S. 259 queen contiguity 12830
point data correlation matrices 133, 1356, 136
interpolation of surfaces 2267 correlogram 137
irregularly located. see irregularly located neighbors in 129, 136
point data subset of unstandardized weight
regular lattice 138 matrix 130
point process 65 Quenouille, M.H. 190
questionnaires, fuzzy memberships 232 calculations 42730
Quinlan, J. 64 chain combinations 4378
choice-based data 4278
R 324, 341 combinatoric issues 4335
radial bias function networks 391 computable location models 4356
Raffy, M. 177 consumer choice 424
Ramstein, G. 177 consumer demand and behavior 4201
random effects models 20 data collection and organization 4335
random function (RF) model 15961, 177, 179 data issues 4278
random labeling 305 demography of the trade area 426
random labeling simulations 316 flexible sites 435
random sampling 185 gravity models 428, 429, 435, 438
random variable, random function (RF) model heuristics and short cuts 435
15960 impact assessment 42930
randomization procedure 99 interaction matrix 434
randomization significance testing, implications location allocation models 4306
of spatial autocorrelation 979 macro spatial analysis 427
randomization tests 98 market effectiveness and penetration 4289
Rao, L. 32, 118 models 4214, 432
raster model 105, 106 performance assessment of stores 429
rates, spatial variation 10 primary trade area 4256
ratio shortest path distance to Euclidean distance, required sites 435
Kokuryo, Tokyo 445 retail location models and competing
ratios, standardized 10 destinations 4267
Raubertas, R.F. 344, 348 retail location models and spatial interaction
real-time learning 384 4323
recursion 400 shopping centers 437
redistricting 1068, 107 SI based location model 438
Redmond, G. 286 spatial interaction 4245
Rees, P. 217 spatial interaction modeling 4267
reference feature-centric model 79 spatial retail location 41920
Reggiani, A. 407 strategic planning examples 4378
region classification problem 2256 temporal and seasonal variations in trade
regression models, fitting 17 areas 430
regular lattice areas 126 vertex substitution 436
regular lattice data 142 Reuscher, T. 357
regular lattice point data 138 reweighting 279
regular systematic sampling 185 Rey, S.J. 261, 268
regularization 3867 Reynolds, H. 116
regularized error function 387 Reynolds, P. 357
Reismann, M. 376, 389 rezoning 106
Remmel, T.K. 101 RF model, geostatistics see random function
remote sensing 106, 117, 177, 225, 28991 (RF) model
combining with microsimulation 290 Ribeiro, P.J. 178
database attributes for linkage to Richardson, Lewis 404
microsimulation 289 Richardson, S. 15, 21
replicability 474 Richmond, A. 163
representation, of surface 7 Ridwan, M. 232
resampling 98 Ripley, B.D. 7, 93, 183, 255, 308, 343, 376,
research needs, spatial data mining 825 389, 452
residential histories, detection of clustering 363–5 Ripley's K-function 67, 71, 72, 84, 92, 93,
residential mobility 357 30810, 343, 358
retail locational analysis risk 31518, 35960
analysis with retail trade area models 4247 Rizzo, D.M. 412
Robert, C. 323, 324 Schmidt, J. 118
Roberts, S.W. 347, 348 Schmoyer, R. 261
Robinove, C.J. 225 school rezoning 106
Robins G-estimation procedure 360 Schreckenberg, M. 409
Robins, J. 360 Schweitzer, D.M. 405
Robinson, A.H. 112, 115, 118 science, beyond traditional practice 4734
Robinson, P.M. 260, 262, 268, 269 scientific visualization 42
Robinson, V.B. 225, 227, 229, 230, 231, 234 Scuderi, L. 413
Rogers, J.P. 84 SDM see spatial data matrix (SDM)
Rogerson, P.A. 26, 200, 203, 347, 348, 351, 466 search windows 94
rook contiguity 1267 Seber, G.A.F. 197
correlation matrices 132, 135, 135 secondary variables 1713
correlogram 137 See, L. 228, 290
neighbors in 127, 135 seed dispersal 901
subset of unstandardized weight matrix 128 Seifu, Y. 262
Rosenblatt, F. 372 Seldin, M. 421
Rossi, G. 347 Self-Organizing Maps (SOM) 53
Rothman, K. 360 Semantic Import (SI) 230
row normalizing 1256 semi-supervised learning, spatial data
Rumelhart, 3823, 384 mining 723
Russo, D. 193 semi-variogram cloud 162
Rust, R.T. 421 semi-variograms 8
automatic fitting 174
Saaty, T.L. 475 bounded 164
Saaty's Analytic Hierarchy Process 475 bounded model 163
Sabel, C.E. 358 directional, of precipitation 166
Saccucci, M.S. 347 estimating experimental 1623
SAGE (Spatial Analysis in a GIS estimation and model fitting 160
Environment) 323 exponential model 1634
sample points 7 fitting a model 1635
samples, representative 183 Gaussian model 163, 164
sampling designs, efficiency 7 non-stationary 1734
sampling, fuzzy set theory 227 non-stationary, and kriging 1745
sampling issues 96, 184 non-stationary models 16874
sampling schemes 1867 omnidirectional, of precipitation 166
Sampson, P.D. 173 power model 164
Santos, J. 227 precipitation: direction with smallest
SAR see simultaneous spatial variance 171
autoregressive (SAR) precipitation: raw data and residuals from
Satoh, T. 446, 447, 448, 452, 457 polynomial trend 170
SaTScan 31213 spherical model 163
Sawicki, D.S. 116 sensors 76
scale dependency, modifiable areal units sequential Gaussian simulation (SGS) 168
problem (MAUP) 117 server GIS 4678
scale effect 110 Shekhar, S. 66, 70, 72, 77, 79, 80
modifiable areal units problem (MAUP) Shen, G. 405
11213, 117, 119 Shewhart charts 3456
scan statistics 31113, 344 Shi, W. 408
scan test, Bernoulli form 358 Shiryaev, A.N. 348
scatterplots 75, 85 Shiryaev-Roberts method 348
scenario planning model 432 Shmueli, G. 338
Schabenberger, O. 173 Shneiderman, B. 43
Schaefer, J.A. 233 Siegel, S. 211
Scheiner, J. 357 significance 210
Sikdar, P.K. 116 SAGE (Spatial Analysis in a GIS
Silipo, R. 53 Environment) 323
Silverman, B.W. 457 SaTScan 31213
Silverman, D. 369 STACAS (SpaTial AutoCorrelation and
SIMBRITAIN 287 ASsociation analysis) 34
Similarity Relation (SR) 230 statistical inference 2212
SimLeeds 291 TRANUS GIS module 34
simple random sampling 185 WinBUGS 324, 3401
simulated annealing 203 soil-land inference model (SoLIM) 227
simulation models, fuzzy set theory 22930 Sokal, R.R. 94, 96, 97
simultaneous spatial autoregressive (SAR) 16, 17 Sone, A. 267
Singer, B.H. 350 Sonesson, C. 346, 348
Sinha, B.K. 183, 358 Sosin, D. 338
Sipser, M. 403 space
Skellam, J.G. 343 Euclidean conception 28
Skeppström, K. 50 partitioning 106–7
Skubic, M. 229 space-time interaction 344, 359
Skupin, A. 47, 53 space-time paths 358
small area data 281 spacefills 52
Smiley, F.E. 306 Špatenková, O. 54
Smirnov, O. 267 spatial accuracy 82
spatial aggregation 89, 112, 11415
Smith, D. 292
spatial analysis
Smith, J. 413
accomplishments of fuzzy set theory 22630
Smithson, M. 236
coupling with GIS 304
smooth spatial effects (SSE) estimator 269
defining 1, 267
smoothing, of variation 9
development 23
social policy 2778, 2867
as distinct from spatial data analysis 27
ARC/INFO 469
early applications 343
Sderberg, B. 413
early development 278
software
and geocomputation 4012
Accession 32 ideal 29
ArcGIS 33, 36, 470 importance of 477, 482
ArcInfo 33 influence of GIS 28
AZM 32 integration with GIS 347
close-coupled component object model main types 2
(COM) 334 new directions 4845
CommonGIS 48 opportunities and challenges 4656
Excel 470 past and present 4813
GeoBUGS 341 popularization of 34
GeoDa 312, 34, 48 relationship to geographical information
geographical information systems (GIS) 47 systems (GIS) 2930
geographically weighted regression role of 4834
(GWR) 31, 250 scope 484
GeoSurveillance 352 technical barriers 356
GeoTools 291 spatial attributes 75
GeoVISTA Studio 478 spatial autocorrelation 2, 8, 76, 81, 89, 117, 190,
geovisualization 478 218, 235, 257, 259, 304
Gstat 165 correction procedures for parametric tests 98
interchangeable components 46970 effect on tests of significance 15
MATLAB 341 effects of extent 956
Microsoft VBA 34 effects of ignoring 261
open source 34 implications for parametric and randomization
R 324, 341 significance testing 979
and knowledge discovery techniques 66 computational process 812
modeling 69 correlation-based queries 81
quantification 67 data input 645
tests 2612 discovering co-location patterns 79
spatial autoregression model (SAR) 69, 812, distance-based approach 7980
218, 258, 25960, 263 event-centric model 80
classification of algorithms 82 improving computational efficiency 84
spatial behavior, fuzziness 225 interest measures of patterns 83
spatial clustering join-less approach 801
clustering point process 712 Markov random field-based Bayesian
complete spatial randomness, cluster, and classifiers 69
decluster 71, 71 output patterns 678
marked spatial point process 72 partial-join based approach 80
spatial data mining 702 preprocessing spatial data 85
spatial co-location rules 7781 research needs 825
spatial covariance 259 semi-supervised learning 723
spatial cross-regressive models 257 spatial autoregression model (SAR) 69
spatial data spatial clustering 702
characteristics 243 spatial co-location rules 7781
data-related problems 19 spatial indexing approach 81
distinctive properties 21 spatial interest measures 823
examples 1 spatial outliers 737
forms 125 spatio-temporal data mining 834
fundamental properties 78 specific difficulties 634
implications of data properties for statistical foundation 657
analysis 1220 statistical interpretation models for spatial
interoperability 41 patterns 84
observational 6 transaction-based approaches 79, 80
particular problems 217 visualization of spatial relationships 85
preprocessing spatial data 85 spatial data properties, impact at stages of
process models 218 analysis 21
properties 616 spatial data sets 41
properties due to chosen representation 810 spatial dependency 18, 19, 90, 91, 244, 257, 268,
properties due to measurement process 10 401, 471
properties introducing complications 1320 spatial econometrics 255
relationships to non-spatial 64 spatial error autocorrelation 259, 268
using properties to tackle problems 1213 spatial heterogeneity 119, 401, 471
visualizing 1415 spatial indexing approach, spatial data mining 81
spatial data analysis 27 spatial intensity, estimating 31315
whirling vortex 300 spatial interest measures 823
spatial data manipulation, and analysis 26 spatial interpolation 119
spatial data matrix (SDM) 5, 6, 10 spatial join 77
processes involved in construction 12 spatial literacy 4723, 477
spatial data mining spatial moving average (SMA) process 260, 263
application domain 678 spatial multiplier 258
association rule-based approaches 79 spatial non-stationarity 218, 243, 244
Attribute values in space with independent spatial outliers
identical distribution and spatial examples 74
autocorrelation. 66 iterative algorithms 77
clustering-based map overlay approach 79 least square regression 76
clustering marked spatial data point scatterplots and spatial statistic 77
processes 72 spatial data mining 737
co-location rule approaches 7881 spatial join 77
comparison with classical data mining 82 tests for 747
visual representations 85 nested designs 193
spatial parameters, estimation 84 nested sampling 190
spatial patterns 84, 8992, 90 nugget effect 196
spatial processes, types 1314 optimal designs to minimize kriging
spatial random effects models 20 variance 1936
spatial reaction function 257 optimal geometric designs for covariogram
spatial regression estimation 1913
conditional approach 264 progressive sampling 197
decision rule 265 of random fields using geostatistics 1907
description 255 random sampling 185
early interest 255 sample size and configuration 1923
estimation 2657 sampling reduction 1967
higher order models 260 schemes 1867
instrumental variables/methods of moments second-phase sampling 198202, 203
estimation 2678 secondary data 203
joint tests 264 shortcomings of use of kriging variance
Lagrange multiplier (LM) test 263 200, 202
likelihood ratio test statistic 263 simple random sampling 185
maximum likelihood (ML) based tests 262 simulated annealing 203
maximum likelihood (ML) estimation 2657 spatio-temporal issues 204
mixed regressive, spatial autoregressive stratified random sampling 188, 190
model 257 stratified sampling 188
possible applications 256 systematic random sampling 1878,
research developments 26970 190, 192
semi-parametric methods 268 systematic sampling 185, 1878, 192
spatial autocorrelation tests 2612 systematic unaligned sampling 192
spatial autoregression model (SAR) 258 uniform random sampling 185
spatial error models 25960 use of heuristics in optimization 203
spatial lag models 2578, 265, 2678 weighting kriging variance 203
specification search 2645 spatial scales 99100
specification tests 2605 spatial scan statistic 31213, 349
specifying model 25660 spatial stationarity 92, 934, 96
useful texts 256 spatial statistics 767 see also global spatial
spatial relationships 667, 85 statistics; local spatial statistics
spatial sampling characteristics 93
adaptive sampling 1978 deciding which to use 100
anisotropy 193, 196 overview 925
choice of covariogram fitting model 196 recent developments 97
cluster adaptive sampling 199 similarity functions and significance testing
clustered sampling 190 procedures 93
configurations 18490 spatial structures 91, 184
current research directions 2024 spatial surveillance
Depending Areal Units Sequential Technique average run length (ARL) 34951
(DUST) 1945 cusums 348
description and challenges 1834 distance-based methods 349
distance-based criteria 194 generalized linear mixed model 349
efficiency of designs 18890 growth in interest 352
hierarchical sampling 190 Healy charts 3501
incorporating multivariate information 2023 methods development 3489
increasing density 198 model-based 349
isotropy 196 monitoring many local statistics 3512
Minimization of the Mean of the Shortest monitoring single local statistic 3501
Distances 1945 spatial issues 34952
multi-phase sampling 203 spatial scan statistic 349
statistical process control 348 importance of geography 21720
weighting 348 inferential framework 2089
spatial variation 16, 1625 inferential questions 21415
spatial weighting function 245 inferential task 208
fixed 246 population and process 2201
spatialization 534 process model 208, 209
spatially adaptive weighting function 246 relationship between and 210
spatially referenced data, useful texts 321 software 2212
spatio-temporal data mining 834 statistical process control
spatio-temporal dynamics 36 spatial surveillance 348
spatio-temporal models 332 temporal sequences of observations 3458
spatio-temporal phenomena 472 Staufer, P. 381
spatio-temporal sequential patterns, Steadman, P.J. 34
algorithm 84 Steel, D.G. 119
spatio-temporal systems, complexity 400 Stefanakis, E. 232
spherical model 1601 Stehman, S.V. 183
range for moving window, showing selected Stein, A. 190, 193, 203
variograms with automatically fitted Stellman, J.M. 357
models 176 Stillwell, J.C.H. 277
structured component for a moving stochastic imaging 168
window 175 stochastic processes 20, 65
Spiegelhalter, D.J. 329 STORELOC 432
Spiekermann, K. 285 Stoyan, D. 101
Spottiswoode, W. 2 Strahler, A.H. 225
Spowage, M. 292 strange attractors 4067
squared difference statistic 8 stratified random sampling 188, 190
Srikant, R. 64, 78, 84 stratified sampling 188
Srivastava, R.M. 8, 13, 160, 164, 167, 193 street burglaries, Tokyo 453
STACAS (SpaTial AutoCorrelation and strength, borrowing 13
ASsociation analysis) software 34 structured heterogeneity 337
stakeholders 4745 subroutine libraries 46970
standardized mortality ratio 9 Sui, D. 118, 467, 471
standardized ratios 10 superpopulations 20
static world-view 3589 surface, representation 7
stationarity 202 surfaces 50
stationary mean parameter 160 surveillance 344
stationary spatial covariance function 160 Sutherland, Holly 286, 289
stationary variance 160 SYNTHESIS 2823
statistical data analysis, fuzzy memberships 234 synthetic spatial information system 279
statistical inference systematic random sampling 185, 1878,
Akaike Information Criteria (AIC) 221 190, 192
approaches to 3212 systematic unaligned sampling 192
basic concepts 2089
classical inference 208, 20913 t-statistic 210
clusters 303 Tagashira, N. 117
computational approach 208 Taheri, S.M. 236
estimating parameters 211 Taillie, C. 313
exploratory data analysis (EDA) 221 Takagi-Sugeno type inference 235
formal inferential frameworks 20917 Takeyama, M. 408, 469
geocomputation 21820 Tango, T. 358, 365
geographically weighted regression Tango's statistic 348
(GWR) 252 Tate, N.I. 118, 119, 163
hypothesis testing 21011 TAX 282
importance of 207 tax and income modeling 2824
Taylor, G.H. 84, 105, 112, 115, 118, 233 Tukey, J.W. 42
Taylor, M.F. 279 Turnbull, B.W. 311, 358
Taylor, P.J. 228, 472 Turnbull's test 358
team working 474 Turton, I. 413
temporal change, detection 344 two-sample t-test 210, 21213, 213
temporal surveillance type-2 fuzzy sets 230
average run length (ARL) 3457 type I errors 19, 210
cumulative sum (CUSUM) charts 3456 type II errors 210
cusums for exponential data 347
exponentially weighted moving average Ulam, S. 211
(EWMA) chart 3478 uncertainty 226, 4756
other methods 3478 unconstrained transitions, cellular automata
Poisson cusum 3467 (CA) 409
Shewhart charts 345 uncorrelated heterogeneity 337
Shiryaev-Roberts method 348 undercounting 11
Teng, C.H. 233 Ungerer, M.J. 25, 30, 33, 36, 470
Tesfatsion, L. 411 uniform network transformation 445, 4467
tessellation 406 uniform random sampling 185
test statistic 210 Unwin, A. 42
testing and interval estimation 323 Unwin, D.J. 19, 42, 50
tests, for spatial outliers 75 Upton, 255
tests of significance, effects of Urban, D.L. 100
autocorrelation 15 usability, geovisualization tools 467
Thill, J.-C. 227, 453, 475 user modifiable areal unit problem 30
Thisse, J.-F. 401 Uttal, D.H. 482
Thomas, G.S. 332
Thompson, S.K. 183, 197 validation, microsimulation outputs 293
Tibshirani, R. 387, 389 value estimation, parameters 812
Tiefelsdorf, M. 262 van Deursen, W.P.A. 469
time and dynamics 4701 Van Groenigen, J.W. 191, 192, 193, 194, 196,
time-space stationarity, cellular automata 200, 203
(CA) 409 variability, loss of detail 7
time, uni-directional flow 8 variables, estimating spatial structure 162
Tobler, W. 48, 49, 66, 116, 118, 119, 304, 408, variance-covariance matrix 112, 119, 130
422, 471 variance inflation factor 19
Tobler's First Law 7, 66, 208, 304, 422, 471 variogram clouds 75, 76, 85
Tobler's migration model 118 variograms, locality of 177–8
Tobón, C. 57 Vatsavai, T.E.B. 72
Tomaszewski, B. 56 Verhulst, P.F. 361
Tomlin, C.D. 469 Verkuilen, J. 230
Tomlinson, R.F. 27 Verstraete, J. 230, 235
Torrens, P. 409, 410, 411 Vesanto, J. 53
Torres, R. 226 Virtual Decision-Making Environments
Townshend, J.R.G. 117 (VDME) 292
Train, K.E. 232 Visual Analytics 567
training data 68, 72, 234 visual data exploration 43, 46 see also
Tranmer, M. 119 geovisualization; visualization
transaction-based approaches, spatial data visual exploratory data analysis 41
mining 80 Visual Information Seeking Mantra 43
transformation, assignment by 2324 visualization 6 see also geovisualization; visual
TRANUS GIS module software 34 data exploration
travel mobility 357 description 42
trend surface model fit 19 of spatial data 1415
triangular irregular networks (TIN) 230 spatial relationships 85
three dimensional space of 44 subset of unstandardized weight matrix for
value of 41 bishop contiguity 129
visualization methods 45 subset of unstandardized weight matrix for
Voas, D. 283, 293 queen contiguity 130
Voogd, H. 475 subset of unstandardized weight matrix for
Voronoi diagrams rook contiguity 128
additively weighted 451 three nearest neighbors 144, 1445
inward directed 451 two nearest neighbors 142, 142
multiplicatively weighted network 452 weighted least squares estimators 19
network 449 weighted least squares (WLS), fitting models to
other types 452 semi-variograms 165, 1734
outward directed 450 Weighted Means of Shorter Distance (WMSD)
planar 449 criterion 200
Vythoulkas, P.C. 232 weighted network Voronoi diagrams 4501
weighting functions 2456
Waagepetersen, R. 306 spatially adaptive 246
Wackernagel, H. 173 weighting matrices 245
Wagner, H.H. 90, 92 weighting schemes 125, 139
Wakefield, 338 continuous 139
Wald test 2623 discrete 139
Waller, L.A. 302, 303, 304, 305, 343, 344, 365 inverse distance 1523
Walsh, S. 117 irregularly located areas 1556
Wanek, D. 233 limit models 14850
Wang, Y.C. 118 negative exponential model 1535
Ward, M. 43, 44, 357 Pace and Gilley's continuous version of nearest
Ware, C. 50 neighbors 1468
Warrender, C.E. 67 weighting, spatial surveillance 348
Warrick, A.W. 192 Wentz, E.A. 28, 32, 400, 401, 405
Warrick/Myers (WM) criterion 192 Wertheimer II, R. 283
Wartenburg, D.E. 96 Wesseling, C.G. 165
Washington West, K.D. 268
census geography 11012 what-if simulations 2801
per capita income for blacks 111 White, H. 268
statistics for per capita income for blacks 111 White, R. 406, 408, 409
wavelet decomposition 100 White, R.W. 407
Wealands, S.R. 228 Whittle, P. 17, 191, 255
Webster, R. 159, 162, 163, 164, 165, 168, whole-map statistics 209
192, 196 Widrow, B. 372
Wegener, M. 285 Wiegand, T. 93, 96
Weidong, L. 101 Williams, F.L.R. 316
Weigend, A.S. 388 Williams, G.P. 405, 407, 408
weight matrices 12530 Williamson, P. 279, 283, 293
bishop contiguity 1278 Wilson, A. 2789, 284, 288
inverse distance 152 Wilson, A.G. 407
limit models 150 Wilson, J.P. 229
nearest neighbor 140, 1402 WinBUGS 20, 324, 3401
negative exponential model 154, 155 Windows Live Local 34
neighbors in bishop contiguity 128 Witlox, F. 227
neighbors in queen contiguity 129 Wolfram, S. 409
neighbors in rook contiguity 127 Wolter, C. 347
queen contiguity 12830 Wong, D. 95, 106, 108, 112, 116, 120, 407
regular lattice areas 126 Woodall, W.H. 344
rook contiguity 1267 Worsley, K.J. 351
row normalizing 1256 Wright, S.J. 383
Wrigley, N. 19 Young, L.J. 119, 178
Wu, F. 229, 409 Yu, X. 235
Wu, J. 117
Wulder, M. 96 Zadeh, L.A. 225
Zeng, T.Q. 227, 230, 235
Xie-Beni validity index 233 Zhan, F.B. 234
Xie, Y. 409 Zhang, P. 81
Zheng, D. 233
Yamada, I. 347 Zhou, Q. 227, 230, 235
Yamada, Y. 452, 453 Zhu, A.X. 227, 231
Yanar Tahsin, A. 227, 231 zonal anisotropy 164
Yeh, A.G.-O. 409 zoning effect, modifiable areal units problem
Yoo, B. 429 (MAUP) 113, 11315, 114
Yoo, S.S. 80 zoning problem 108, 119
Yoon, M.J. 264 Zubrzycki, S. 190