The SAGE
Handbook of
Spatial Analysis
Edited by
A. Stewart Fotheringham and
Peter A. Rogerson
A catalogue record for this book is available from the British Library
ISBN 978-1-4129-1082-8
Acknowledgement: Research presented in Chapter 13 was supported by a grant to the National Centre for
Geocomputation by Science Foundation Ireland (03/RP1/1382) and by a Strategic Research Cluster grant
(07/SRC1/1168) from Science Foundation Ireland under the National Development Plan. The author gratefully
acknowledges this support.
Contents
1. Introduction
A. Stewart Fotheringham and Peter A. Rogerson
6. Spatial Autocorrelation
Marie-Josée Fortin and Mark R.T. Dale
22. Applied Retail Location Models Using Spatial Interaction Tools
Morton E. O'Kelly
Index
Notes on Contributors
Atsuyuki Okabe received his PhD from the University of Pennsylvania and the degree
of Doctor of Engineering from the University of Tokyo. He is currently Professor of the
Department of Urban Engineering at the University of Tokyo where he served as Director of
the Center for Spatial Information Science (1998–2005). Professor Okabe's research interests
include GIS, spatial analysis, and spatial optimization, and he has published many papers
in journals, books and conference proceedings on these topics. He is a co-author of Spatial
Tessellations: Concepts and Applications of Voronoi Diagrams (John Wiley, 2000) and the editor
of GIS-based Studies in the Humanities and Social Sciences (Taylor & Francis, 2005). He
serves on the Editorial Boards of seven international journals including International Journal
of Geographical Information Science.
Press, 2006) and co-author of The Atlas of the Island of Ireland: Mapping Social and Economic
Change (Maynooth: AIRO/ICLRD, 2008).
Eric Delmelle is Assistant Professor in the Geography and Earth Sciences Department at the
University of North Carolina (Charlotte) where he teaches GIS, geovisualization and spatial
optimization. He received his PhD in geography from the State University of New York at
Buffalo. His research interests focus on spatial sampling optimization and geostatistics, non-
linear allocation problems, geovisualization and GIS.
population genetics. Dr. Jacquez is currently Principal Investigator on three grants from the
National Cancer Institute to develop spatial statistical methods and software. He also publishes
extensively in the fields of spatial statistics, GIS and epidemiology.
Harvey J. Miller is Professor and Chair of the Department of Geography at the University of
Utah. His research and teaching interests include GIS, spatial analysis and geocomputational
techniques applied to understanding how transportation and communication technologies shape
individual lives and urban morphology. Since 1989, he has published approximately 50 papers
in journals, books and conference proceedings on these topics. He is co-author of Geographic
Information Systems for Transportation: Principles and Applications (Oxford University Press,
2001) and co-editor of Geographic Data Mining and Knowledge Discovery (Taylor and Francis,
2001) and Societies and Cities in the Age of Instant Access (Springer, 2007). Harvey serves on
the editorial boards of several scientific journals and in 2005–2011 he is serving as co-Chair
of the Transportation Research Board, Committee on Spatial Data and Information Science of
U.S. National Academies.
Jaymie R. Meliker is Assistant Professor of Preventive Medicine in the Medical Center at State
University of New York at Stony Brook. He received his PhD in 2006 from the Department of
Environmental Health Sciences, University of Michigan School of Public Health. Dr. Meliker's
research contributes to the fields of exposure science, GIScience, health geography, and environmental
epidemiology by developing methodologies for integrating sources of spatial, temporal,
and spatio-temporal variability in environmental health applications. Prior to joining Stony
Brook, he worked as a Research Scientist at BioMedware, Inc., pioneering the development
of spatio-temporal software and statistical algorithms for addressing public health concerns.
Luc Anselin is Faculty Excellence Professor and Director of the Spatial Analysis Laboratory
in the Department of Geography at the University of Illinois, Urbana-Champaign. He is also a
Senior Research Associate at the National Center for Supercomputing Applications at UIUC.
Dr. Anselin's research deals with various aspects of spatial data analysis and geographic
information science, ranging from exploratory spatial data analysis to geocomputation, spatial
statistics and spatial econometrics. He has published widely on topics dealing with spatial and
regional analysis, including a much cited book on Spatial Econometrics (Kluwer, 1988); over
a hundred refereed journal articles and book chapters, as well as a large number of reports and
technical publications.
to analyze spatial and spatio-temporal patterns. His recent areas of interest include spatial
point process methods in alcohol epidemiology and conservation biology (sea turtle nesting
patterns), and hierarchical models in disease ecology. Dr Waller is Chair of American Statistical
Association Section on Statistics and the Environment (2008). He is also President-Elect
of International Biometric Society Eastern North American Region (2008), and serves as
Associate Editor of Biometrics, Bayesian Analysis.
Mark Dale is Professor in the Department of Biological Science and Dean in the Faculty of
Graduate Studies and Research at the University of Alberta, Canada. He received his PhD from
Dalhousie University, Canada. His current research interests involve methods for detecting and
analyzing the spatial relationships of plants in populations and communities and spatial analysis
and spatial statistics with applications in ecological systems. Professor Dale is co-author of
Spatial Analysis: A Guide for Ecologists (Cambridge University Press, 2005) and he served
as an associate editor for Canadian Journal of Botany.
He is the recipient of numerous awards including the Educator of the Year Award from
the University Consortium for Geographic Information Science, a Lifetime Achievement
Award from Environmental Systems Research Institute, Inc., the American Society of
Photogrammetry and Remote Sensing Intergraph Award and the Horwood Critique Prize of
the Urban and Regional Information Systems Association.
Morton E. O'Kelly is Professor and Chair of the Department of Geography at the Ohio
State University. His research interests include location theory, transportation, network design
and optimization, spatial analysis and GIS. Dr. O'Kelly co-authored two books: Geography of
Transportation, 2nd edition (Prentice Hall, 1996) and Spatial Interaction Models: Formulations
and Applications (Kluwer Academic: Amsterdam, 1989), as well as over 75 research papers
in peer-reviewed journals, book chapters and conference proceedings.
Peter M. Atkinson is Professor and Head of School of Geography and Director of the
University Centre for Geographical Health Research at the University of Southampton.
His research interests focus on geostatistics, spatial statistics, remote sensing, and spatially
distributed dynamic modelling applications for environmental problems and hazards.
He is co-editor of International Journal of Remote Sensing Letters and associate editor
of International Journal of Applied Earth Observation and Geoinformation. Professor
Atkinson is also Fellow of the Royal Geographical Society and Fellow of the Royal
Statistical Society.
Pusheng Zhang is currently with the Microsoft Virtual Earth team. He received his PhD
in Computer Science from the University of Minnesota. His research interests include local
search engine design, spatial and temporal databases, data mining and geographic information
retrieval. Dr Zhang is a member of the IEEE Computer Society.
Ranga Raju Vatsavai received his PhD in Computer Science from the University of Minnesota
where he also worked as Research Fellow in Remote Sensing Laboratory. Currently Dr Vatsavai
is employed at the Oak Ridge National Laboratory. His broad research interests are centered
on spatial and spatio-temporal databases and data mining.
publications including technical reports, book reviews, and published research notes. He has
presented more than 100 papers at local, regional, national, and international conferences
in geography, regional science, planning, psychology, and statistics. Professor Golledge
received an Association of American Geographers (AAG) Honors Award in 1981. He is an
Honorary Life-Time Member of the Institute of Australian Geographers and a Fellow of the
American Association for the Advancement of Science. He received an International Geography
Gold Medal Award from the IAG in 1999. In 1998 he was elected Vice President of the AAG;
in 1999–2000 he was elected AAG President.
Urška Demšar is a lecturer at the National Centre for Geocomputation at the National
University of Ireland, Maynooth. She has a PhD in Geoinformatics from the Royal Institute
of Technology, Stockholm, Sweden. Her research interests include Geovisual Analytics and
Geovisualisation. She is combining computational and statistical methods with geovisualisation
for knowledge discovery from spatial data. Additionally, she is interested in spatial analysis
and mathematical modelling of spatial phenomena. She has an established cooperation with
researchers at the Helsinki University of Technology with whom she is working on spatial
analysis of networks for crisis management.
Vijay Gandhi is a Master's student in Computer Science at the University of Minnesota, Twin
Cities. After graduating from Computer Science and Engineering at Madras University he
worked in the field of business intelligence and data warehousing. Currently he is involved in
research on spatial databases and spatial data mining.
equivalents. Firstly, the data are typically not independent of each other. Attribute values in nearby places tend to be more similar than are attribute values drawn from locations far away from each other. This is a useful property when it comes to predicting unknown values because we can use the information that an unknown attribute value is likely to be similar to neighbouring, known values. The subfield of geostatistics has grown up based on this premise. However, if data values do exhibit spatial autocorrelation, this causes problems for statistical techniques that assume data are drawn from independent random samples. Special statistical methods, such as spatial regression models, have been developed to overcome this problem. Equally, it is often hard to defend the assumption of stationarity in spatial processes. That is, it is often assumed that the process generating the observed data is the same everywhere. Spatial non-stationarity exists where the process varies across space. Again, special statistics, such as Geographically Weighted Regression, have been developed to handle this problem.

2 Those techniques collectively known as exploratory data analysis which consist of methods to explore data (and also model outputs) in order to suggest hypotheses or to examine the presence of unusual values in the data set. Often, exploratory data analysis involves the visual display of spatial data generally linked to a map.

3 Those techniques that examine the role of randomness in generating observed spatial patterns of data and testing hypotheses about such patterns. These include the vast majority of statistical models used to infer the process or processes generating the data and also to provide quantitative information on the likelihood that our inferences are incorrect.

4 Those techniques that involve the mathematical modelling and prediction of spatial processes.

This book will cover examples of all four types of spatial analysis.

1.3. SPATIAL ANALYSIS IN PERSPECTIVE
of spatial analysis were overly concerned with form rather than with process and were rightly criticized for this focus. In addition, it is possible that expectations for quantitative methods may have initially been too high. For example, many believed that spatial modelling, when coupled with adequate data and rapidly increasing computing power, would lead society to solve many of the pressing issues in urban and regional areas.

Significant advances in spatial analysis during the past two decades have brought about a new era of interest in the field. The period of relative decline has now been replaced by one of great enthusiasm for the potential of spatial analysis. This potential has been recognized and embraced by researchers from many fields, ranging from public health and criminal justice, to ecology and environmental studies, as evidenced by various contributions to this volume.

It is now widely recognized in a broad range of disciplines that spatial analysis has an important role to play in making sense of the large volumes of spatial data we now have available and the demand for spatial analysis has never been stronger. It thus is an important time to produce this Handbook of Spatial Analysis describing many of the major areas of spatial analysis.

the reasons why the analysis of spatial data needs separate treatment;
the main areas of spatial analysis;
the key debates within spatial analysis;
examples of the application of various spatial analytical techniques;
problems in spatial analysis; and
areas for future research.

Although there is inevitable (and desirable) variability in the structure and nature of the individual chapters, in a broad sense the contributions have the following aims:

describe the current situation within the field, highlighting the main advances that have taken place, as well as current debates;
describe the problems that still exist, indicating where future research may be best directed;
indicate key works in the field and provide an extensive bibliography for the area;
describe the use of the technique in several disciplines; and
This chapter describes some of the special or distinguishing features of spatial data opening the way to methodological issues that will be treated in more depth in later chapters. The use of the term 'special' should not be taken to imply that no other types of data possess these features. Spatial data analysis is a sub-branch of the more general field of quantitative data analysis and has sometimes suffered from not paying sufficient attention to that fact. Many of the data properties that will be encountered are found in other types of (non-spatial) data but when found in spatial data, may possess a particular structure or properties may arise in particular combinations.

The chapter will first define what is meant by spatial data and then identify properties. It will be helpful, in order to put structure on this discussion, to distinguish fundamental properties of spatial data from properties that are due to the chosen representation of geographical space and from properties that are a consequence of measurement processes by which data are collected for the purpose of storage in the spatial data matrix (SDM). The SDM is what the analyst works with. We conclude by considering the implications of these properties for the methodology of spatial data analysis.

Geographic Information Science (GISc) is the generic label that is frequently used, particularly by geographers, to define the area of science that involves the analysis of spatially referenced data, that is, data where each case has some form of locational co-ordinate attached to it. Data is the lynch pin in the process of doing science and it is essential that methodologies for spatial data analysis are tuned to the properties of spatial data.
The science undertaken with spatial data is usually observational rather than experimental. This is important. Much spatial data are not collected under controlled situations. We often cannot choose the values of independent variables in order to generate a satisfactory experimental design. There is no replication (in order, for example, to assess the effects of measurement error) and the analyst must take the world as he or she finds it. There may be further problems in specifying what the appropriate locational co-ordinate is when studying certain types of processes and outcomes. All this has implications for the quality of spatial data and for the methodologies that can be employed. We worry not only about the quality of our data but exactly what it is we are observing in any given situation. A consequence of this is that much of the data collected may be used to build a model of the situation under study which can then be used to estimate parameters and test hypotheses. We shall see that some of the fundamental properties of spatial data raise major problems in this regard.

2.1. SPATIAL DATA AND THEIR PROPERTIES

A spatial datum comprises a triple of measurements. One or more attributes (X) are measured at a set of locations (i) at time t, where t may be a point or interval of time. So, if k attributes are measured at n locations at time t, we can present the spatial data in the form:

$$\{x_j(i; t);\ j = 1, \ldots, k;\ i = 1, \ldots, n\}. \tag{2.1}$$

Equation (2.1) expresses in shorthand much of the content of the SDM. The record of when the observation was taken (t) may be suppressed if analysis is concerned with only a single time period but may be retained if there are to be a series of comparative studies through time or if different attributes were recorded at different times and the analyst needs to be aware of this. Such data may come from a variety of different sources including national censuses; public or private agency records (e.g., national health services, police force areas, consumer surveys); satellite imagery; environmental surveys; and primary surveys. The data may be collected from a census or from a sampling process. For the purposes of analysis data from different sources may be required. Studies in environmental epidemiology utilise health, demographic, socio-economic and environmental data. These data may come with differing degrees of quality and may not all be collected on the same areal framework (Brindley et al., 2005).

To understand the properties of spatial data we need to understand the relationship between equation (2.1) and the real world from which the data are taken. In order to undertake data analysis the complexity of the real world must be captured in finite form through the processes of conceptualization and representation (Goodchild, 1989; Guptill and Morrison, 1995; Longley et al., 2001). We shall focus here only on the issues associated with capturing spatial variation, but the reader should note that there are conceptualization and representation issues associated with the way attributes and time are captured as well.

The first step in this process, which ultimately leads to the construction of the SDM, involves conceptualizing the geography of the real world. There are two views of the geographical world in GISc: the field and the object views. The field view conceptualizes space as covered by surfaces with the attribute varying continuously across the space. This is particularly appropriate for many types
THE SPECIAL NATURE OF SPATIAL DATA 7
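As a rough illustration of equation (2.1), the spatial data matrix for a single time period can be thought of as a table with one row per location i and one column per attribute j. The sketch below is only one possible encoding; the class name `SpatialDatum`, the helper `build_sdm` and all of the values are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialDatum:
    """One (location, time, attributes) triple, as in equation (2.1)."""
    location: int      # index i of the location, 1..n
    time: str          # t, a point or interval of time
    attributes: tuple  # (x_1(i; t), ..., x_k(i; t))


def build_sdm(records, t):
    """Return {i: (x_1, ..., x_k)} for the single time period t."""
    rows = {r.location: r.attributes for r in records if r.time == t}
    # Every location must report the same k attributes.
    if len({len(values) for values in rows.values()}) > 1:
        raise ValueError("locations report different numbers of attributes")
    return rows


# Hypothetical records: k = 2 attributes at n = 3 locations in 2001,
# plus one later observation that is suppressed when t = "2001".
records = [
    SpatialDatum(1, "2001", (120.0, 0.31)),
    SpatialDatum(2, "2001", (85.0, 0.12)),
    SpatialDatum(3, "2001", (240.0, 0.27)),
    SpatialDatum(1, "2011", (131.0, 0.29)),
]

sdm_2001 = build_sdm(records, "2001")
n = len(sdm_2001)                        # number of locations
k = len(next(iter(sdm_2001.values())))   # number of attributes
```

Keeping t on each record, rather than discarding it, mirrors the point that the time stamp may be retained when attributes were recorded at different times.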
Testing for spatial autocorrelation was one of the high-profile research agendas in geography during the quantitative revolution. Geographers adapted spatial autocorrelation statistics based on the join-count statistic, the cross product statistic and the squared difference statistic that had been developed for quantifying spatial structure on regular areal frameworks (grids). These statistics were developed to test for statistically significant spatial autocorrelation on irregular areal frameworks (Cliff and Ord, 1973). The null hypothesis (no spatial autocorrelation) was assessed against a non-specific alternative hypothesis (spatial autocorrelation is present). We shall see how this argument was developed in later years with the introduction and use by geographers of models for spatial variation.

In the earth sciences, dealing principally with point data from surfaces, the quantification of structure was based on the use of the empirical semi-variogram which uses a squared difference statistic (Isaaks and Srivastava, 1989). The advantage of the latter route was that it led naturally to model specification and model fitting using theoretical semi-variograms. Of course these quantitative measures and tests of hypothesis depend on the scale of analysis. That is, they depend on the size of the polygons in terms of which data are reported, the inter-point distance between samples on a continuous surface. Thus the chosen representation has an important influence on the quantification of this fundamental property and hence its presence within any spatial dataset. If samples are taken at sufficient distances apart the level of spatial autocorrelation is likely to be much reduced relative to the case where samples are taken close together.

Autocorrelation statistics are also used to capture temporal structure in attribute values but there are important differences with the spatial situation. Time has a natural uni-directional flow (from past to present) whereas space has no such order. The two dimensional nature of space means that dependency structures might vary not just with distance but direction too giving rise to anisotropic dependency structures with structure along the north-south axis differing from the east-west axis. The presence of spatial autocorrelation, that attribute values are not statistically independent, has fundamental implications for the conduct of spatial analysis.

Spatial autocorrelation, in statistical terms, is a second order property of an attribute distributed in geographic space. In addition there may be a mean or first-order component of variation represented by a linear, quadratic, cubic (etc.) trend. We can think of these as two different scales of spatial variation although the distinction may be hard to make and quantify in practice. As Cressie (1991) remarks: 'What is one person's (spatial) covariance may be another person's mean structure' (p. 25). It has often been remarked that spatial variation is heterogeneous. This type of decomposition (plus a white noise element to capture highly localized heterogeneity) is one way of formally capturing that heterogeneity using what are termed global models. Another approach is to only analyze spatial subsets, that is allow model structure to vary locally.

2.1.2. Properties due to the chosen representation

We have already noted that the extent to which our data retains fundamental properties depends on the chosen representation. We now turn to look at other properties that stem directly or indirectly from the chosen representation.

Representing spatial variation using polygons is employed in many branches of science that handle spatially referenced data. Two of the generic consequences of
working with data aggregates are: intra-areal unit heterogeneity and inter-areal unit heteroscedasticity.

Whether the data refer to a continuously varying phenomenon (field view) or aggregations of individuals like households (object view) the effect of bundling data into spatial aggregates has the effect of smoothing variation. In the case of environmental data and the use of pixels then the degree of smoothing will clearly depend on the size of the pixels. The larger the pixels the greater the degree of smoothing. A non-intrinsic partition, where the polygons are defined in terms of attribute variability with the aim of maximizing within unit homogeneity and maximizing between-unit heterogeneity will not produce this effect to the same extent. This second process shares common ground with the process of regionalization to which it is sometimes compared.

Intra-unit heterogeneity is a particular problem for many types of social science data particularly in those cases where area boundaries are chosen arbitrarily as was the case with the UK census for example prior to 2001. Attributes reported for an area may represent percentages or means of attribute values associated with the individuals (people or households) that have been aggregated and the analyst may have no information on the variability around the mean. If an ecological or contextual attribute is calculated for an area (social capital say, or area deprivation) again the calculation is conditional on the chosen representation and the scale of the partition.

One of the conclusions that might be drawn from this is that it is better to have small areal aggregates rather than large ones. Assuming spatial structure, a reasonable supposition given the discussion in section 2.1.1, then smaller areas should be more homogeneous than larger areas and their mean values should be more representative of their area's population. But such spatial precision comes at the cost of statistical precision. Data errors or small random fluctuations in numbers of events (household burglaries; disease outcomes) will have a big effect on the calculation of rates when populations are small. Take the case of a standardized mortality ratio. If the expected count is small, for example 2.0, then the ratio itself (observed count divided by the expected count) rises or falls by 0.5 with each addition or subtraction of a single case. This will have implications for determining the statistical significance of counts, that is, whether there are significantly more cases than would be expected on the basis of chance alone. It will also have implications for determining the statistical significance of differences in counts between areas which in turn raises problems for the detection of significant crime hotspots or disease clusters.

In summary, there is a trade-off that is linked to the number of individual elements in a polygon. A polygon containing few individuals will tend to be more homogeneous but statistical quantities, such as rates and ratios, tend to be unreliable in the sense that small errors and random fluctuations can impact severely on the calculated values. Polygons containing many individuals will generate robust rates and ratios but often conceal much higher levels of internal heterogeneity.

In practice an area is sometimes partitioned into polygons of varying size and this can yield a secondary effect on data properties. A rate calculated for a polygon where the denominator attribute is small has a larger variance than a rate computed for a polygon where the denominator attribute is large. Moreover there is a mean-variance dependence in the rate statistics. Take the case where the denominator is the number of households (n(i)). Rates are observed counts of some attribute (number of burglaries) in polygon i (O(i)) divided by the number of households. It follows from the binomial
model for O(i) that:

$$E[O(i)/n(i)] = (1/n(i))\,E[O(i)] = p(i); \qquad \mathrm{Var}[O(i)/n(i)] = (1/n(i))^{2}\,\mathrm{Var}[O(i)] = p(i)(1 - p(i))/n(i) \tag{2.2}$$

where E[ . . . ] and Var[ . . . ] denote mean and variance and p(i) is the probability that any individual in area i (e.g., number of households) has the characteristic (e.g., been burgled) that is being counted. The mean and the variance in equation (2.2) are clearly not independent. It also follows from equation (2.2) that the standard error of the estimate of the rate p(i) which is:

$$[\,p(i)(1 - p(i))/n(i)\,]^{1/2}$$

However there are problems here when making comparisons. The standard error tends to be large for areas with small populations and small for areas with large populations because of the effect of population size on E(i). So extreme ratios tend to be associated with small populations but ratios that are significantly different from 1.0 tend to be associated with areas with large populations (Mollié, 1996).

These examples are intended to illustrate the way in which data properties can be induced by the chosen representation. In certain circumstances the geographical structure of the representation (for example the geography of which areas have large and which have small denominator values) could induce a geographical structure on the statistics which when mapped could then give rise to a misleading impression about trends or patterns in the data.
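The mean-variance dependence in equation (2.2), and the earlier remark that a standardized mortality ratio with an expected count of 2.0 moves by 0.5 per case, can be checked numerically. The following is a minimal sketch under the binomial assumption stated in the text; the function names and the illustrative values of p(i) and n(i) are hypothetical.

```python
import math


def rate_mean_variance(p, n):
    """Mean and variance of O(i)/n(i) when O(i) ~ Binomial(n(i), p(i))."""
    return p, p * (1.0 - p) / n


def rate_standard_error(p, n):
    """Standard error of the rate estimate: [p(1 - p)/n]^(1/2)."""
    return math.sqrt(p * (1.0 - p) / n)


def ratio(observed, expected):
    """Observed count divided by expected count (e.g. an SMR)."""
    return observed / expected


# Same underlying rate, very different denominators (hypothetical values).
se_small_area = rate_standard_error(0.05, 200)     # 200 households
se_large_area = rate_standard_error(0.05, 20000)   # 20,000 households

# One extra case when the expected count is 2.0 shifts the ratio by 0.5.
step = ratio(3, 2.0) - ratio(2, 2.0)

print(f"SE, small area: {se_small_area:.5f}")
print(f"SE, large area: {se_large_area:.5f}")
print(f"Change in ratio per additional case: {step}")
```

Because the standard error scales with the inverse square root of n(i), the area with one hundred times as many households has a standard error ten times smaller, which is why a map of raw rates can partly reflect the geography of denominator sizes.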
in adjacent areas because the source of the overcount is the set of nearby areas that have lost cases as a result of the location error. So, count errors in adjacent areas may be negatively correlated (Haining, 2003, pp. 67-70). Location error can be introduced into a spatial data set as a result of having to put data, collected on different spatial frameworks, onto a common spatial framework. Areal interpolation methods are used but these are based on assumptions about how attributes are distributed within areal units and these assumptions often cannot be tested. The consequence is that further levels and patterns of error are introduced into the database (Cockings et al., 1997).

In the case of remotely sensed data, the values recorded for any pixel are not in one-to-one relationship with an area of land on the ground because of the effects of light scattering. The form of this error depends on the type and age of the hardware and natural conditions such as sun angle, geographic location and season. The point spread function quantifies how adjacent pixel values record overlapping segments of the ground so that the errors in adjacent pixel values will be positively correlated (Forster, 1980). The form of the error is analogous to a weak spatial filter passed over the surface so that the structure of surface variation, in relation to the size of the pixel unit, will influence the spatial structure of error correlation. Linear error structures also arise in remotely sensed data (Short, 1999). Finally, we note that the effects of error propagation may further complicate error properties when arithmetic or cartographic operations are carried out on the data and source errors are compounded and transformed via these operations (Haining and Arbia, 1993).

Data incompleteness may induce false patterns in spatial data. Data incompleteness refers to the situation where there are missing data points or values or where there are under or overcounts arising from the reporting process. Spatially uniform data incompleteness raises problems for analysis but spatial variation in the level of data incompleteness with, for example, undercounting, more serious in some parts of the study area than others, can seriously affect comparative work and the interpretation of spatial variation. Missing or inaccurately located cases in a point pattern of events may result in failure to detect a local cluster of cases (Kulldorff, 1998).

Incompleteness in cancer data leads to forms of under or overcounting which give rise to spatial variation that is an artifact of how the data were collected. In the case of official crime statistics geographical differences between large counties in England may be due to differences in police investigative and reporting practices. On the intra-urban scale, burglaries in suburban areas will, on the whole, be well reported for insurance purposes, but in some inner-city areas there may be under reporting either because there is no incentive or because of fear of reprisals. The Census provides essential denominator data for computing small area rates. However refusals to cooperate can lead to undercounting and the 1991 Census in the UK was thought to have undercounted the population by as much as 2% because of fears that its data would be used to enforce the new local poll tax. Inner city areas show higher levels of undercounting than suburban areas where populations are easier to track. Finally, since there are 10-year gaps between successive censuses, population in- and out-flows in many areas may be such as to preserve the essential socio-economic and demographic characteristics of the areas. On the other hand some areas of a city, especially inner-city areas, may experience population mobility and redevelopment which result in marked shifts that have implications for the reliability of the data in the years following the Census.
Figure 2.1 Processes involved in constructing the spatial data matrix and the data
properties that are present or introduced at each stage.
Finally, in the case of some imagery, some areas of the image may be obscured because of cloud cover. A distinction should be drawn between data that are missing at random and data that are missing because of some reason linked to the nature of the population or the area. Weather stations temporarily out of action because of equipment failure produce data missing at random. On the other hand, mountainous areas will tend to suffer from cloud cover more than adjacent plains and there will be systematic differences in land use between such areas. This distinction has implications for how successfully missing values can be estimated and whether the results of data analysis will be biased because some component of spatial variation is unobservable.

Figure 2.1 provides a summary of the points raised in this section.

We divide this section into situations where spatial properties can be exploited to help solve problems and situations where spatial properties introduce complications for the conduct of data analysis.

2.2.1. Taking advantage of spatial data properties to tackle problems

Consider the following problems:

Samples of attribute values have been taken across an area. The analyst would like to construct a map to describe surface variation using the information contained in the sample. Perhaps instead the analyst just wishes to estimate the surface at a point, or set of points, where no sample has been taken and estimate the prediction error.
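As a rough illustration of this first problem, the sketch below predicts a surface value at an unsampled point and attaches an empirical error figure to the predictions. It rests on assumptions that are not part of the chapter: inverse distance weighting is used only because it is short (no particular interpolator is prescribed here), leave-one-out cross-validation stands in for a model-based prediction error, and the function names `idw_predict` and `loo_rmse`, the `power` parameter and the synthetic sample are all hypothetical.

```python
import numpy as np

def idw_predict(xy_obs, z_obs, xy_new, power=2.0):
    """Inverse distance weighted prediction at each row of xy_new."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    # Distances between every prediction point and every sample point.
    d = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=2)
    preds = np.empty(len(xy_new))
    for k, row in enumerate(d):
        hit = row == 0.0
        if hit.any():
            # The prediction point coincides with a sample: return its value.
            preds[k] = z_obs[hit].mean()
        else:
            # Nearer samples receive larger weights.
            w = 1.0 / row ** power
            preds[k] = np.sum(w * z_obs) / np.sum(w)
    return preds

def loo_rmse(xy_obs, z_obs, power=2.0):
    """Leave-one-out RMSE: a crude empirical guide to prediction error."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    errors = []
    for i in range(len(z_obs)):
        keep = np.arange(len(z_obs)) != i
        pred = idw_predict(xy_obs[keep], z_obs[keep], xy_obs[i], power)[0]
        errors.append(pred - z_obs[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic sample, invented for the example.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(40, 2))
z = np.sin(xy[:, 0]) + 0.1 * xy[:, 1]

print("prediction at (5, 5):", idw_predict(xy, z, [[5.0, 5.0]])[0])
print("leave-one-out RMSE:", loo_rmse(xy, z))
```

The approach only works because nearby values tend to be alike, which is the spatial property being exploited. A geostatistical predictor would replace the fixed distance weights with weights derived from an estimated model of spatial dependence and would supply a prediction variance directly.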
dependency structures that reflect the way the process plays out across geographic space.

Not all processes of interest are spatial in the sense described above. Many of the processes of interest to geographers play out across geographic space in response to the place-based characteristics of areas (the particular mix of attributes they possess) and the spatial relationships between those areas. Outcomes in places (whether for example economic, social, epidemiological or criminological) are not necessarily merely the consequence of the properties of those places as places but may also be the consequence of relational and contextual influences. The distance between places, the difference between adjacent places in terms of relevant attributes, and the overall configuration of places across a region are all facets of relation and context that may impact on outcomes and modify the role of place in influencing outcomes. Two places may be identical in terms of their place-based characteristics but differ significantly in terms of their relational and contextual attributes with neighbouring areas, and these differences may explain why (for example) two similarly affluent neighbourhoods experience quite different levels of assault and robbery, or why two similarly deprived neighbourhoods experience quite different levels of health outcomes.

We now examine briefly how these features of how attribute values are generated impact on the choice of methodology for the purpose of data analysis. We distinguish between exploratory spatial data analysis and model-based forms of analysis that allow hypothesis testing and parameter estimation.

Exploratory spatial data analysis

Exploratory data analysis (EDA) comprises a collection of visual and numerically resistant techniques for summarizing data properties, detecting patterns in data, identifying unusual or interesting features in data including possible data errors and formulating hypotheses. Exploratory spatial data analysis (ESDA) undertakes these activities with respect to spatial data so that cases can be located on a map and the spatial relationships between cases assume importance because they carry information that is likely to be relevant to the analysis (Cressie, 1984; Haining et al., 1998; Fotheringham and Charlton, 1994). It is important to be able to answer questions such as: where does that subset of cases on the scatterplot, or that subset of cases on the boxplot, occur on the map? What are the spatial patterns and spatial associations in this geographically defined subset of the map? In the case of regression modelling do the large positive residuals, for example, cluster in one area of the map?

ESDA and the software that supports ESDA need to be able to handle the spatial index and the special queries that arise because of the spatial referencing of the data. Thus the map becomes an essential visualization tool (Dorling, 1992). The linkage between a map window and other graphics windows, so that cases can be simultaneously highlighted in more than one window, becomes an essential part of the conduct of ESDA (Andrienko and Andrienko, 1999; Monmonier, 1989).

Visualizing spatial data raises particular problems, in part because of some of the properties discussed in earlier parts of this chapter. We highlight two here. First, it has been noted that data values, particularly rates and ratios, may not be strictly comparable because standard errors are population size dependent. So if areas vary substantially in terms of population counts (used as the denominators for a rate) then extreme values and even patterns detected by visual inspection might be associated with that effect rather than real differences between areas. Second, areas that partition a region might be very different in physical size.
THE SPECIAL NATURE OF SPATIAL DATA 15
This may mean that the viewer of a map has their attention drawn to certain areas of the cartographic display (those areas with physically large spatial units) whilst other areas are ignored. This may be particularly important if in fact it is the small areas that have the larger populations so that it is their rates and ratios (rather than the rates and ratios associated with the physically larger but less densely populated areas) that are the more robust. One solution to this problem is to use cartograms so that areas are transformed in physical extent to reflect some underlying attribute such as population size (Dorling, 1994). This comes at a cost because the individual areas in the resulting cartogram may be hard for the analyst to place. There may be a need for a second, conventional, map linked to the cartogram, so the analyst can highlight areas on the cartogram and see where they are on the conventional map.

Conventional visualization technology is often based on the assumption that all data values are of equal status so that the viewer can extract information from visual displays without worrying about the statistical comparability of the data values that are displayed. This assumption may break down when dealing with spatially aggregated data (Haining, 2003).

Model fitting and hypothesis testing

If n data values are spatially autocorrelated then one of the consequences of this for the application of standard statistical inference procedures is that the information content of the data set is less than would be the case if the n values were independent. This means that the degrees of freedom available for testing hypotheses are not a simple function of n. We shall take the example of testing for significant bivariate correlation between two variables to illustrate this point.

Suppose n pairs of observations, $\{(x(i), y(i))\}_i$, are drawn from a bivariate normal distribution. Pearson's product moment correlation coefficient (r) is the statistic used to measure the association between X and Y. If the observations on the two variables are independent (there is no spatial autocorrelation in either X or Y), then under the null hypothesis of no association between X and Y a test statistic is given by:

$$ (n - 2)^{1/2}\, |r|\, \left(1 - r^2\right)^{-1/2} \tag{2.3} $$

which is t distributed with (n − 2) degrees of freedom.

These distributional results do not hold if X and Y are spatially correlated. The problem is that when spatial autocorrelation is present the variance of the sampling distribution of r, which is a function of the number of pairs of observations n, is underestimated by the conventional formula which treats the pairs of observations as if they were independent. The effect of spatial autocorrelation on tests of significance has been extensively studied (for reviews see Haining, 1990, 2003) and shown to be very severe when both X and Y have high levels of spatial autocorrelation. Clifford and Richardson (1985) obtain an adjusted value for n ($n'$) which they call the effective sample size. This value, $n'$, can be interpreted as measuring the equivalent number of independent observations so that the solution to the problem lies in choosing the conventional null distribution based on $n'$ rather than n. An approximate expression for this quantity is:

$$ n' = 1 + n^2 \left[\operatorname{trace}\left(R_x R_y\right)\right]^{-1} \tag{2.4} $$

where $R_x$ and $R_y$ are the estimated spatial correlation matrices for X and Y respectively. (For a discussion of estimators see Haining,
1990, pp. 118–120.) The null hypothesis of no association between X and Y is rejected if:

$$ (n' - 2)^{1/2}\, |r|\, \left(1 - r^2\right)^{-1/2} \tag{2.5} $$

exceeds the critical value of the t distribution with ($n'$ − 2) degrees of freedom.

This illustrates a general problem. Since the n observations are positively spatially autocorrelated, the information content of the sample is over-estimated if n is used; it needs to be deflated. The sampling variances of statistics are underestimated, leading the analyst to reject the null hypothesis when no such conclusion is warranted at the chosen significance level. For the effects of spatial dependency on the analysis of contingency tables see, for example, Upton and Fingleton (1989) and Cerioli (1997).

To make further progress in understanding the importance of spatial data properties and the complications they introduce we need to introduce models for spatial variation or data generators for spatial variation. Such models are important. By specifying a model to represent the variation in the data (including the spatial variation), the analyst is able to construct tests of hypothesis with greater statistical power than is possible if testing is against a non-specific alternative.

There are a number of possible formal models for spatial variation of which the simultaneous spatial autoregressive (SAR), the conditional spatial autoregressive (CAR) and the moving average (MA) models are probably the best known. We will briefly look at the first two but the interested reader will need to follow up the literature to gain a fuller understanding of these models and their properties (Whittle, 1954; Besag, 1974, 1975, 1978; Ripley, 1981; Cressie, 1991; Haining, 1978, 1990, 2003).

A multivariate normal CAR model which satisfies the first order (spatial) Markov property and thus might be thought of as the simplest departure from spatial independence can be written as follows (Besag, 1974; Cressie, 1991, p. 407):

$$ E\left[X(i) = x(i) \mid \{X(j) = x(j)\}_{j \in N(i)}\right] = \mu + \rho \sum_{j \in N(i)} w(i, j)\,[X(j) - \mu], \quad i = 1, \ldots, n \tag{2.6} $$

and:

$$ \operatorname{Var}\left[X(i) = x(i) \mid \{X(j) = x(j)\}_{j \in N(i)}\right] = \sigma^2, \quad i = 1, \ldots, n $$

where E[ . . . | .] and Var[ . . . | .] denote conditional expectation and variance respectively, $\mu$ is a first-order parameter and $\rho$ is the spatial interaction parameter. The Markov property means observations are conditionally independent given the values at neighbouring sites. {w(i, j)} denotes the neighbourhood structure of the system of areas and w(i, j) = 1 if i and j are neighbours ($j \in N(i)$) and w(i, i) = 0 for all i. W is the n × n matrix of {w(i, j)} and is sometimes called the connectivity matrix. It is a requirement that $\rho$ lies between $(1/\lambda_{\min})$ and $(1/\lambda_{\max})$ where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of W. For a fuller introduction to the Markov property for spatial data including how to construct higher-order spatial Markov models see, for example, Haining (2003, pp. 297–299). This approach allows the construction of a hierarchy of models of increasing complexity. As noted in Haining (2003), however, the Markov property does not have the natural appeal it has in the case of time series, because space has no natural ordering. So the neighbourhood structure can
often seem rather arbitrary especially in the case of the non-regular areal frameworks used to report Census and other social and economic data.

If the analyst of regional data does not attach importance to satisfying a Markov property another option is available called the SAR model specification. A form of this model was first introduced into statistics by Whittle (1954). Let e be independent normal IN(0, $\sigma^2 I$) where I is the identity matrix and e(i) is the variable associated with site i (i = 1, . . ., n). Define the expression:

$$ X(i) = \mu + \rho \sum_{j \in N(i)} w(i, j)\,[X(j) - \mu] + e(i), \quad i = 1, \ldots, n. \tag{2.7} $$

compared with methods that take account of the spatial autocorrelation in the errors. Second, if the usual least squares formula for the sampling variances of these regression estimates is applied, the variances will be seriously underestimated. The formulae are no longer valid and conventional F and t tests of hypothesis are also not valid. We shall take a very simple example to illustrate these points, where the parameter to be estimated and tests of hypothesis relate to a constant mean $\mu$.

Suppose n independent observations {x(i)} are drawn from a N($\mu$, $\sigma^2$) distribution. The sample mean, $\bar{x}$, is an unbiased estimator for $\mu$ and the variance of the sample mean is:

$$ \operatorname{Var}(\bar{x}) = \sigma^2 / n. \tag{2.8} $$
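Equation (2.8) holds only under independence. The sketch below compares it with the variance of the sample mean when observations are correlated, using the standard identity that the variance of a mean equals the sum of all entries of the covariance matrix divided by n squared. It is an illustration under stated assumptions rather than a model taken from this chapter: sites are placed on a line, the correlation between sites i and j is assumed to decay as `phi ** |i - j|`, and the function name and parameter values are hypothetical.

```python
import numpy as np

def sample_mean_variance(n, sigma2, phi):
    """Return (naive, actual) variance of the sample mean.

    naive  : sigma2 / n, valid only under independence (equation 2.8).
    actual : sum of all covariances / n**2, with Cov(x(i), x(j)) assumed
             to be sigma2 * phi**|i - j| (an illustrative choice).
    """
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    cov = sigma2 * phi ** lags      # assumed covariance matrix of the n values
    naive = sigma2 / n
    actual = cov.sum() / n**2       # variance of the mean of correlated values
    return naive, actual

naive, actual = sample_mean_variance(n=100, sigma2=1.0, phi=0.6)
# Number of independent observations that would give the same variance.
equivalent_n = 1.0 / actual         # sigma2 is 1 here
print(f"sigma^2 / n               = {naive:.5f}")
print(f"variance under dependence = {actual:.5f}")
print(f"equivalent independent n  = {equivalent_n:.1f}")
```

With positive `phi` the second figure is always the larger, so a test or confidence interval built on $\sigma^2/n$ is too optimistic. Setting `phi=0.0` recovers equation (2.8) exactly.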
where Cov(x(i), x(j)) denotes the spatial autocovariance between x(i) and x(j). So, if there is positive spatial dependence and $\sigma^2$ is known then $\sigma^2/n$ underestimates the true sampling variance of the sample mean. If $\sigma^2$ is unknown and is estimated by equation (2.9) then if there is positive spatial dependence the expected value of $s^2$ is (see, for example, Haining, 1988, p. 579):

$$ E\left[s^2\right] = \sigma^2 - \left[\frac{2}{n(n-1)} \sum_{i} \sum_{j\,(i<j)} \operatorname{Cov}(x(i), x(j))\right] \tag{2.12} $$

The estimator (2.14) is the best linear unbiased estimator (BLUE) of $\mu$. Note that in the case of independence V = I (the identity matrix with 1s down the diagonal and zeros elsewhere) and equation (2.14) reduces to the sample mean. In the case V ≠ I two modifications to the sample mean are occurring. First, the denominator for positive spatial dependence will be less than n. Second, the presence of $V^{-1}$ in the numerator of equation (2.14) downweights the contribution of any attribute x(i) which is highly correlated with other attribute values {x(j)}, that is, where x(i) is part of a cluster of observations.

The variance of the estimator (2.14) is:
equation (2.11) that is the more serious, at least in the usual situation of positive spatial dependence, and that one way to deal with this is to adjust n in equation (2.8) thereby increasing the sampling variance of the sample mean. The size of the adjustment to n will be sensitive to the estimates of the spatial autocorrelation in the data or, if a spatial model is fitted to the data, the choice of model. The problem is further complicated if, as is usually the case, V is not known and so must be estimated from the data.

Before leaving the normal model it is important to note that aggregated spatial data may violate another of the statistical assumptions of least squares regression. It was remarked in section 2.1 how rates and ratios based on areas with very different population counts will have different standard errors. It follows that the assumption of homoscedasticity (or constant error variance) is likely to be violated when developing models to explain how rates or ratios vary over a region. Data transformations or weighted least squares estimators are used to address these problems (Haining, 1990, pp. 49–50) but such adjustments may need to be implemented whilst also addressing the problems created by residual spatial autocorrelation (Haining, 1991). In addition to the problems created by failure to satisfy statistical assumptions, spatial data often create data-related problems in regression modelling (Haining, 1990, pp. 332–333). For example, the fit of a trend surface model can be influenced by the configuration of the sample data points on the surface where, as a result of the particular distribution, certain values have high leverage (Unwin and Wrigley, 1987); the particular shape of the study region may also influence the trend surface model fit (Haining, 1990, p. 372). These and other issues are reviewed in Haining (1990, pp. 40–50).

We conclude this section by remarking on the implications of intra-area and inter-area spatial dependency and intra-area heterogeneity when modelling a discrete valued response variable such as the count of the number of cases of a disease across a region using the Poisson model. Spatial dependency and heterogeneity are important causes of overdispersion. For example, consider a local diffusion process in which individuals are more likely to be infected if they are close to someone already infected. The result is that counts of the number of cases will reveal Poisson overdispersion because there will be areas with large counts (due to the local infection process) and areas with zero counts where the process has not yet started. These considerations require the analyst both to carry out tests for overdispersion and where necessary take appropriate action.

The effects of overdispersion in generalized linear modelling are rather similar to those described for the normal model when spatial autocorrelation is detected. If overdispersion is present, ignoring it tends to have little impact on point estimates of the regression parameters (the maximum likelihood estimator is consistent, although some small sample bias might be present). However, standard error estimates for regression parameters are underestimated. Type I errors associated with the model are underestimated, which is particularly problematic in relation to predictors that are close to the significance threshold. If the objective is to build a parsimonious model, the presence of overdispersion may result in an analyst constructing a model that is more complicated than necessary and that overestimates the variance explained.

Ways of tackling this problem may depend on the reasons for the overdispersion. A conventional approach is through the use of a variance inflation factor (Dobson, 1999). Where the cause is inter-area spatial autocorrelation then a discrete valued auto-model may be used which is analogous to equation (2.6) (see Besag, 1974). More recently attention has focused on the use
of spatial random effects models using CAR models fitted using WinBUGS (Law et al., 2006). These models allow for some overdispersion through the random effects term. This is an area of current research in spatial modelling since the development of good modelling tools for discrete valued response variables has rather taken a back seat whilst attention for many years has focused, perhaps disproportionately, on the normal model (Law and Haining, 2004).

2.3. DRAWING INFERENCES

One of the main purposes of undertaking spatial statistical analysis is to make population inferences on the basis of the data collected. In concluding this chapter we consider some of the inference pitfalls associated with the analysis of spatial data.

What is the population about which inferences are made in an observational science? If data are point samples from a continuous surface then the population might be the surface itself. Of course the realized surface may be thought of as only one of many possible realizations (the rest not having been observed). However, with or without the concept of a superpopulation of surfaces, making inferences from point samples to the (realized) surface population does represent a legitimate target. This argument is less convincing when the data represent a complete census (for example, the data refer to areas and a complete, or nearly complete, enumeration has been carried out). What is the population about which inferences are being made now? A frequent answer to this is that the underlying process is stochastic (chance is an inherent part of the process) so that inferences are directed at the process (its parameters and covariates) rather than the map. The problem with this is that we have access to only one realization of the process and in order to give our inferences broader validity other assumptions need to be invoked such as that this realization is representative of the underlying process. There may be no way to test such an assumption.

The modifiable areal units problem (MAUP) reminds us that results obtained from analyzing aggregate data are dependent on the particular scale of the partition, and, at the given scale, the particular boundaries used. In general, statistical relationships between attributes are stronger the larger the spatial aggregates because variances are reduced. Boundary shifts can influence whether or not disease clusters or crime hot spots are detected at any scale because if boundaries happen to cut through the middle of a cluster this may dilute the effect over two or more areas.

The analysis of aggregated data is particularly problematic and not just because of the MAUP. It is important to remember that conclusions drawn from aggregate data can only be transferred to the individual level under certain conditions. The ecological fallacy is the uncritical transfer of findings at the group level to the individual level. As the famous example cites, the suicide rate in Germany in the 17th century may have been larger in areas with higher percentages of Catholics but that does not mean Catholics were more prone to commit suicide than Protestants. Quite the reverse, as individual level data revealed. Aggregation bias raises serious problems for epidemiological studies based on aggregate data and is one reason why it is considered the weakest of the different methodologies for assessing dose-response relationships even though this may be the only realistic way of obtaining reasonably sound measures of exposure to an environmental risk factor. The problem is that it is not difficult to construct examples where there are complete sign reversals when going
Figure 2.2 Spatial data properties and how they impact at different stages of analysis.
from the ecological to the individual level study (Richardson, 1992).

The converse of the ecological fallacy is the atomistic (or individualistic) fallacy which assumes relationships identified at the individual level apply at the group level. There may be group level or contextual effects that need to be taken into account as for example in the study of youth offending, where the risk of becoming an offender may not depend only on personal and household level risk factors but also on neighbourhood and peer group effects. This then raises the problem of defining what the neighbourhood is.

Figure 2.2 provides a summary of the points raised in sections 2.2 and 2.3.

2.4. CONCLUSIONS

Spatial data possess a number of distinctive properties that derive from the fundamental nature of geographic space and the way processes unfold in geographic space, the way that spatial variation is represented for the purpose of storage in a finite digital database and the way spatial data are collected and attributes measured. Many of these properties were recognized early in geography's quantitative revolution, most notably the lack of independence in data values collected close together in space. Geographers then and since have made important contributions to the development of relevant statistical theory and practice.

Geographers continue to develop new methods for describing spatial variation and new methods for modelling processes that operate across geographical space. At present there are two strong traditions which provide focuses for research. On the one hand there are methodologies based on whole map or global statistics that seek to capture data properties through models that are fitted to all the data. On the other hand there are methodologies based on local statistics that process geographically defined subsets of the data and do not seek to impose a single statistic or model on the whole data set (Anselin, 1995, 1996; Getis and Ord, 1996; Fotheringham and Brunsdon, 2000). They represent different ways of responding to the need to develop methodologies to meet the
analytical challenges posed by the special nature of spatial data.

REFERENCES

Andrienko, G.L. and Andrienko, N.V. (1999). Interactive maps for visual data exploration. International Journal of Geographical Information Science, 13: 355–374.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic.
Anselin, L. (1995). Local indicators of spatial association: LISA. Geographical Analysis, 27: 93–115.
Anselin, L. (1996). The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer, M., Scholten, H.J. and Unwin, D., (eds), Spatial Analytical Perspectives on GIS, pp. 111–125. London: Taylor & Francis.
Besag, J.E. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, B, 36: 192–225.
Besag, J.E. (1975). Statistical analysis of non-lattice data. The Statistician, 24: 179–195.
Besag, J.E. (1978). Some methods of statistical analysis for spatial data. Bulletin of the International Statistical Institute, 47: 77–92.
Brindley, P., Wise, S.M., Maheswaran, R. and Haining, R.P. (2005). The effect of alternative representations of population location on the areal interpolation of air pollution exposure. Computers, Environment and Urban Systems, 29: 455–469.
Cerioli, A. (1997). Modified tests of independence in 2 × 2 tables with spatial data. Biometrics, 53: 619–628.
Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. London: Pion.
Clifford, P. and Richardson, S. (1985). Testing the association between two spatial processes. Statistics and Decisions, Suppl. No. 2: 155–160.
Cockings, S., Fisher, P.F. and Langford, M. (1997). Parametrization and visualization of the errors in areal interpolation. Geographical Analysis, 29: 314–328.
Cressie, N. (1984). Towards resistant geostatistics. In: Verly, G., David, M., Journel, A.G. and Marechal, A., (eds), Geostatistics for Natural Resources Characterization, pp. 21–44. Dordrecht: Reidel.
Cressie, N. (1991). Statistics for Spatial Data. New York: Wiley.
Dobson, A.J. (1999). An Introduction to Generalized Linear Models. Boca Raton: Chapman & Hall.
Dorling, D. (1992). Stretching space and splicing time: from cartographic animation to interactive visualization. Cartography and Geographic Information Systems, 19: 215–227.
Dorling, D. (1994). Cartograms for visualizing human geography. Hearnshaw, H.M. and Unwin, D.J., (eds), Visualization in Geographic Information Systems, pp. 85–102. New York: J. Wiley & Sons.
Fisher, R. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd.
Forster, B.C. (1980). Urban residential ground cover using LANDSAT digital data. Photogrammetric Engineering and Remote Sensing, 46: 547–558.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. London: SAGE.
Fotheringham, A.S. and Charlton, M. (1994). GIS and exploratory spatial data analysis: an overview of some research issues. Geographical Systems, 1: 315–327.
Gelman, A. and Price, P.N. (1999). All maps of parameter estimates are misleading. Statistics in Medicine, 18: 3221–3234.
Getis, A. and Ord, J.K. (1996). Local spatial statistics: an overview. In: Longley, P. and Batty, M., (eds), Spatial Analysis: Modelling in a GIS environment, pp. 261–277. Cambridge: Geoinformation International.
Goodchild, M.F. (1989). Modelling error in objects and fields. In: Goodchild, M. and Gopal, S., (eds), Accuracy of Spatial Databases, pp. 107–113. London: Taylor & Francis.
Guptill, S.C. and Morrison, J.L. (1995). Elements of Spatial Data Quality. Oxford: Elsevier Science.
Haining, R.P. (1978). The moving average model for spatial interaction. Transactions of the Institute for British Geographers, NS3: 202–225.
Haining, R.P. (1988). Estimating spatial means with an application to remotely sensed data. Communications in Statistics, Theory and Methods, 17: 573–597.
Haining, R.P. (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge: Cambridge University Press.
Haining, R.P. (1991). Estimation with heteroscedastic and correlated errors: a spatial analysis of intra-urban mortality data. Papers in Regional Science, 70: 223–241.
Haining, R.P. (2003). Spatial Data Analysis: Theory and Practice. Cambridge: Cambridge University Press.
Haining, R.P. and Arbia, G. (1993). Error propagation through map operations. Technometrics, 35: 293–305.
Haining, R.P., Wise, S.M. and Ma, J. (1998). Exploratory Spatial Data Analysis in a geographic information system environment. The Statistician, 47: 457–469.
Isaaks, E.H. and Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford: Oxford University Press.
King, G. (1997). A Solution to the Ecological Inference Problem. Princeton, New Jersey: Princeton University Press.
Kulldorff, M. (1998). Statistical methods for spatial epidemiology: tests for randomness. In: Gatrell, A. and Löytönen, M., (eds), GIS and Health, pp. 49–62. London: Taylor & Francis.
Law, J. and Haining, R.P. (2004). A Bayesian approach to modelling binary data: the case of high intensity crime areas. Geographical Analysis, 36: 197–216.
Law, J., Haining, R., Maheswaran, R. and Pearson, T. (2006). Analysing the relationship between smoking and coronary heart disease at the small area level. Geographical Analysis, 38: 140–159.
Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (2001). Geographical Information Systems and Science. Chichester: Wiley.
Martin, D.J. (1998). Optimizing Census Geography: the separation of collection and output geographies. International Journal of Geographical Information Science, 12: 673–685.
Martin, D.J. (1999). Spatial representation: the social scientist's perspective. In: Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W., (eds), Geographical Information Systems: Volume 1. Principles and Technical Issues, 2nd edition, pp. 71–89. New York: Wiley.
Mollie, A. (1996). Bayesian mapping of disease. Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics, pp. 359–379. London: Chapman & Hall.
Monmonier, M.S. (1989). Geographic brushing: enhancing exploratory analysis of the scatterplot matrix. Geographical Analysis, 21: 81–84.
Richardson, S. (1992). Statistical methods for geographical correlation studies. In: Elliot, P., Cuzich, J., English, D. and Stern, R., (eds), Geographical and Environmental Epidemiology: Methods for Small Area Studies, pp. 181–204. Oxford: Oxford University Press.
Ripley, B.D. (1981). Spatial Statistics. New York: Wiley.
Unwin, D.J. and Wrigley, N. (1987). Towards a general theory of control point distribution effects in trend surface models. Computers and Geosciences, 13: 351–355.
Whittle, P. (1954). On stationary processes in the plane. Biometrika, 41: 434–449.
3
The Role of GIS
David Martin
software systems and users which comprise the GIS community. Nevertheless, GIS has
contributed to the development of spatial analytical methods more indirectly through a
huge growth in the data resources, structures and basic tools available. It is worth noting
here that sometimes in the relevant literature it is not entirely clear whether authors are
referring to GIS in the narrower sense of geographical information systems or the broader
field of geographical information science (Goodchild, 1992). GIScience incorporates both
GISystems and spatial analysis, and the discussion in this chapter focuses on the
relationship between these two components.

The remainder of this chapter seeks to explore the complex and much-contested relationship
between GIS and spatial analysis. Section 3.2 considers the definitions of each and reviews
the extent to which they have become integrated. We then turn, in sections 3.3 and 3.4, to
examine some different models whereby spatial analysis and GIS software tools have been
connected and consider a selection of more detailed examples. The principal barriers and
opportunities for closer integration between GIS and spatial analysis are presented in
section 3.5 and the chapter concludes by attempting to assess the likely convergence or
divergence between these families of spatial processing techniques in future. By its nature,
this chapter inevitably touches on many areas that are discussed in more detail elsewhere in
this volume, but the focus here is to explore the interaction between GIS and spatial
analysis, and more specifically the contributory role of GIS.

3.2. GIS AND SPATIAL ANALYSIS: MADE FOR EACH OTHER?

There are very many GIS textbooks available (for example Burrough and McDonnell, 1998;
Heywood et al., 2006; DeMers, 2002a; Longley et al., 2001) and it is not the purpose of this
chapter to cover again the basic principles of GIS. It is, however, necessary to offer
working definitions of GIS and spatial analysis so that their relationship can be effectively
reviewed. What has probably become the classic GIS definition is restated by Goodchild
(2000), for example, as a system for creating, storing, manipulating, visualizing and
analyzing geographical information. Although slightly different terms are used, the concept
of GIS as a toolbox containing these core functions has become nearly universal. Whereas
specialist database or visualization software may exist in isolation, the combination of
these elements in an integrated software environment is generally considered necessary in
order to justify the claim that a software tool is actually a GIS.

Fotheringham and Rogerson (1993) specify that spatial analysis is not just aspatial analysis
applied to spatial data: it is inherent in the analytical procedures with which we are
concerned here that they aim to reveal and characterize explicitly spatial patterns and
processes. More subtly, there is something of a distinction between spatial data
manipulation and analysis, although the exact dividing line is dependent on the
commentator's view of spatial analysis itself. Techniques for spatial data manipulation,
perhaps most extensively developed in the language of cartographic modelling (Berry, 1987;
DeMers, 2002b), offer an extensive suite of functions for reclassification, overlay,
mathematical, distance and neighbourhood operations on map layers which can be assembled
into sophisticated scenarios, of which perhaps the most frequently cited example is site
suitability analysis. Although it is possible for the spatial data manipulation tools within
a GIS to be assembled in such a way as to carry out spatial analysis tasks, they are
generally not considered to constitute spatial analysis per se. There is thus a sense
in which spatial analysis requires spatial data manipulation, but manipulation is not in
itself analysis.

A distinction can be found between those who adopt a relatively narrow definition of spatial
analysis as the extension of statistical analysis into the spatial domain, such as Bailey
(1988) and those who would offer a much broader view, including visualization, cartographic
modelling and computationally intensive geographical analyses. Bailey and Gatrell (1995)
choose to distinguish between spatial analysis and spatial data analysis, the latter
describing the situation in which methods are applied to the description and explanation of
processes operating in space through the use of observational data within a conventional
statistical framework. This narrower definition has strong roots in quantitative geography
(see Fotheringham et al., 2000), but tends to marginalize specialized analytical operations
within GIS such as hydrological modelling using grid-based functions or network-based
modelling for route-finding applications. These tools do not contribute to the more narrowly
defined statistical spatial analysis but nevertheless make an important contribution to
analytical GIS use.

A further area of development is that which has been termed geocomputation (Longley et al.,
1998), in which highly computationally-intensive techniques have been applied to categories
of spatial analytical problems which simply could not have been tackled by conventional
means. The critical reader may find few fundamentals to distinguish geocomputation from a
broadly-defined spatial analysis, except for the use of new data types and computing
environments. This work is also characterized by a concentration on some of the areas in
which traditional analytical methods have been weak: particularly the application of
high-powered computing to the study of spatio-temporal dynamics, perhaps again indicating
that geocomputational approaches may serve to fill gaps in the spatial analysis toolkit
rather than represent an entirely new development. In the following discussion, we adopt a
broad definition, which encompasses a wide range of specialist GIS functions whose purposes
are primarily analytical rather than operational. This approach is helpful in understanding
the extent to which GIS has contributed to the contextual environment of a wide variety of
spatial thinking and analysis tasks, but has had rather less obvious impact on the
generation of tools for narrowly defined statistical spatial analysis.

The early development of GIS and that of spatial analysis techniques were rather separate,
with GIS growing from extensive inventory applications such as the Canada GIS (CGIS)
(Tomlinson et al., 1976) concerned with the practical management of natural resources, while
most spatial analytical techniques can trace their roots to the quantitative revolution in
academic geography of the 1960s and 1970s. Typically, spatial analytical methods were
developed in the context of limited spatial data and software tools, frequently being
programmed in isolation by the researcher to take advantage of the research potential of
specific datasets. Widespread adoption of such methods was impossible due to the absence of
suitably structured data and widely available software tools. The need for a large body of
transferable and well-structured spatial data, for example incorporating the topological
relationships required for many types of analysis involving adjacency or contiguity, was a
precondition for any broad adoption of spatial analysis methods and it is in this
development of spatial data infrastructure that GIS can be seen to have played a critical
role. GIS provides the essential tools for manipulation and pre-processing of spatial data
that are likely to be required by the spatial analyst. There is thus great attraction to the
prospect of
[Figure: diagram with the labels "GIS" and "Spatial analysis"; caption not recovered]
the real applicability of spatial analysis, citing the prevalence of simple distance and the
absence of direction from most spatial analytical work, despite its relevance to practical
decision-making. This view that the contribution of GIS to spatial analysis is strongly tied
to its provision of the underlying representational models is consistent with Goodchild's
(1987) suggestion that an ideal GIS would be one which incorporated a data model finely tuned
to the needs of spatial analysis. At that relatively early stage in GIS take-up, he was able
to observe that no contemporary commercial product met the ideal and that there would be
little economic incentive for the development of a GIS incorporating such a spatial
analytical model, while applications rather than abstract concepts are the drivers of
proprietary software development.

Very many GIS users are not actually concerned with statistical spatial analysis, but have
entirely valid requirements involving the management, query and reporting of spatially
referenced data. For example, the UK census agency, the Office for National Statistics (ONS),
implemented a GIS for the design of the 2001 census of population, starting with a prototype
system in the mid-1990s (Martin, 1999b). The initial objective was the simple replacement of
the existing labour-intensive process of creating maps for the guidance of census
enumerators. A significant multi-user GIS involving sophisticated data management of multiple
data sources, including a national address-level database, was established with no spatial
analytical ambitions, the primary objective being to deliver printed maps and address
listings for 175,000 census enumeration districts. Although aspects of this system could
clearly have been developed with spatial analytical purposes in mind, it shared its principal
objectives with perhaps the majority of commercial GIS implementations whose objectives are
facility management and inventory applications. The ONS example is a useful one to illustrate
the GIS-spatial analysis relationship because it subsequently evolved to become the basis of
a spatial referencing system for census outputs that provide a rich source of socio-economic
data for spatial analysis. Importantly, the contribution of the GIS application was not in
the provision of analysis tools per se but as the means of contributing to the wider spatial
data infrastructure, including user awareness and debate. In many ways this is a microcosm of
the historical role of GIS in spatial analysis.

Couclelis (1998) makes some observations about the contrast between GIS and geocomputation
which are also illustrative of the GIS-spatial analysis relationship. GIS has been
characterized by large scale, high-visibility practical applications, resulting in great
commercial and organizational interest, combined with the intuitive and visual appeal of
map-based manipulation by computer. Geocomputation, and spatial analysis more generally, does
not enjoy these advantages: the more sophisticated analytical methods are often lacking in
immediate or obvious commercial application, are often hard to visualize and are far from
intuitive to novice users. We can conclude that, although related, GIS software is not the
principal driver of spatial analytical tool development. Almost always, advanced spatial
analysis methods are developed separately from GIS, but in an environment in which data
availability, especially in standard formats, is due to the wider adoption of GIS. Widespread
use of GIS has brought about spatial data infrastructures and exchange mechanisms that make
possible the practical implementation of spatial analyses that would otherwise have been
quite intractable. GIS have thus come to provide the environment rather than the tools for
innovative spatial
analysis, with explicit software connections between the two coming much later, if at all.

3.3. CLOSE COUPLED, LOOSE COUPLED, UNCOUPLED?

Ungerer and Goodchild (2002) provide a tabular representation of strategies for coupling GIS
and spatial analysis, which is itself based on a classification by Goodchild et al. (1992).
The coupling strategies are further illustrated in Figure 3.2 and range from isolated,
through loose and close coupled to integrated: only in the case of full integration are
spatial analysis functions actually performed within the GIS software itself. The extent of
integration possible will to some extent be determined by the software architecture of the
specific GIS employed. While it is clearly possible to write stand-alone software to perform
spatial analysis tasks, it is hard to identify strong advantages to this approach. Indeed the
isolation from the data layers available in GIS and the obvious risk of reinventing the wheel
in the authoring of such tools serve to make such a strategy unattractive. At the opposite
extreme, where spatial analysis functions are fully integrated within GIS software, there is
a risk of promoting naïve or inappropriate use of complex techniques due to a lack of
specialist insight in spatial analysis. Openshaw (1996) identifies one element of this in
what he terms the user modifiable areal unit problem in which the well-recognized modifiable
areal unit problem (Openshaw, 1984) is compounded by the availability of software that allows
users extensive opportunities for creating their own spatial aggregation schemes without any
necessary understanding of the impacts on spatial analysis of the resulting data. Of the
intermediate positions, loose coupling generally involves file import and export at each
analysis stage but little new programming, whereas close coupling seeks to overcome this
necessity by investing in programming that more smoothly moves data between the two software
applications, for example by developing software routines that directly access the GIS
database as shown in Figure 3.2.

[Figure 3.2 (diagram not reproduced): panels labelled Loose coupled, Close coupled and
Integrated, each relating Spatial analysis, GIS and data; the label External data also
appears.]
Figure 3.2 Models of relationship between spatial analysis and GIS software (after Ungerer
and Goodchild, 2002 and Goodchild et al., 1992).

3.4. CASE STUDIES

In this section we briefly review a range of case studies in which spatial analysis software
is more or less closely coupled to GIS. Examples are provided of each of the situations
illustrated in Figure 3.2. Some of these examples will be encountered elsewhere in this book,
but the objective in considering them here is not to provide an overview of the analysis
methods, but to review the role of GIS in the implementation of these spatial analysis tools.

3.4.1. Isolated

Some isolated spatial analysis tools have very specific and limited applications while others
are well-developed spatial analysis toolkits. These programs rarely justify the term GIS in
their own right, as one or more of the basic GIS operations (often in the data creation,
storage and manipulation domains) are entirely missing or very elementary.

GWR, the software produced by Fotheringham et al. (2002) for geographically weighted
regression (http://ncg.nuim.ie/GWR), serves as an example of an explicitly spatial analysis
method which has been implemented entirely separately from GIS software. In this case,
although the input data are conventional spatial coordinates with associated attributes,
generic input and output file formats are used and the software operates independently of any
GIS. An editing tool has been developed in Microsoft Visual Basic (VB) which provides a user
interface to the developer's Fortran regression program and produces outputs which are
intended for further analysis in other software, including GIS. It is in the very nature of
geographically weighted regression that the results are themselves spatial data, comprising
parameter estimates and other statistics relating either to every sample location or every
point on a regular spatial grid. Interpretation of these results requires cartographic
visualization, but it is expected that the user will undertake this using other software, for
which purpose two GIS output file formats are offered. Code is also available for running GWR
within the statistical package R, although this provides no direct data management or mapping
functions.

A second example is GeoDa (http://www.csiss.org/clearinghouse/GeoDa/) which incorporates
limited data manipulation, but has a range of spatial analysis functions and visualization
tools and works with the less sophisticated GIS data structures such as Shapefiles. Anselin
(1999) explains how this type of exploratory spatial data analysis can bridge the gap between
cartographic visualization and statistical analysis. GeoDa is a tool for exploratory spatial
data analysis (ESDA), and allows the user to work with linked plots and interactive
visualizations, a distinctive characteristic of ESDA tools (Brunsdon and Charlton, 1996). The
spatial analysis methods present in GeoDa focus on measures of spatial association,
particularly the calculation of local indicators and weights. Spatial data manipulation
functions are limited, but do allow for point and polygon data through tools for the creation
of centroids and Thiessen polygons. The software can thus be used to provide
additional spatial analysis functions to the GIS user through file export, or to provide
stand-alone analysis of suitably structured point or polygon data (Anselin, 2005).

Accession (http://www.accessiongis.com/) provides another interesting example, whereby a
software tool has been produced specifically for the calculation of geographical
accessibility. Higgs (2005) provides an extensive review of health accessibility modelling in
GIS, but notes that attempts to incorporate public transport accessibility are
underdeveloped. This tool has been designed to undertake precisely that task, and thereby
illustrates an approach to the concerns of Miller and Wentz (2003) by combining conventional
spatial network analysis with the very unconventional spaces of public transport timetables.
The software offers a wider range of conventional GIS functions than GWR or GeoDa but is
still not a fully developed general-purpose GIS, its unique functionality being the spatial
analysis of accessibility using a combination of timetable and network data.

3.4.2. Loose coupled

AZM (http://www2.geog.soton.ac.uk/users/martindj/davehome/software.htm) is a loose-coupled
tool because it does not undertake any data management or display itself, but requires data
import and export from a GIS. In this case, the software is intended for automated zone
design and best-matching of incompatible zonal systems (Martin, 2003a) and is dependent on an
external GIS to provide the topological data structure which is a central requirement of zone
design. More recently, the software has been re-engineered, again to take direct advantage of
widely-used Shapefiles, with the additional topological structuring being undertaken within
the software. This is an interesting example because its purpose is not to be used as a
stand-alone tool but to supply a spatial analysis function to the GIS user that is not
otherwise available within the GIS software environment. In this sense it provides additional
external functionality to the GIS user, who must manually export and transfer the necessary
data.

The history of AZM demonstrates something of the separate origins of GIS and spatial analysis
tools noted above. Openshaw (1977) describes an automated zoning procedure (AZP) initially
developed to run on an exemplar dataset comprising a limited set of regular cells, which
could be aggregated into clusters according to a variety of objective functions. Although the
method was of demonstrable practical utility, the absence of widely available topologically
structured census or administrative area boundaries and the small problem size that could be
handled by available computing power meant that the method was hardly applied until Openshaw
and Rao (1995) returned to the problem, using 1991 census data and mid-1990s computing to
demonstrate its practical large-scale application. Effectively, the practical application of
the method had to wait until GIS development had fostered the general availability of the
necessary data in a suitable topological structure. AZM is based around Openshaw's AZP and is
closely related to the system used to create output areas for the 2001 census of population
in England and Wales, itself a loose-coupled configuration with zone design software
processing topologically structured data exported from an ArcInfo GIS application.

3.4.3. Close coupled

SAGE (Spatial Analysis in a GIS Environment) is another example of a system developed as a
spatial analysis toolkit (Haining et al., 2001) but this time calling software
routines within the ArcInfo GIS. Although SAGE consisted of external custom-written code,
data were held within the GIS, whose functionality was also called for specific data
manipulation functions and cartographic visualization. The software was developed
specifically to overcome perceived analytical shortcomings in the GIS, yet with a desire not
to reinvent those important functions which were already well provided for. Specifically,
SAGE attempted to enhance the GIS functionality in the areas of visualization and statistical
techniques. Although cartographic visualization is one of the central functional elements of
GIS, scientific visualization, particularly that involving real-time interaction with
datasets, is generally absent from GIS software. SAGE incorporated exploratory analyses
through the use of linked windows common to many ESDA applications. The specific motivation
for creation of SAGE was the analysis of health events. Haining et al. (2001) explain the
rationale for creating a spatial analysis software suite integrated with a proprietary GIS,
citing the inconvenience of having to transfer data between two software tools, but also the
unnecessary duplication of effort when external tools need to provide their own basic mapping
and spatial manipulation functions which are already well-provided for by GIS. At the core of
the spatial analysis tool were two separate programs, one providing the spatial analysis and
the other a linkage tool, both running in client/server mode with the GIS. SAGE provided a
range of classification and regionalization functions in addition to spatial statistical
analyses.

The fate of systems such as SAGE is typical of many such attempts in that although a great
deal was achieved, the lack of true integration between the two software systems and the
academically driven motivation for the analysis program resulted in divergence. Subsequent
releases of the ArcInfo and ArcGIS software have moved to different operating systems and
hardware architectures, and eventually the adoption of different scripting languages, making
SAGE unusable with more modern versions. External, non-commercial tools such as SAGE cannot
realistically hope to track the relatively rapid software redesign cycle of leading GIS
software. The analytical functions embedded in SAGE were not absorbed into the GIS software,
so there has actually been a decrease in the range of tools available to the spatial analyst.
Isolated and loose-coupled tools, relying only on generic spatial data transfer formats, will
probably survive several GIS software versions without the need for significant
reprogramming. Similarly, fully integrated tools have the potential to evolve with the GIS
itself if they are actually adopted as part of the core product. Close-coupling however is
perhaps the most problematic software architecture, carrying a high risk of being left behind
by developments in the GIS and the greatest maintenance burden for the spatial analysis
programmer if they are to ensure the continued utility of their tool.

Ungerer and Goodchild (2002) describe a close-coupled component object model (COM) approach
to linking GIS and spatial analysis software. Their tool is an extension written for ArcInfo
which undertakes spatial interpolation by creating an instance of a statistics package, using
it to run an analysis on the GIS data and then placing the results within the GIS. This is
just one step short of writing spatial analysis functions that are fully integrated with the
host GIS. Their implementation uses Microsoft Visual Basic for Applications (VBA) which has
become common as a macro language across multiple software packages, overcoming some of the
restrictions of software-specific macro programming languages found, for example, in earlier
GIS versions. Clearly a programming language of this type could be used to
develop entirely integrated spatial analysis tools but this example demonstrates its power as
a means for finding a common language for close-coupling GIS with external statistical
software.

3.4.4. Integrated

In addition to those analytical functions which are actually included as part of the core
software, examples of customized spatial analysis operations fully integrated within GIS may
be found at all periods in GIS development. These are generally the result of spatial
analysts being able to directly access macro programming functions. Early instances involved
languages such as ArcInfo's Arc Macro Language (AML), while more recent examples are likely
to use Microsoft VBA, perhaps interfacing directly with components of the GIS software such
as ArcObjects.

Ding and Fotheringham (1991) describe an application called STACAS (SpaTial AutoCorrelation
and ASsociation analysis) that was completely embedded within the GIS software, being
assembled from ArcInfo functions and custom-written programs. As with GeoDa described above,
analysis of spatial association requires knowledge of the spatial relationships between GIS
objects, for example the adjacencies between polygons and distances between points or polygon
centroids. It is also necessary to link attribute values with these locations and of value to
display the resulting measures of association in cartographic form. For all of these reasons
there is a considerable attraction to embedding the analytical functions within the GIS
environment where the spatial relationships and support functions are already available. Ding
and Fotheringham's solution was to construct their analysis routines using ArcInfo's own
macro programming language, AML. Calculations that could not be readily assembled using AML
were programmed in C and called from within the AML routines so that the resulting analysis
functions were presented to the user as additional commands within the GIS. Embedding of this
type is generally robust against incremental updating of GIS software but becomes obsolete
when major changes to software architecture take place, affecting the spatial database and
macro programming language on which it is based.

Evans and Steadman (2003) describe a more modern application, interfacing a land use
transport model known as TRANUS with a desktop GIS. The objectives are to quickly visualize
the results of the transport model and to provide a means of exporting data for further
analysis in additional software environments. The TRANUS GIS module has been built using
ArcObjects technology from ESRI's ArcGIS which effectively allows Microsoft VB to be used to
customize interfaces and develop further software. Automated procedures handle the transfer
of results between the transport modelling and GIS tools. In this instance visualization in
the GIS is not the final objective, with model outputs being passed on from the GIS to other
external analysis tools. Effectively the GIS provides the visualization and post-processing
of specialized model results. The GIS environment is additionally relevant as the context for
the creation and manipulation of many of the data layers that contribute to the original
transport modelling. Interestingly, the authors note that a question mark hangs over the
demand for such integrated or closely coupled solutions.

3.5. BARRIERS AND OPPORTUNITIES

Brown (2000) argues strongly that after so many years of discussion, not enough
progress has been made towards the genuine integration of spatial analysis and GIS,
especially when considered from the perspective of the substantive researcher who has
practical analysis requirements but is not able to engage in the development of software
tools. He notes that the growth of GIS has been propelled by the spread of less sophisticated
GIS (such as ArcView) that are less readily turned to spatial analytical applications. The
result is that while there is widespread use of GIS, this is often naïve or at least goes
little further than cartographic visualization. It follows from this reasoning that it is the
spatial analytical tools embedded within the simplest GIS software, not the most
sophisticated, that will actually determine the future uptake and development of spatial
analysis methods. Given the enormous contextual influence of GIS on the practical use of
spatial analysis, prevalent standards of GIS training can be seen to have a significant
impact on the overall level of spatial analytical methods demanded and employed.

Public awareness of spatial data continues to increase massively through the popularity of
web-based mapping tools, of which Multimap (http://www.multimap.com/), the Neighbourhood
Statistics Service (http://www.neighbourhood.statistics.gov.uk/), Windows Live Local
(http://local.live.com/) and Google Earth (http://earth.google.com/) provide just a few
examples. These developments bring spatial data and concepts onto the desktops of millions
who will remain unaware that there has even been a debate about the role of GIS in spatial
analysis. Such tools embody various simple GIS analysis functions such as route-finding
(Multimap, Windows Live Local), tagging and grouping of spatial objects (Google Earth) and
interactive choropleth mapping (Neighbourhood Statistics). While it seems unlikely that these
populist tools will develop a need for much more sophisticated spatial statistical functions,
there is every possibility that they find increasing use in the presentation of results and
visualizations from complex analyses run externally, for example of climate change,
environmental sensitivity or neighbourhood property prices. The increasing pool of low-level
users remains at the same time one of the greatest opportunities for spatial analytical
development, yet a barrier to the emergence of a well-skilled user base.

Goodchild (2000) sees four tensions in the popularization of spatial analysis through
incorporation of tools within GIS software: (a) populism and elitism, (b) visual and numeric,
(c) open and closed, and (d) local and global. The first of these, populism and elitism, is
very much concerned with the difficulty noted above: although GIS use is becoming massively
more widespread, this does not directly increase the ability of users to appropriately engage
with sophisticated spatial analysis methods. There is in reality no organization with the
authority to either restrict or educate GIS users in this respect, so the spatial analysis
community must address itself to the challenge of awareness-raising among an ever-multiplying
community of low-level GIS users. The incorporation of visualization functions in spatial
analysis tools, for example in GeoDa described above, goes some way towards the enhanced
communication of spatial analysis concepts to more advanced GIS users who might otherwise be
unlikely to engage with purely statistical aspects. An increasing tendency towards
open-source software development may eventually assist in exposing underlying algorithms but
it is inevitably the case that only a small proportion of users will concern themselves with
such a level of technical detail. The fourth tension between local and global analysis
represents a continuum, with a need for analytical techniques appropriate for each scale of
analysis.
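
The contrast between global and local analysis can be made concrete with a small numerical sketch. The chapter does not prescribe any particular statistic; the example below assumes one common pairing, Moran's I as the global measure and the local Moran statistic as its location-specific counterpart, of the kind referred to above as local indicators. The toy 4 x 4 grid, the rook-contiguity weights and all function and variable names are illustrative assumptions, not taken from GeoDa or any other package mentioned in this chapter.

```python
"""Global versus local Moran's I on a toy lattice (illustrative sketch only)."""


def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity neighbours for a regular grid, as {cell: [cells]}."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours


def global_morans_i(values, neighbours):
    """One summary value describing spatial association across the whole map."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(adj) for adj in neighbours.values())  # total of binary weights
    cross = sum(z[i] * z[j] for i, adj in neighbours.items() for j in adj)
    return (n / s0) * cross / sum(d * d for d in z)


def local_morans_i(values, neighbours):
    """One value per location, so the result is itself a mappable layer."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    return [z[i] / m2 * sum(z[j] for j in neighbours[i]) for i in range(n)]


# Hypothetical attribute surface: high values in the west, low values in the east.
grid_values = [10, 10, 2, 2] * 4
weights = rook_weights(4, 4)

global_i = global_morans_i(grid_values, weights)
local_i = local_morans_i(grid_values, weights)

print("global Moran's I:", round(global_i, 4))
for row in range(4):
    print([round(v, 2) for v in local_i[row * 4:(row + 1) * 4]])
```

The global statistic returns a single number for the whole study area, whereas the local version returns one value per cell and therefore needs cartographic display to be interpreted. With unstandardized binary weights, as assumed here, the local values sum to the global statistic multiplied by the total weight, which is one way of seeing the two scales of analysis as points on a continuum.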
stand-alone spatial analysis software have a long way to go. Again, it is the need for much
more sophisticated handling of space and time and the incorporation of different types of
spatial computation that are the underlying themes.

In the preceding sections we have reviewed various examples of the relationships between GIS
and spatial analysis which, despite differences of detail, display remarkably little change
over the last two decades. It seems improbable that GIS software intended for an increasingly
wide user base will ever incorporate a high level of spatial analytical functionality as the
use of complex and advanced methods will never be a concern of the ordinary GIS user.
Although the absolute level of spatial analytical functionality in GIS continues to increase,
the gap between populist software and research-oriented analytical tools cannot be closed in
relative terms. More realistically, a groundswell of open software standards and,
potentially, grid-based computing applications may make practical communication between GIS
software and the more sophisticated analysis tools much easier. There is thus no greater

Anselin, L. (2005). Exploring Spatial Data with GeoDa: A Workbook. Urbana-Champaign:
University of Illinois.
Bailey, T.C. (1998). Review of statistical spatial analysis in GIS. In: Fotheringham, A.S.
and Rogerson, P. (eds), Spatial Analysis and GIS, pp. 13–45. Philadelphia: Taylor and
Francis.
Bailey, T.C. and Gatrell, A.C. (1995). Interactive Spatial Data Analysis. Harlow: Longman.
Batty, J.M. (2003). Agent-based pedestrian modelling. In: Longley, P.A. and Batty, J.M.
(eds), Advanced Spatial Analysis: The CASA Book of GIS, pp. 81–106. Redlands, CA: ESRI
Press.
Berry, J.K. (1987). Fundamental operations in computer-assisted map analysis. International
Journal of Geographical Information Systems, 1(2): 119–136.
Brown, L.A. (2000). The GIS/SA interface for substantive research(ers): a critical need.
Geographical Systems, 2: 43–47.
Brunsdon, C. and Charlton, M. (1996). Developing an exploratory spatial analysis system in
XLisp-Stat. In: Parker, D. (ed.), Innovations in GIS 3, pp. 135–146. London: Taylor and
Francis.
Burrough, P.A. and McDonnell, R.A. (1998). Principles of Geographical Information Systems.
Oxford: Oxford University Press.
Couclelis, H. (1998). Geocomputation in context. In:
Longley, P.A., Brooks, S.M., McDonnell, R.A. and
prospect of true convergence between GIS Macmillan, B. (eds), Geocomputation: A Primer
and spatial analysis than at any previous time, pp. 1730. Chichester: Wiley.
yet the two fields will continue to grow and DeMers, M.N. (2002a). Fundamentals of Geographic
feed off one another. What we still need are Information Systems, Second Edition Update.
more realistic expectations of what drives New York: Wiley.
the design of commercial software and a DeMers, M.N. (2002b). GIS Modelling in Raster.
concerted effort on more sustainable ways of New York: Wiley.
embedding spatial analytical tools within the Ding, Y. and Fotheringham, S. (1991). The Integration
broader GIS landscape. of Spatial Analysis and GIS: the Development of
the STACAS Module for ArcInfo. Technical Paper
915, National Center for Geographic Information
and Analysis, Buffalo, NY: NCGIA.
Evans, S. and Steadman, P.J. (2003). Interfacing
REFERENCES
land-use transport models with GIS: the Inverness
model. In: Longley, P.A. and Batty, J.M. (eds),
Anselin, L. (1999). Interactive techniques and
Advanced Spatial Analysis: The CASA Book of GIS,
exploratory spatial data analysis. In: Longley, P.,
pp. 289308. Redlands, CA: ESRI Press.
Goodchild, M., Maguire, D. and Rhind, D. (eds),
Geographical Information Systems: Principles, Fotheringham, A.S., Brundson, C. and Charlton, M.
Techniques, Applications and Management, Second (2000). Quantitative Geography: Perspectives on
Edition, pp. 253266. Chichester: Wiley. Spatial Data Analysis. London: Sage.
38 THE SAGE HANDBOOK OF SPATIAL ANALYSIS
Fotheringham, A.S., Brunsdon, C. and Charlton, M. Longley, P.A. and Batty J.M. (eds) (2003b). Advanced
(2002). Geographically Weighted Regression. Spatial Analysis: The CASA Book of GIS. Redlands,
Chichester: Wiley. CA: ESRI Press.
Fotheringham, A.S. and Rogerson, P.A. (1993). GIS and Longley, P.A., Brooks, S.M., McDonnell, R.A. and
spatial analytical problems. International Journal of Macmillan, B. (eds) (1998). Geocomputation:
Geographical Information Systems, 7(1): 319. A Primer. Chichester: Wiley.
Fotheringham, A.S. and Wegener, M. (2000). Spatial Longley, P.A., Goodchild, M.F., Maguire, D.J. and
Models and GIS: New Potential and New Models. Rhind, D.W. (2001). Geographic Information Systems
London: Taylor and Francis. and Science. Chichester: Wiley.
Goodchild, M.F. (1987). A spatial analytical perspective Marble, D. (2000). Some thoughts on the integration of
on geographical information systems. International spatial analysis and Geographic Information Systems.
Journal of Geographical Information Systems, Geographical Systems, 2: 3135.
1(4): 32734.
Martin, D. (1999a). Spatial representation: the social
Goodchild, M.F. (1992). Geographical information scientists perspective. In: Longley, P., Goodchild, M.,
science. International Journal of Geographical Maguire, D. and Rhind, D. (eds). Geographical
Information Systems. 6(1): 3145. Information Systems: Principles, Techniques, Applica-
Goodchild, M.F. (2000). The current status of GIS and tions and Management, Second Edition, pp. 7180.
spatial analysis. Geographical Systems, 2: 510. Chichester: Wiley.
Goodchild, M.F., Haining, R., Wise, S. and 12 others Martin, D. (1999b). The use of GIS in census planning.
(1992). Integrating GIS and spatial data analysis: In: Stillwell, J., Geertman, S. and Openshaw, S.
problems and possibilities. International Journal of (eds), Geographical Information and Planning, Berlin:
Geographical Information Systems, 6(5): 40723. Springer. pp. 283298.
Haining, R., Wise, S. and Ma, J. (2001). Providing Martin, D. (2003a). Extending the automated zoning
spatial statistical data analysis functionality for the procedure to reconcile incompatible zoning systems.
GIS user: the SAGE project. International Journal of International Journal of Geographical Information
Geographical Information Science, 15(3): 239254. Science, 17(2): 181196.
Higgs, G. (2005). A literature review of the use Martin, D. (2003b). Reconstructing social GIS. Transac-
of GIS-based measures of access to health care tions in GIS, 7(3): 305307.
services. Health Services and Outcomes Research Miller, H.Z. and Wentz, E.A. (2003). Representation and
Methodology, 5(2): 11939. spatial analysis in geographic information systems.
Heywood, I., Cornelius, S. and Carver, S. (2006). An Annals of the Association of American Geographers,
Introduction to Geographical Information Systems, 93 (3): 574594.
Third Edition. London: Pearson. Openshaw, S. (1977). A geographical solution to
Langran, G. (1992). Time in Geographic Information scale and aggregation problems in region-building,
Systems. London: Taylor and Francis. partitioning and spatial modelling. Transactions
of the Institute of British Geographers, NS 2(4):
Longley, P.A. and Batty J.M. (1996a). Analysis, 459472.
modelling, forecasting, and GIS technology. In:
Longley, P.A. and Batty J.M. (eds), Spatial Analysis: Openshaw, S. (1984). The Modiable Areal Unit
Modelling in a GIS Environment. pp. 116. Problem. Concepts and Techniques in Modern
Cambridge: GeoInformation International. Geography 38. Norwich: Geo Books.
Longley, P.A. and Batty, J.M. (eds) (1996b). Spa- Openshaw, S. (1996). Developing GIS-relevant zone-
tial Analysis: Modelling in a GIS Environment. based spatial analysis methods. In: Longley, P.A. and
Cambridge: GeoInformation International. Batty, J.M. (eds) (1996). Spatial Analysis: Modelling
in a GIS Environment, pp. 5573. Cambridge:
Longley, P.A. and Batty J.M. (2003a). Advanced
GeoInformation International.
spatial analysis: extending GIS. In: Longley, P.A.
and Batty, J.M. (eds), Advanced Spatial Analysis: Openshaw, S. and Rao, L. (1995). Algorithms for
The CASA Book of GIS. pp. 118. Redlands, reengineering 1991 Census geography. Environment
CA: ESRI Press. and Planning A, 27(3): 425446.
THE ROLE OF GIS 39
Peuquet, D. (2002). Representations of Space and Time. Ungerer, M.J. and Goodchild, M.F. (2002). Integrating
New York: Guilford. spatial data analysis and GIS: a new implemen-
tation using the component object model (COM).
Tomlinson, R.F., Calkins, H.W. and Marble, D.F. (1976).
International Journal of Geographical Information
Computer Handling of Geographical Data. Paris:
Science, 16(1): 4153.
UNESCO Press.
4
Geovisualization and
Geovisual Analytics
Urška Demšar
new scientific insight. Visual data exploration implies generation of new ideas through creation, inspection and interpretation of visual representations and can be considered a part of Exploratory Data Analysis (EDA) (Tukey, 1977). When looking at spatial data, we are talking about Exploratory Spatial Data Analysis (ESDA) (Unwin and Unwin, 1998). Visual exploration is essential as the first step of data analysis and serves to uncover any indications of what there actually is in the data, to prompt ideas and generate hypotheses. It is usually followed by confirmatory data analysis and as the last step by visual communication where results are presented and disseminated in visual form (DiBiase, 1990). This last step is the focus of traditional cartography, which is beyond the scope of this chapter.

The rest of this chapter is structured as follows: the following section introduces the general visualization terminology, describes what role visualization plays in data exploration, presents one of the many possible classifications of visualization methods and lists some examples of general (not necessarily spatial) visualization methods. The rest of the chapter focuses on geospatial data, presents the state-of-the-art in geovisualization research, lists a brief selection of geovisualization software and shows several examples. Finally, a new emerging discipline of Geovisual Analytics is introduced together with some of the future challenges in geovisualization research.

4.1. INFORMATION VISUALIZATION AND VISUAL DATA EXPLORATION

Visualization is the graphical (as opposed to textual or verbal) presentation of data. It translates complex data into visual displays where a human can look for structure, patterns, trends and relationships that make it easier to quickly perceive the significant aspects and characteristics of the data. The main purpose of visualization is to provide insight into data, which is usually done by displaying them with reduced complexity, while at the same time preserving the interesting structure characteristics and minimizing the loss of information. Scientific visualization was first defined 20 years ago (McCormick et al., 1987) as the use of computing technology to create visual displays with the goal to facilitate thinking and problem solving. The term data visualization sometimes stands as a synonym for scientific visualization and is usually defined as visualization of data that have a natural geometric structure. A more general term, information visualization, refers to graphical representations of any type of data, including abstract structures, such as trees, networks or graphs. Even though borders between these different terms are sometimes blurred, in all cases the emphasis is on supporting knowledge construction from visual displays of data (Card et al., 1999; Fayyad et al., 2002).

Knowledge construction from data is the process of actively manipulating data in order to discover patterns, relationships or other abstract knowledge representations that facilitate the understanding of the phenomenon under investigation. All knowledge construction is therefore a form of pattern recognition. The most formidable pattern recognition apparatus known to the human race is the human brain, which can analyze complex events in a short time interval, recognize important patterns and make decisions much more effectively than any computer can do. The question is how to enable this formidable apparatus to work in the knowledge construction process. Given that vision is the predominant sense and that computers have been created to communicate
with humans visually, computerized data visualization provides an efficient connection between data and mind to support the data exploration process (Keim and Ward, 2003).

The main goal of visual data exploration is to get an idea of what the data contain, or what the data look like. This process does not provide a complete understanding of the phenomenon behind the data; that is not the point. Visual data exploration is intended to provide ideas about the general characteristics of the data which are to serve as a basis for new hypotheses. These can then be further tested using confirmatory data analysis methods (for example, statistics or other mathematical methods). The observations can also be used to choose an appropriate method for further scientific in-depth analysis (Keim and Ward, 2003).

Visual data exploration is usually performed in three steps according to the Visual Information Seeking Mantra (Shneiderman, 1996): overview first, zoom and filter, then details-on-demand. One of the fundamental concepts in this process is interaction. The user can typically interact with the visualization in a number of different ways, such as browsing, selecting, querying and manipulating the graphical parameters or displaying other available information about the data, all with the goal to discover interesting patterns which are valid, novel, useful and comprehensible. A valid pattern is general enough to apply to new data. Novel means that the pattern is non-trivial and unexpected. Usefulness refers to the property that the pattern can be used for either decision-making or further scientific investigation. Comprehensibility means that the pattern is simple enough to be interpretable by humans, which is important because the analyst's trust in the exploration result depends on how comprehensible it is to him/her (Miller and Han, 2001).

There are many ways to represent data graphically. There are also many ways of grouping these displays in some orderly fashion, for example by whether their focus is geometric or symbolic, by whether the display is static or dynamic, or by the amount of structure the visualization method requires. One of the more comprehensive classifications is presented by Keim and Ward (2003), who construct a three-dimensional space of visualizations by classifying the methods according to three orthogonal criteria: the data type, the type of the visualization method and the interaction method (Figure 4.1). Table 4.1 names some examples of each type of visualization method according to Keim and Ward's classification to give the reader some idea of what kind of methods we are talking about. A more comprehensive coverage of information visualization techniques can be found for example in Card et al. (1999), Ware (2000) or other recent books on information visualization.

4.2. GEOVISUALIZATION AND SPATIAL DATA EXPLORATION

Geovisualization or visualization of geospatial data (any data with a given geographic location) is defined as the use of visual representations in order to employ the vision to solve spatial problems (MacEachren et al., 1999). It can be considered as a perceptual-cognitive process of interpreting and understanding georeferenced visual displays and provides theory, methods and tools for visual exploration, analysis, synthesis and presentation of geospatial data (MacEachren and Kraak, 2001). While its roots lie in cartography and geographic techniques for representing spatial data, geovisualization integrates these traditions with scientific and information visualization principles and
Figure 4.1 The three-dimensional space of visualizations (redrawn after Keim and Ward (2003)).
Axes shown in the figure: Data (1-dimensional, 2-dimensional, multidimensional, text/hypertext, hierarchies and graphs, algorithms and software); Visualization (standard 2D/3D displays, geometrically transformed displays, icon-based displays, pixel-based displays, hierarchical displays); Interaction (projection, filtering, zoom, distortion, brushing and linking).
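The three orthogonal criteria of Figure 4.1 can be read as three independent enumerations, with each visualization method occupying one position on the data and technique axes and supporting some set of interactions. The following Python sketch is only an illustration of that reading: the class and function names are hypothetical, and the placement of the two example methods is an assumption made here, not a table taken from Keim and Ward (2003).

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    ONE_DIMENSIONAL = "1-dimensional"
    TWO_DIMENSIONAL = "2-dimensional"
    MULTIDIMENSIONAL = "multidimensional"
    TEXT_HYPERTEXT = "text/hypertext"
    HIERARCHIES_GRAPHS = "hierarchies and graphs"
    ALGORITHMS_SOFTWARE = "algorithms and software"


class Technique(Enum):
    STANDARD_2D_3D = "standard 2D/3D display"
    GEOMETRICALLY_TRANSFORMED = "geometrically transformed display"
    ICON_BASED = "icon-based display"
    PIXEL_BASED = "pixel-based display"
    HIERARCHICAL = "hierarchical display"


class Interaction(Enum):
    PROJECTION = "projection"
    FILTERING = "filtering"
    ZOOM = "zoom"
    DISTORTION = "distortion"
    BRUSHING_AND_LINKING = "brushing and linking"


@dataclass(frozen=True)
class VisualizationMethod:
    """One point in the classification space (hypothetical container)."""
    name: str
    data_type: DataType
    technique: Technique
    interactions: frozenset  # set of Interaction members


# Illustrative placements only (assumed for this sketch).
EXAMPLES = [
    VisualizationMethod(
        "parallel coordinates plot",
        DataType.MULTIDIMENSIONAL,
        Technique.GEOMETRICALLY_TRANSFORMED,
        frozenset({Interaction.BRUSHING_AND_LINKING, Interaction.FILTERING}),
    ),
    VisualizationMethod(
        "choropleth map",
        DataType.TWO_DIMENSIONAL,
        Technique.STANDARD_2D_3D,
        frozenset({Interaction.ZOOM}),
    ),
]


def methods_for(methods, data_type):
    """Return the names of all methods classified under one data type."""
    return [m.name for m in methods if m.data_type is data_type]


if __name__ == "__main__":
    # Number of cells in the three-dimensional classification space.
    print(len(DataType) * len(Technique) * len(Interaction))
    print(methods_for(EXAMPLES, DataType.MULTIDIMENSIONAL))
```

Keeping the three criteria as separate enumerations mirrors the claim that they are orthogonal: any data type can in principle be combined with any technique and any interaction.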
of the computer which results in a faster and more effective knowledge discovery. In practice, however, how to enable such synergy is not yet fully understood and the problem of integrating combined and visual exploration tools in the best manner is not trivial to solve (MacEachren et al., 1999; Shneiderman, 2001; MacEachren and Kraak, 2001).

Visual data exploration of spatial data has several advantages: it is intuitive and does not require understanding of complex mathematical and computational methodology. It is also effective when little is known about the data, when the exploration goals are vague or when the data are noisy and/or heterogeneous (Keim, 2002). On the other hand, during visual exploration the analyst typically looks at data from various perspectives and at various scales, and combines multiple techniques and approaches. No single visualization is capable of providing all the required views of the data, from the general overview to indicating various anomalies and patterns. It is therefore often necessary for the analyst to simultaneously use several techniques for various purposes. Different exploration tasks might also require different visualizations. The fundamental questions to address prior to any exploration are what the current task is, what way of thinking it requires and which tools best support the task and way of thinking at hand (Gahegan, 2005). Additionally, it is of course also important to find out which visualization methods are available and what type of data and phenomena they are suitable for. This is not the only complexity issue: during the actual exploration, the analyst is required to decompose the exploration problem into smaller subproblems in a proper and efficient manner, which might be different for each exploration task. In the last exploration step, the fragmentary knowledge resulting from each of the subproblem explorations needs to be merged into a consistent interpretation for the entire data set in order to obtain proper understanding of the underlying phenomenon and form appropriate hypotheses. Visual exploration is therefore a complex process which requires training and expertise to be performed properly (G. Andrienko et al., 2006).

An important issue to consider when developing new geovisualization tools is therefore how users use these tools and how the tools support particular exploration tasks. These questions can be answered by investigating the usability properties of the tools. Usability is defined as the extent to which a computer system supports users to achieve specified goals and does so effectively, efficiently, and in a satisfactory way (Nielsen, 1993). The idea behind usability is that information systems designed with their users' psychology and physiology in mind are easier to learn and more efficient and satisfying to use. The principle of usability originates from user-centred design in Human-Computer Interaction (HCI), which is a discipline that explores the quality of interaction between the users and information systems. One of the basic requirements for developing a usable and useful information system is knowledge about users and how they use the system. This is the basic principle of user-centred design, which is a philosophy where the needs, wants, and limitations of the users of an information system are given attention at every stage of the design process (Preece et al., 2002).

Design of exploratory geovisualization tools has been technology driven for many years. Tools and systems were developed from a purely technical point of view, where knowledge about users did not play a major role. In recent years, however, the approach has shifted towards user-centred design with the aim of providing useful and usable geovisualization tools which support analytical reasoning (Fuhrmann et al., 2005). While the importance of geovisualization
tools for exploration of spatial data has been generally recognized, the issues of usability testing for geovisualization are not exactly the same as those in human-computer interaction, and how the visual tools support human analytical reasoning is still not fully explained. Traditional usability methods borrowed from human-computer interaction therefore need to be adapted accordingly. The key issue in visual data exploration is the intuitive search process in a visualized environment. It is therefore necessary to incorporate physiological and psychological findings about the process of human vision as well as knowledge of the relation between geospatial objects and their representation in the process of system engineering (Fuhrmann et al., 2005; N. Andrienko and G. Andrienko, 2006a). The potentials and limitations of information visualization tools have been explored in numerous recent experiments focusing on some aspect of the usability of geovisualization tools (for example, N. Andrienko et al., 2002; Suchan, 2002; Tobón, 2002; Edsall, 2003; Haklay and Tobón, 2003; Slocum et al., 2003; Griffin, 2004; van Elzakker, 2004; Ahonen-Rainio, 2005; Koua, 2005; Robinson et al., 2005; Tobón, 2005; G. Andrienko et al., 2006; Demšar, 2006, 2007a), but much still remains to be investigated.

4.3. GEOGRAPHIC INFORMATION SYSTEMS AND GEOVISUALIZATION SOFTWARE

Today's geovisualization is much more than just map design, even though it is firmly rooted in cartographic traditions of map design and display. Most of the contemporary commercial Geographic Information Systems (GIS) provide a set of mapping tools, with appropriate symbology, graphical representation, classification and so on; nevertheless they differ from the information visualization systems in several ways. For example, data representation in GIS packages is limited to predefined object- (point, line, area) or field-based representations, while information visualization software does not usually have this assumption and treats all data types as equal, regardless of whether this makes sense geographically or not. This can be beneficial to reveal patterns that would otherwise remain obscured in traditional geographic representations. Most GIS also offer only limited support for dynamics, animation, interactivity between a number of different visualizations and any integrated computational methods (although there are some attempts to implement data mining methods in the context of GIS; see, for example, Lacayo and Skupin, 2007).

On the other side of the story, there exist numerous information visualization environments that support development of visual exploration systems for multivariate data. Examples of well-known information visualization environments are XGobi, R and SPSS, but for this chapter, those that focus on spatial data are more relevant. Three that deserve a description here are GeoVISTA Studio, CommonGIS and GeoDa, but this selection is far from exhaustive and new tools and environments are developed continuously.

GeoVISTA Studio is a Java-based collection of various geographic and other visualizations and computational data mining methods (MacEachren et al., 1999; Gahegan et al., 2000; Takatsuka, 2001; Dai and Hardisty, 2002; Gahegan et al., 2002; Gahegan and Brodaric, 2002; Takatsuka and Gahegan, 2002; Guo, 2003; MacEachren et al., 2003; Edsall, 2003; Guo et al., 2004; Guo et al., 2005; Robinson et al., 2005). Its components are implemented as Java Beans, which are self-contained software components that can be easily connected into a customized data exploration system
Figure 4.2 An example of (a) a choropleth map of the proportion of residents in Social
Class 1 in the Electoral Divisions (EDs) of the Republic of Ireland and (b) an area cartogram
of the same phenomenon where the areas of EDs are scaled according to the population
size. Dark colour indicates a high proportion and light colour a low proportion of residents
of Social Class 1 (i.e., rich residents) in a particular ED. On the cartogram in (b), the pattern
in Dublin can be clearly seen: the South side has the largest proportion of rich people, and
there are three areas in the north-east, north-west and south-west of the city where the
proportion of the rich is the lowest. This pattern can be barely recognized in the choropleth
map in (a), but the cartogram distortion makes it very eye-catching.
is to reveal patterns that are not apparent in the conventional map. Typical examples are linear cartograms, where the space (usually represented as a spatial network) is distorted according to some distance other than the geometric one, for example travel time. Such cartograms are commonly used to represent public transit systems in larger cities: any subway map or a map of commuter rail services is typically a linear cartogram. Another principle is to stretch the space continuously according to the distribution of values of some attribute, but to preserve the general shape and adjacency of polygons to produce an area cartogram (Tobler, 2004). Figure 4.2 shows an example of a choropleth map (Figure 4.2a) versus the area cartogram of the same phenomenon (Figure 4.2b). The figure shows two displays of the spatial variation in the proportion of residents in Social Class 1 in the Electoral Divisions (EDs) of the Republic of Ireland in 2002. Residents in Social Class 1 are the most affluent. The map on the left (Figure 4.2a) is drawn using the Irish National Grid projection in which the polygons are scaled in proportion to their land area. It is difficult to see what spatial variations there are in the main urban centres, and the boundaries are visually intrusive. The areas in the cartogram on the right (Figure 4.2b) have been redrawn so that their areas are in proportion to their population; this is an area cartogram or a density equalized projection.
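The idea of redrawing zones so that their areas are proportional to population can be illustrated with a much simpler calculation than the one used for Figure 4.2. The sketch below computes, for each zone, the linear factor by which it would have to be scaled about its own centroid so that its area matches its share of the total population. This is a non-contiguous scaling, assumed here purely for illustration: unlike the continuous area cartogram described above, it does not preserve adjacency between polygons. The function name and the input values are hypothetical.

```python
import math


def cartogram_scale_factors(areas, populations):
    """Linear scale factor per zone so that area becomes proportional to population.

    areas and populations are dicts keyed by zone name. The total map area
    is kept constant; each zone receives a share of it equal to its share
    of the total population.
    """
    total_area = sum(areas.values())
    total_population = sum(populations.values())
    factors = {}
    for zone, area in areas.items():
        target_area = total_area * populations[zone] / total_population
        # Area scales with the square of the linear factor.
        factors[zone] = math.sqrt(target_area / area)
    return factors


if __name__ == "__main__":
    # Made-up numbers: a small, densely populated zone and a large, sparse one.
    areas = {"urban": 100.0, "rural": 900.0}
    populations = {"urban": 800, "rural": 200}
    for zone, factor in cartogram_scale_factors(areas, populations).items():
        print(zone, round(factor, 3))
```

In this made-up case the urban zone grows and the rural zone shrinks, which is the same qualitative effect that makes the urban centres dominate the cartogram.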
The urban centres (starting from Dublin as the largest distorted area located on the east coast, followed by Waterford, Cork, Limerick, Galway and Sligo in clockwise order along the coastline) become dominant in the display, and we can easily see the spatial variation in the proportions of affluent residents across the country, a spatial pattern which was not obvious in the traditional choropleth map (Figure 4.2a). The cartogram in this figure was produced using the algorithm and software by Gastner and Newman (2004).

Another fairly common kind of geovisualization is the 3D display. Such displays project the three locational dimensions onto a 2D display using a set of perceptual depth cues to reinforce this projection, such as perspective, occlusion and parallax motion (Ware and Plumlee, 2005). Here we present some examples of 3D geovisualizations, but only in the context of visual knowledge discovery from spatial data. The reader can explore other issues, such as the use of 3D georepresentations in Virtual Reality and Virtual Environments, elsewhere (two starting points for that would be Fisher and Unwin (2002) and Bodum (2005)).

One of the most common methods of representing multivariate geospatial data in three dimensions for knowledge discovery is the surface. Surfaces are sometimes also referred to as 2.5D representations when displayed on the screen, as they are not literally three dimensional. A general approach to produce a surface is to map the two basic geographic dimensions, longitude and latitude, to the x- and y-axis respectively and show the variable of interest on the z-axis. Over this surface some other type of geographic information can be draped to provide texture: a thematic map or a satellite image. Traditionally the attribute mapped to the z-axis represents the third dimension in the real world, such as the elevation above the sea level or the depth of the sea bottom (Kreuseler, 2000). In some cases, the attribute mapped to the z-axis represents time and instead of the surfaces, trajectories of movements of objects are projected through the display space. This type of geovisualization is very common in time-geography (Kraak and Koussoulakou, 2004) and in transportation studies (Kwan, 2000, 2004). In the third type of surface the z-axis attribute represents neither a real geographic dimension nor time, but some other variable of interest, such as the population density, the temperature, the density of human activity or travel (Kwan, 2000), or in geosciences the magnetic variation or the kriging variance (Carr, 2002). Figure 4.3 shows a surface where the z-axis represents the concentration of radon in the groundwater. The surface is covered with two maps of the area, one showing the bedrock and another one showing locations of fractures (Demšar and Skeppström, 2005). Visual exploration of this representation clearly indicates that high values of radon in this area (the highest peaks of the surface) occur only on a particular type of bedrock, which is shown with medium grey shade.

Figure 4.4 shows a screenshot of a visual exploratory system built using GeoVISTA Studio. The system consists of a multiform bivariate matrix, a geoMap and a parallel coordinates plot (PCP), which all share the same colour scheme (except the spaceFills in the matrix). This principle of colouring the graphical entities belonging to the same data element with the same colour in all visualizations is called visual brushing. All visualizations are also connected by interactive selection and brushing through mouse-over interaction, which unfortunately cannot be adequately presented through a simple screenshot image, but is essential for successful data exploration. The parallel coordinates plot (PCP) maps the n dimensional space onto the two display
Figure 4.4 A GeoVISTA-based system displaying a synthetic spatial dataset (Demšar, 2006)
based on the famous iris data (Fisher, 1936).
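The parallel coordinates plot that forms part of the system in Figure 4.4 draws one vertical axis per attribute and represents each data element as a polyline crossing every axis at the height of its value. The Python sketch below shows only that geometric mapping, assuming min-max scaling of each attribute to the interval [0, 1]. The function name and the sample records are made up for illustration and are not taken from the iris data or from GeoVISTA Studio.

```python
def parallel_coordinates(records, attributes):
    """Map n-dimensional records to 2D polylines for a parallel coordinates plot.

    Each record (a dict) becomes a list of (axis_index, scaled_value) vertices,
    one per attribute, with every attribute min-max scaled to [0, 1].
    """
    lows = {a: min(r[a] for r in records) for a in attributes}
    highs = {a: max(r[a] for r in records) for a in attributes}
    polylines = []
    for record in records:
        line = []
        for axis_index, attribute in enumerate(attributes):
            span = highs[attribute] - lows[attribute]
            # A constant attribute has no spread; place it mid-axis.
            if span:
                y = (record[attribute] - lows[attribute]) / span
            else:
                y = 0.5
            line.append((axis_index, y))
        polylines.append(line)
    return polylines


if __name__ == "__main__":
    # Made-up three-attribute records, for illustration only.
    sample = [
        {"a": 1, "b": 10, "c": 5},
        {"a": 3, "b": 20, "c": 5},
        {"a": 2, "b": 10, "c": 5},
    ]
    for polyline in parallel_coordinates(sample, ["a", "b", "c"]):
        print(polyline)
```

Scaling each axis independently is a design choice that lets attributes with different units share one display; the order of the axes is left to the caller because it strongly affects which relationships become visible.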
Figure 4.5 A fixed row matrix of bivariate visualizations, again a component from
GeoVISTA Studio and displaying the same data as in the previous figure.
Figure 4.6 Visually discovering relationships between the spatio-temporal attributes from
the SOM component planes visualization. The image was produced using a spatio-temporal
data set of emergency response data (Špatenková et al., 2007) and the SOM toolbox for
Matlab.
abstract spaces with the goal to facilitate data exploration and knowledge construction (Fabrikant et al., 2002). Examples of spatializations are spatial representations of scientific co-authorship networks (Newman, 2004), protein-receptor interaction networks in medicine, genealogies and citation networks (Batagelj and Mrvar, 2003).

Figure 4.7 shows an example of a spatialization: two different visualizations of a citation network of the GeoVISTA Studio related papers from the reference list of this chapter. In Figure 4.7(a), arrows indicate citations, i.e., the paper that the arrow points from cites the paper which the arrow points to. The size of the vertices in Figure 4.7(a) shows the in-degree of each publication, i.e., how many other publications cite it. The direction of the arrows is ignored in Figure 4.7(b) and the size of vertices in this picture represents the betweenness centrality (from social network analysis (Freeman, 1979)), which measures the importance of each vertex. Note that the vertex representing the paper by MacEachren et al. (1999) has a high in-degree as well as high betweenness because many other papers cite it, while the relatively large betweenness of Guo et al. (2005) is a result of the fact that this paper cites many other papers (even though it is never cited itself and has a low in-degree; compare with Figure 4.7(a)).
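The two vertex measures used in Figure 4.7 can be computed on a small example. The sketch below counts in-degree on the directed citation graph and computes betweenness centrality on the same graph with arrow directions ignored, using breadth-first shortest-path counting in the style of Brandes' algorithm. The four-paper network is hypothetical and is not the network shown in the figure; the function names are likewise chosen for this illustration.

```python
from collections import deque


def in_degree(citations):
    """Number of papers citing each paper; citations maps citing -> cited list."""
    counts = {paper: 0 for paper in citations}
    for cited_list in citations.values():
        for cited in cited_list:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def betweenness(citations):
    """Unnormalized betweenness centrality with arrow direction ignored."""
    # Build an undirected adjacency structure from the citation arrows.
    adjacent = {}
    for citing, cited_list in citations.items():
        adjacent.setdefault(citing, set())
        for cited in cited_list:
            adjacent.setdefault(cited, set())
            adjacent[citing].add(cited)
            adjacent[cited].add(citing)

    score = {v: 0.0 for v in adjacent}
    for source in adjacent:
        # Breadth-first search counting shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adjacent}
        paths = {v: 0 for v in adjacent}
        distance = {v: -1 for v in adjacent}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacent[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest vertices first.
        dependency = {v: 0.0 for v in adjacent}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every unordered pair was counted from both of its endpoints.
    return {v: s / 2 for v, s in score.items()}


if __name__ == "__main__":
    # Hypothetical network: one paper cites three others and is never cited.
    network = {"hub": ["a", "b", "c"], "a": [], "b": [], "c": []}
    print(in_degree(network))
    print(betweenness(network))
```

In this toy network the paper that cites all the others has an in-degree of zero but the only non-zero betweenness, which is the same kind of contrast the text describes for a paper that cites many others without being cited itself.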
optimal one, such as in, for example, crisis management. In order to deal efficiently with time pressure and stress in such situations, Geovisual Analytics tools need to provide support for shared collaborative work during a process where key parameters change quickly, such as, for example, for spatial decision support in emergencies. Open issues here range from developing distributed system architectures to intelligent solutions that support fast knowledge capture, rational reasoning and time-critical spatial decision making.
A related topic is mobile geovisualization and location-based visual exploration. Present technological advances in mobile communications and the ubiquity of various mobile devices (mobile phones, PDAs, BlackBerries, etc.) are likely to change the way people use information systems and this includes tools for geovisualization and Geovisual Analytics. The emerging location-based personalization raises not only technical questions such as how to perform on-the-fly location-based computation or how to display as much information as possible without losing clarity on the small display of most of today's mobile devices, but also conceptual issues, for example the use of individually personalized dynamic egocentric maps (Meng, 2004; Meng, 2005) instead of the traditional geocentric visualizations that remain static for a longer period and aim to communicate geographic information to a variety of users.
Finally, one of the recurrent topics in visualization research is cognitive and perceptual questions and evaluation of the tools. Not only are visualization tools difficult to evaluate objectively, the results of such evaluations might be neither replicable nor generalizable and are in general difficult to interpret (Plaisant, 2004). Additionally, there exists some evidence that there may be fundamental differences between information visualization and geovisualization (Tobón, 2005), which implies that the cognitive processes that must be supported in geovisualization are different and possibly more complex than when non-spatial data are investigated. Experiments have also shown that there exist significant interpersonal differences in the way people visually explore spatial data, how they interpret what they see and what exploration strategies they form (G. Andrienko et al., 2006; Demšar, 2006, 2007a). All this suggests that visual data exploration is inherently complex. What can be done to alleviate the complexity? How is the ability to use the tools related to users' background and experience? These are just some questions to be considered. In order to resolve them, work on technological advances should be combined with work on human spatial cognition to fully reveal the potential of visual representations to support spatial analytical reasoning, spatial problem solving and spatial decision making.
ACKNOWLEDGEMENTS
The author would like to thank Mark Gahegan from The University of Auckland for kindly consenting to read the first draft of this chapter and providing helpful comments and suggestions. Thanks also go to Olga Špatenková from Helsinki University of Technology and Martin Charlton from the National Centre of Geocomputation, National University of Ireland, Maynooth, who prepared the illustrations showing the SOM component planes and the cartograms respectively. Finally, research presented in this paper was supported by a grant to the National Centre for Geocomputation by Science Foundation Ireland (03/RP1/1382) and by a Strategic Research Cluster grant (07/SRC1/1168) from Science Foundation Ireland under the National Development Plan. The author gratefully acknowledges this support.
Demšar, U. (2007b). Knowledge discovery in environmental sciences: visual and automatic data mining for radon problems in groundwater. Transactions in GIS, 11(2): 255–281.
DiBiase, D. (1990). Visualization in the Earth Sciences. Earth and Mineral Sciences, 59(2): 13–18.
Dykes, J.A. and Mountain, D.M. (2003). Seeking structure in records of spatio-temporal behaviour: visualization issues, efforts and applications. Computational Statistics and Data Analysis, 43(4): 581–603.
Dykes, J.A. (2005). Facilitating interaction for geovisualization. In: Dykes, J., MacEachren, A.M. and Kraak, M.-J. (eds), Exploring Geovisualization, pp. 265–292. Amsterdam: Elsevier.
Dykes, J.A., MacEachren, A.M. and Kraak, M.-J. (2005). Advancing geovisualization. In: Dykes, J., MacEachren, A.M. and Kraak, M.-J. (eds), Exploring Geovisualization, pp. 693–704. Amsterdam: Elsevier.
Edsall, R.M. (2003). The parallel coordinate plot in action: design and use for geographic visualization. Computational Statistics and Data Analysis, 43: 605–619.
van Elzakker, C.P.J.M. (2004). The use of maps in the exploration of geographic data. PhD thesis. Utrecht University, Utrecht, The Netherlands.
Fabrikant, S.A., Skupin, A. and Couclelis, H. (2002). Spatialization: Spatial Metaphors and Methods for Handling Non-Spatial Data. Web document (last accessed 12 July 2007), http://www.geog.ucsb.edu/sara/html/research/ucgis/spatialization_ucsb.pdf
Fayyad, U., Grinstein, G.G. and Wierse, A. (eds) (2002). Information Visualization in Data Mining and Knowledge Discovery. San Francisco: Morgan Kaufmann Publishers.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2): 179–188. In: Fisher, R.A. (1950). Contributions to Mathematical Statistics. New York: John Wiley & Sons.
Fisher, P.F. and Unwin, D.J. (eds) (2002). Virtual Reality in Geography. London: Taylor and Francis.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Exploring Spatial Data Visually, Chapter 4 in Quantitative Geography: Perspectives on Spatial Data Analysis, 65–92. Sage Publications. London, UK.
Freeman, L.C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1: 215–239.
Fuhrmann, S., Ahonen-Rainio, P., Edsall, R.M., Fabrikant, S.I., Koua, E.L., Tobón, C., Ware, C. and Wilson, S. (2005). Making useful and useable geovisualization: design and evaluation issues. In: Dykes, J., MacEachren, A.M. and Kraak, M.-J. (eds), Exploring Geovisualization, pp. 553–566. Amsterdam: Elsevier.
Gahegan, M. (2000). On the application of inductive machine learning tools to geographical analysis. Geographical Analysis, 32(2): 113–139.
Gahegan, M., Takatsuka, M., Wheeler, M. and Hardisty, F. (2000). GeoVISTA Studio: a geocomputational workbench. In: Proceedings of Geocomputation 2000. University of Greenwich, UK.
Gahegan, M. and Brodaric, B. (2002). Computational and visual support for geographical knowledge construction: filling in the gaps between exploration and explanation. In: Proceedings of the Spatial Data Handling 2002. Ottawa, Canada.
Gahegan, M., Takatsuka, M., Wheeler, M. and Hardisty, F. (2002). Introducing GeoVISTA Studio: an integrated suite of visualization and computational methods for exploration and knowledge construction in geography. Computers, Environment and Urban Systems, 26: 267–292.
Gahegan, M. (2005). Beyond tools: visual support for the entire process of GIScience. In: Dykes, J., MacEachren, A.M. and Kraak, M.-J. (eds), Exploring Geovisualization, pp. 83–99. Amsterdam: Elsevier.
Gastner, M.T. and Newman, M.E.J. (2004). Diffusion-based method for producing density-equalizing maps. In: Proceedings of the National Academy of Sciences, 101(20): 7499–7504.
Griffin, A. (2004). Understanding how scientists use data-display devices for interactive visual computing with geographical models. PhD thesis. The Pennsylvania State University, Pennsylvania, USA.
Guo, D. (2003). Coordinating computational and visual approaches for interactive feature selection and multivariate clustering. Information Visualization, 2003(2): 232–246.
Guo, D., Gahegan, M. and MacEachren, A.M. (2004). An Integrated Environment for High-dimensional Geographic Data Mining. In: Proceedings of GIScience 2004. University of Maryland, USA.
Guo, D., Gahegan, M., MacEachren, A.M. and Zhou, B. (2005). Multivariate analysis and geovisualization with an integrated geographic knowledge discovery approach. Cartography and Geographic Information Science, 32(2): 113–132.
Haklay, M. and Tobón, C. (2003). Usability evaluation and PPGIS: Towards a user-centred design approach. International Journal of Geographical Information Science, 17(6): 577–592.
International Cartographic Association (ICA) (2008). ICA Commission on GeoVisualization, website of the commission, http://geoanalytics.net/ica (last accessed: 11 September 2008).
Inselberg, A. (2002). Visualization and data mining of high-dimensional data. Chemometrics and Intelligent Laboratory Systems, 60: 147–159.
Jiang, B. and Harrie, L. (2004). Selection of streets from a network using self-organizing maps. Transactions in GIS, 8: 335–350.
Keim, D.A. (2002). Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics, 7(1): 100–107.
Keim, D.A. and Ward, M. (2003). Visualization. In: Berthold, M. and Hand, D.J. (eds), Intelligent Data Analysis, 2nd edn, pp. 403–428. Berlin–Heidelberg: Springer Verlag.
Kohonen, T. (1997). Self-Organizing Maps, 2nd edn. Berlin–Heidelberg: Springer Verlag.
Koua, E.L. and Kraak, M.-J. (2004). Alternative visualization of large geospatial datasets. The Cartographic Journal, 41: 217–228.
Koua, E.L. (2005). Computational and visual support for exploratory geovisualization and knowledge construction. PhD thesis. Utrecht University, Utrecht, The Netherlands.
Kraak, M.-J. and Koussoulakou, A. (2004). A visualization environment for the space–time cube. In: Fisher, P.F. (ed.), Developments in Spatial Data Handling, 11th International Symposium on Spatial Data Handling, pp. 189–200. Berlin–Heidelberg: Springer Verlag.
Kreuseler, M. (2000). Visualization of geographically related multidimensional data in virtual 3D scenes. Computers & Geosciences, 26: 101–108.
Kreuseler, M. and Schumann, H. (2002). A flexible approach for visual data mining. IEEE Transactions on Visualization and Computer Graphics, 8(1): 39–51.
Kwan, M.P. (2000). Interactive geovisualization of activity-travel patterns using three dimensional geographical information systems: a methodological exploration with a large data set. Transportation Research Part C, 8: 185–203.
Kwan, M.P. (2004). GIS methods in time-geographic research: geocomputation and geovisualization of human activity patterns. Geografiska Annaler B, 86: 267–280.
Lacayo, M. and Skupin, A. (2007). A GIS-based module for training and visualization of self-organizing maps. Working paper, accepted to the Workshop of the ICA Commission on Visualization and Virtual Environments, From Geovisualization to Geovisual Analytics, Helsinki, August 2007.
MacEachren, A.M., Wachowicz, M., Edsall, R., Haug, D. and Masters, R. (1999). Constructing knowledge from multivariate spatio-temporal data: integrating geographical visualization with knowledge discovery in database methods. International Journal of Geographic Information Science, 13(4): 311–334.
MacEachren, A.M. and Kraak, M.-J. (2001). Research challenges in geovisualization. Cartography and Geographic Information Science, 28(1): 3–12.
MacEachren, A., Dai, X., Hardisty, F., Guo, D. and Lengerich, G. (2003). Exploring High-D Spaces with Multiform Matrices and Small Multiples. In: Proceedings of the International Symposium on Information Visualization 2003. Seattle, Washington, USA.
MacEachren, A.M., Gahegan, M., Pike, W., Brewer, I., Cai, G. and Lengerich, E. (2004). Geovisualization for knowledge construction and decision support. IEEE Computer Graphics and Applications, 24(1): 13–17.
McCormick, B.H., DeFanti, T.A. and Brown, M.D. (1987). Visualization in Scientific Computing: A Synopsis. IEEE Computer Graphics and Applications, 7(7): 61–70.
Meng, L. (2004). About egocentric geovisualization. In: Proceedings of the 12th International Conference on Geoinformatics, Gävle, Sweden, June 2004.
Meng, L. (2005). Egocentric design of map-based mobile services. The Cartographic Journal, 42(1): 5–13.
Miller, H.J. and Han, J. (eds) (2001). Geographic Data Mining and Knowledge Discovery. London and New York: Taylor & Francis.
Müller-Hannemann, M. (2001). Drawing trees, series-parallel digraphs and lattices. In: Kaufmann, M. and
Wagner, D. (eds), Drawing Graphs: Methods and Models. Lecture Notes in Computer Science, 2025: 46–70. Berlin–Heidelberg: Springer Verlag.
National Visualization and Analytics Center (NVAC) (2005). Illuminating the Path: Creating the R&D Agenda for Visual Analytics. Available at: http://nvac.pnl.gov/agenda.stm (last accessed 17 July 2007).
Newman, M.E.J. (2004). Who is the best connected scientist? A study of scientific coauthorship networks. In: Ben-Naim, E., Frauenfelder, H. and Toroczkai, Z. (eds), Complex Networks, pp. 337–370. Berlin–Heidelberg: Springer Verlag.
Nielsen, J. (1993). Usability Engineering. San Francisco: Morgan Kaufmann Publishers.
Plaisant, C. (2004). The challenge of information visualization evaluation. In: Proceedings of the IEEE Conference on Advanced Visual Interfaces AVI'04, Gallipoli, Italy.
Preece, J., Rogers, Y. and Sharp, H. (2002). Interaction Design: Beyond Human–Computer Interaction. New York: John Wiley and Sons.
Roberts, J.C. (2005). Exploratory visualization with multiple linked views. In: Dykes, J., MacEachren, A.M. and Kraak, M.-J. (eds), Exploring Geovisualization, pp. 159–180. Amsterdam: Elsevier.
Robinson, A.C., Chen, J., Lengerich, E.J., Meyer, H.G. and MacEachren, A.M. (2005). Combining usability techniques to design geovisualization tools for epidemiology. In: Proceedings of Auto-Carto 2005, Las Vegas, USA.
Seo, J. and Shneiderman, B. (2002). Interactively Exploring Hierarchical Clustering Results. IEEE Computer, 35(7): 80–86.
Shneiderman, B. (1996). The eyes have it: a task by data type taxonomy for information visualization. IEEE Proceedings of Visual Languages. Boulder, Colorado, USA.
Shneiderman, B. (2001). Inventing discovery tools: combining information visualization with data mining. In: Proceedings of the 12th International Conference on Algorithmic Learning Theory. Lecture Notes in Computer Science, 2226: 17–28. Berlin–Heidelberg: Springer Verlag.
Silipo, R. (2003). Neural networks. In: Berthold, M. and Hand, D.J. (eds), Intelligent Data Analysis, 2nd edn, pp. 26932. Berlin–Heidelberg: Springer Verlag.
Skupin, A. and Fabrikant, S.A. (2003). Spatialization methods: a cartographic research agenda for non-geographic information visualization. Cartography and Geographic Information Science, 30(2): 99–119.
Skupin, A. and Hagelman, R. (2005). Visualizing demographic trajectories with self-organizing maps. Geoinformatica, 9(2): 159–179.
Slocum, T.A., Cliburn, D.C., Feddema, J.J. and Miller, J.R. (2003). Evaluating the usability of a tool for visualizing the uncertainty of the future global water balance. Cartography and Geographic Information Science, 30(4): 299–317.
Špatenková, O., Demšar, U. and Krisp, J.M. (2007). Self-organising maps for exploration of spatio-temporal emergency response data. In: Proceedings of Geocomputation 2007. Maynooth, Ireland.
Stasko, J. and Zhang, E. (2000). Focus+context display and navigation techniques for enhancing radial, space-filling hierarchy visualizations. In: Proceedings of InfoVis2000, IEEE Symposium on Information Visualization, pp. 57–68. Salt Lake City, Utah, USA.
Suchan, T.A. (2002). Usability studies of geovisualization software in the workplace. In: Proceedings of the National Conference for Digital Government Research, Los Angeles, USA.
Takatsuka, M. (2001). An application of the self-organizing map and interactive 3-D visualization to geospatial data. In: Proceedings of the Sixth International Conference on Geocomputation, Brisbane, Australia.
Takatsuka, M. and Gahegan, M. (2002). GeoVISTA Studio: a codeless visual programming environment for geoscientific data analysis and visualization. Computers & Geosciences, 28: 1131–1144.
Tobler, W. (2004). Thirty Five Years of Computer Cartograms. Annals of the Association of American Geographers, 94(1): 58–73.
Tobón, C. (2002). Usability Testing for Improving Interactive Geovisualization Techniques. Working paper, Centre for Advanced Spatial Analysis, University College London, available at: http://www.casa.ucl.ac.uk/working_papers/Paper45.pdf (last accessed 13 July 2007).
Tobón, C. (2005). Evaluating geographic visualization tools and methods: an approach and experiment based upon user tasks. In: Dykes, J., MacEachren, A.M. and Kraak, M.-J. (eds), Exploring Geovisualization, pp. 645–666. Amsterdam: Elsevier.
Tomaszewski, B., Robinson, A.C., Weaver, C., Stryker, M. and MacEachren, A.M. (2007). Geovisual analytics and crisis management. In: Proceedings of the 4th International ISCRAM Conference, Delft, The Netherlands.
Tukey, J.W. (1977). Exploratory Data Analysis. Reading, Massachusetts: Addison-Wesley.
Unwin, A. and Unwin, D. (1998). Exploratory spatial data analysis with local statistics. The Statistician, 47(3): 415–421.
Vesanto, J. (1999). SOM-based data visualization methods. Intelligent Data Analysis, 3: 111–126.
Ware, C. (2000). Information Visualization: Perception for Design. San Francisco, USA: Morgan Kaufmann Publishers.
Ware, C. and Plumlee, M. (2005). 3D geovisualization and the structure of visual space. In: Dykes, J., MacEachren, A.M. and Kraak, M.-J. (eds), Exploring Geovisualization, pp. 567–576. Amsterdam: Elsevier.
5
Availability of Spatial Data
Mining Techniques
Shashi Shekhar, Vijay Gandhi, Pusheng Zhang
and Ranga Raju Vatsavai
This chapter is organized as follows. In section 5.2, we provide an overview of spatial data. Section 5.3 presents important statistical concepts used in spatial data mining. Spatial Data Mining techniques, the main focus of this chapter, are explained in section 5.4. Specifically, we present major accomplishments in mining output patterns known as predictive models, semi-supervised approaches, outliers, co-location rules, and clustering. In section 5.5, we briefly review the computational processes for spatial data mining techniques. Finally, in section 5.6, we identify areas of spatial data mining where further research is needed. This chapter does not discuss spatial statistics or algorithm-level computational processes in depth as these topics are beyond the scope of this chapter.
5.2. DATA INPUT
The data inputs of spatial data mining are more complex than the inputs of classical data mining because they include extended objects such as points, lines, and polygons. The data inputs of spatial data mining have two distinct types of attributes: non-spatial attributes and spatial attributes. Non-spatial attributes are used to characterize non-spatial features of objects, such as name, population, and unemployment rate for a city. They are the same as the attributes used in the data inputs of classical data mining. Spatial attributes are used to define the spatial location and extent of spatial objects (Bolstad, 2002). The spatial attributes of a spatial object most often include information related to spatial locations, e.g., longitude, latitude and elevation, as well as shape.
Relationships among non-spatial objects are explicit in data inputs, e.g., arithmetic relation, ordering, is instance of, subclass of, and membership of. In contrast, relationships among spatial objects, such as overlap, intersect, and behind, are often implicit. Table 5.1 lists non-spatial relationships and their corresponding spatial relationships. One possible way to deal with implicit spatial relationships is to materialize the relationships into traditional data input columns and then apply classical data mining techniques (Agrawal and Srikant, 1994; Jain and Dubes, 1988; Quinlan, 1993). However, the materialization can result in loss of information. Another way to capture implicit spatial relationships is to develop models or techniques to incorporate spatial information into the spatial data mining process. We discuss a few case studies of such techniques in section 5.4.
Table 5.1 Relationships among non-spatial data and spatial data
Non-spatial relationship (explicit)    Spatial relationship (often implicit)
Arithmetic                             Set-oriented: union, intersection, membership, ...
Ordering                               Topological: meet, within, overlap, ...
Is instance of                         Directional: North, NE, left, above, behind, ...
Subclass of                            Metric: e.g., distance, area, perimeter, ...
Part of                                Dynamic: update, create, destroy, ...
Membership of                          Shape-based and visibility
The representation of spatial data and use of spatial operators has been standardized by the Open GIS (OGIS) consortium for interoperability of spatial applications, such as Geographic Information Systems. OGIS defines standard spatial data types which can be used in combination to represent a spatial object. Some examples of OGIS data types include Point, Curve, Surface, and Geometry Collection. In addition to specifying data types, the OGIS standard also includes three categories of spatial operations: (a) basic spatial operations which can be applied to all
geometry datatypes, e.g., to find the boundary of a spatial object, (b) operations to test for topological relationships between objects, e.g., to find if two spatial objects overlap, and (c) operations to perform spatial analysis, e.g., to calculate the shortest distance path between two spatial objects.
A recent topic of research is the representation of spatial data which have an associated temporal aspect. A location based service is an example in which a service is offered based on the location and time of an entity. Current OGIS standards do not yet support such systems.
5.3. STATISTICAL FOUNDATION
Readers of this handbook will be exposed to more statistical foundations in later chapters. Here we address only the basic concepts needed to follow the rest of this chapter. Statistical models (Cressie, 1993) are often used to represent observations in terms of random variables. These models can then be used for estimation, description, and prediction based on probability theory. Spatial data can be thought of as resulting from observations on the stochastic process $\{Z(s) : s \in D\}$, where $s$ is a spatial location and $D$ is possibly a random set in a spatial framework. Here we present three spatial statistical problems one might encounter: point process, lattice, and geostatistics.
Point process
A point process is a model for the spatial distribution of the points in a point pattern. Several natural processes can be modeled as spatial point patterns, e.g., positions of trees in a forest and locations of bird habitats in a wetland. Spatial point patterns can be broadly grouped into random or non-random processes. Real point patterns are often compared with a random pattern (generated by a Poisson process) using the average distance between a point and its nearest neighbor. For a random pattern, this average distance is expected to be $1/(2\sqrt{\text{density}})$, where density is the average number of points per unit area. If, for a real process, the computed distance falls within a certain limit, then we conclude that the pattern is generated by a random process; otherwise it is a non-random process.
Lattice
A lattice is a model for a gridded space in a spatial framework. Here the lattice refers to a countable collection of regular or irregular spatial sites related to each other via a neighborhood relationship. Several spatial statistical analyses, e.g., the spatial autoregressive model and Markov random fields, can be applied to lattice data.
Geostatistics
Geostatistics deals with the analysis of spatial continuity and weak stationarity (Cressie, 1993), which is an inherent characteristic of spatial datasets. Geostatistics provides a set of statistics tools, such as kriging (Cressie, 1993), for the interpolation of attributes at unsampled locations.
One of the fundamental assumptions of statistical analysis is that the data samples are independently generated: like successive tosses of a coin, or the rolling of a die. However, in the analysis of spatial data, the assumption about the independence of samples is generally false. In fact, spatial data tends to be highly self-correlated. For example, people with similar characteristics, occupation and background tend to cluster together in the same neighborhoods. The economies of a region tend to be similar. Changes in natural resources, wildlife, and temperature vary gradually over space. The property of like things to cluster in space is so
Figure 5.1 Attribute values in space with independent identical distribution and spatial
autocorrelation.
fundamental that geographers have elevated it to the status of the first law of geography: "Everything is related to everything else but nearby things are more related than distant things" (Tobler, 1979). In spatial statistics, an area within statistics devoted to the analysis of spatial data, this property is called spatial autocorrelation (Shekhar and Chawla, 2003). For example, Figure 5.1 shows the value distributions of an attribute in a spatial framework for an independent identical distribution (Figure 5.1(a)) and a distribution with spatial autocorrelation (Figure 5.1(b)).
Knowledge discovery techniques which ignore spatial autocorrelation typically perform poorly in the presence of spatial data. Often the spatial dependencies arise due to the inherent characteristics of the phenomena under study, but in particular they arise due to the fact that the spatial resolution of imaging sensors is finer than the size of the object being observed. For example, remote sensing satellites have resolutions ranging from 30 m (e.g., the Enhanced Thematic Mapper of the Landsat 7 satellite of NASA) to 1 m (e.g., the IKONOS satellite from SpaceImaging), while the objects under study (e.g., urban, forest, water) are often much larger than 30 m. As a result, per-pixel-based classifiers, which do not take spatial context into account, often produce classified images with salt and pepper noise. These classifiers also suffer in terms of classification accuracy.
The spatial relationship among locations in a spatial framework is often modeled via a contiguity matrix. A simple contiguity matrix may represent a neighborhood relationship defined using adjacency, Euclidean distance, etc. Example definitions of neighborhood using adjacency include a four-neighborhood and an eight-neighborhood. Given a uniform gridded spatial framework, a four-neighborhood assumes that a pair of locations influence each other if they share an edge. An eight-neighborhood assumes that a pair of locations influence each other if they share either an edge or a vertex.
Figure 5.2(a) shows a gridded spatial framework with four locations, A, B, C, and D. A binary matrix representation of a four-neighborhood relationship is shown in Figure 5.2(b). The row-normalized representation of this matrix is called a contiguity matrix, as shown in Figure 5.2(c).
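[Figure 5.2 panel contents]
(a) Gridded spatial framework:
    A  B
    C  D
(b) Binary four-neighborhood matrix:
       A    B    C    D
    A  0    1    1    0
    B  1    0    0    1
    C  1    0    0    1
    D  0    1    1    0
(c) Row-normalized contiguity matrix:
       A    B    C    D
    A  0    0.5  0.5  0
    B  0.5  0    0    0.5
    C  0.5  0    0    0.5
    D  0    0.5  0.5  0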
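The matrices of Figure 5.2 can be reproduced with a short Python sketch. The function names are hypothetical and the row-major numbering of grid cells (A, B, C, D for a 2 x 2 grid) is an assumption made for illustration; the sketch covers only the four-neighborhood case on a uniform grid.

```python
def four_neighborhood_matrix(n_rows, n_cols):
    """Binary matrix: entry [i][j] is 1 if cells i and j share an edge."""
    n = n_rows * n_cols
    binary = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c  # row-major index of cell (r, c)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    binary[i][rr * n_cols + cc] = 1
    return binary


def row_normalize(matrix):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


binary = four_neighborhood_matrix(2, 2)   # cells A, B, C, D
contiguity = row_normalize(binary)
for label, row in zip("ABCD", contiguity):
    print(label, row)
```

Switching to an eight-neighborhood would only require adding the four diagonal offsets to the inner loop.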
[Figure 5.3 panels: "Nest sites for 1995 Darr location"; "Vegetation distribution across the wetland" (nz = 5372 in each)]
Figure 5.3 (a) Learning dataset: the geometry of the Darr wetland and the locations of the
nests, (b) the spatial distribution of vegetation durability over the marshland, (c) the spatial
distribution of water depth, and (d) the spatial distribution of distance to open water.
and which are not. For example, vegetation durability was chosen over vegetation species because specialized knowledge about the bird-nesting habits of the red-winged blackbird suggested that the choice of nest location is more dependent on plant structure, plant resistance to wind, and wave action than on the plant species.
An important goal is to build a model for predicting the location of bird nests in the wetlands. Typically, the model is built using a portion of the data, called the learning or training data, and then tested on the remainder of the data, called the testing data. In this study a model was built using the 1995 Darr wetland data and then tested using the 1995 Stubble wetland data. In the learning data, all the attributes are used to build the model and in the testing data, one value is hidden, in this case the location of the nests. Using knowledge gained from the 1995 Darr data and the value of the independent attributes in the test data, the goal is to predict the location of the nests in the 1995 Stubble data.
Modeling spatial dependencies using the SAR and MRF models
Several previous studies (Jhung and Swain, 1996; Solberg et al., 1996) have shown that the modeling of spatial dependency (often called context) during the classification process improves overall classification accuracy. Spatial context can be defined by the relationships between spatially adjacent pixels in a small neighborhood. In this section, we present two approaches to modeling spatial dependency: the SAR and MRF-based Bayesian classifiers.
Spatial autoregression model
The spatial autoregressive model decomposes a classifier $f_C$ into two parts, namely spatial autoregression and logistic transformation. We first show how spatial dependencies are modeled using the framework of logistic regression analysis. In the spatial autoregression model, the spatial dependencies of the error term, or, the dependent variable, are directly modeled in the regression equation (Anselin, 1988). If the dependent values $y_i$ are related to each other, then the regression equation can be modified as
$$y = \rho W y + X\beta + \epsilon \tag{5.1}$$
Here $W$ is the neighborhood relationship contiguity matrix and $\rho$ is a parameter that reflects the strength of the spatial dependencies between the elements of the dependent variable. After the correction term $\rho W y$ is introduced, the components of the residual error vector $\epsilon$ are then assumed to be generated from independent and identical standard normal distributions. As in the case of classical regression, the SAR equation has to be transformed via the logistic function for binary dependent variables.
We refer to this equation as the Spatial Autoregressive Model (SAR). Notice that when $\rho = 0$, this equation collapses to the classical regression model. The benefits of modeling spatial autocorrelation are many: the residual error will have much lower spatial autocorrelation (i.e., systematic variation). With the proper choice of $W$, the residual error should, at least theoretically, have no systematic variation. If the spatial autocorrelation coefficient is statistically significant, then SAR will quantify the presence of spatial autocorrelation. It will indicate the extent to which variations in the dependent variable ($y$) are explained by the average of neighboring observation values. Finally, the model will have a better fit (i.e., a higher R-squared statistic).
Markov random field-based Bayesian classifiers
Markov random field-based Bayesian classifiers estimate the classification model $f_C$ using MRF and Bayes' rule. A set of random variables whose interdependency relationship is represented by an undirected graph (i.e., a symmetric neighborhood matrix) is called a Markov random field (Li, 1995). The Markov property specifies that a variable depends only on its neighbors and is independent of all other variables. The location prediction problem can be modeled in this framework by assuming that the class label, $l_i = f_C(s_i)$, of different locations, $s_i$, constitutes an MRF. In other words, random variable $l_i$ is independent of $l_j$ if $W(s_i, s_j) = 0$.
The Bayesian rule can be used to predict $l_i$ from feature value vector $X$ and neighborhood class label vector $L_i$ as follows:
$$\Pr(l_i \mid X, L_i) = \frac{\Pr(X \mid l_i, L_i)\,\Pr(l_i \mid L_i)}{\Pr(X)} \tag{5.2}$$
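Equations (5.1) and (5.2) can be made concrete with a small numerical sketch in Python. Everything numeric here is hypothetical: the observations, the intercept-only design matrix, the values of rho and beta, the class likelihoods, and the smoothed neighbor-frequency prior are assumptions chosen for illustration, not estimates from the wetland data. The denominator of (5.2) is obtained by normalizing over the candidate labels.

```python
def spatial_lag(W, y):
    """Compute W y: with a row-normalized W, the average of each site's neighbors."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


def sar_residual(y, W, X, beta, rho):
    """Residual of equation (5.1): epsilon = y - rho * W y - X beta."""
    lag = spatial_lag(W, y)
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
    return [yi - rho * li - fi for yi, li, fi in zip(y, lag, fitted)]


def neighbor_prior(neighbor_labels, classes, smoothing=1.0):
    """Hypothetical model for Pr(l_i | L_i): smoothed neighbor label frequencies."""
    total = len(neighbor_labels) + smoothing * len(classes)
    return {c: (neighbor_labels.count(c) + smoothing) / total for c in classes}


def mrf_posterior(likelihood, prior):
    """Equation (5.2), with Pr(X) obtained by summing over the labels."""
    joint = {c: likelihood[c] * prior[c] for c in likelihood}
    evidence = sum(joint.values())
    return {c: value / evidence for c, value in joint.items()}


# Row-normalized contiguity matrix for four sites A, B, C, D on a 2 x 2 grid.
W = [[0.0, 0.5, 0.5, 0.0],
     [0.5, 0.0, 0.0, 0.5],
     [0.5, 0.0, 0.0, 0.5],
     [0.0, 0.5, 0.5, 0.0]]
y = [1.0, 2.0, 3.0, 4.0]             # hypothetical observations
X = [[1.0], [1.0], [1.0], [1.0]]     # intercept-only design matrix
residual_sar = sar_residual(y, W, X, beta=[0.5], rho=0.4)
residual_classical = sar_residual(y, W, X, beta=[0.5], rho=0.0)

classes = ["nest", "no_nest"]
prior = neighbor_prior(["nest", "nest", "no_nest", "nest"], classes)
likelihood = {"nest": 0.30, "no_nest": 0.10}   # hypothetical Pr(X | l_i, L_i)
posterior = mrf_posterior(likelihood, prior)
print(residual_sar, residual_classical, posterior)
```

Setting rho to zero removes the neighbor term, which is the collapse to classical regression noted for (5.1); in the second half, the neighbor labels shift the prior toward "nest" before the feature likelihood is applied.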
[...] before proceeding with any serious clustering analyses.

Complete spatial randomness, cluster, and decluster
In spatial statistics, the standard against which spatial patterns are often compared is a completely spatially random point process, and departures indicate that the pattern is not distributed randomly in space. Complete spatial randomness (CSR) (Cressie, 1993) is synonymous with a homogeneous Poisson process. The patterns of the process are independently and uniformly distributed over space, i.e., the patterns are equally likely to occur anywhere and do not interact with each other. However, patterns generated by a non-random process can be either cluster patterns (aggregated patterns) or decluster patterns (uniformly spaced patterns).

To illustrate, Figure 5.4 shows realizations from a completely spatially random process, a spatial cluster process, and a spatial decluster process (each conditioned to have 80 points) in a square. Notice in Figure 5.4(a) that the complete spatial randomness pattern seems to exhibit some clustering. This is not an unrepresentative realization, but illustrates a well-known property of homogeneous Poisson processes: squared event-to-nearest-event distances are proportional to $\chi^2_2$ random variables, whose densities have a substantial amount of probability near zero (Cressie, 1993). Spatial clustering is more statistically significant when the data exhibit a cluster pattern rather than a CSR pattern or decluster pattern.

Several statistical methods can be applied to quantify deviations of patterns from a complete spatial randomness point pattern (Cressie, 1993). One type of descriptive statistic is based on quadrats (i.e., well-defined areas, often rectangular in shape). Usually, quadrats of random location and orientation are sampled, the points falling in each quadrat are counted, and statistics derived from the counts are computed. Another type of statistic is based on distances between patterns; one such type is Ripley's K-function (Cressie, 1993).

After the verification of the statistical significance of the spatial clustering, classical clustering algorithms (Han et al., 2001) can be used to discover interesting clusters.

Figure 5.5 Marked spatial point process. Spatial locations for different female chimpanzees at the Gombe National Park, Tanzania.

Clustering point process
As discussed in section 5.3, a point process is a model for the spatial distribution of
the spatial points in a point pattern. A point process in which each of the spatial locations is marked with a unique label is called a marked spatial point process. Clustering of marked spatial point processes is an interesting problem in many application domains. For example, in behavioral ecology, ecologists are interested in finding clusters of individual chimpanzees based on their space usage, which usually consists of several spatial points for each individual. An example of marked spatial point processes is shown in Figure 5.5.

The problem of clustering marked spatial point processes is a generalization of the problem of clustering spatial points, where instead of a single spatial location for each category, we have multiple spatial locations for each category. Each category is a spatial point process. Classical clustering approaches handle homogeneous spatial points and hence cannot cluster marked spatial point processes. Only a very limited amount of research has been done in the area of clustering marked spatial point processes (Han et al., 2001).

A data mining technique for clustering marked spatial point processes is proposed by Shekhar et al. (2006). This algorithm is based on the intuition that the intra-cluster similarity must be significantly higher than the inter-cluster similarity. During clustering, Besag's L-function (Besag, 1977), which is a modified version of Ripley's K-function (Cressie, 1993), is used to quantify the second-order interaction between point processes. This measure provides the correlation between the observed and expected pairs of points at a certain distance from each other. Based on the value of this measure, marked point processes can be clustered hierarchically to produce a dendrogram or a block diagonal matrix, which can be analyzed by domain experts to find a threshold level to identify proper clusters.

5.4.3. Semi-supervised learning
The methods described in the previous section are examples of supervised learning algorithms. In supervised methods, the model is built using a training dataset. For example, in a remote sensing image, training data will be a collection of labeled pixels. In practice, it is very difficult to collect labels for all training data. Hence an approach which does not require many labeled samples is needed. Such an approach, which uses fewer labeled samples and a large number of unlabeled samples, is called semi-supervised learning (Vatsavai and Shekhar, 2005).

[Figure 5.6. Panels: Supervised (classification is dependent on labeled samples); Semi-supervised (unlabeled samples are added; use EM); Un-supervised (each cluster is manually classified).]

Based on the Expectation-Maximization (EM) algorithm,
maximum likelihood, and maximum a posteriori classifiers, the semi-supervised method utilizes a small set of labeled and a large number of unlabeled training samples to build a model.

Figure 5.6 illustrates the difference between different approaches used in classification. The supervised approach shown in Figure 5.6(a) requires many labeled data, in this case + and −, to build a model. An unsupervised approach does not require any training dataset to build a model. The semi-supervised approach shown in Figure 5.6(c) uses a small number of labeled and a large number of unlabeled datasets to build a model.

A semi-supervised approach is better than using a supervised approach with a smaller number of labeled samples. Figure 5.7 shows an example illustrating that including an unlabeled dataset and using a semi-supervised approach improves the classification model. Figure 5.7 shows classification of satellite imagery into different classes. Figure 5.7(a) is obtained by using 100 labeled data points in the training dataset. The model obtained using only 20 labeled data points is shown in Figure 5.7(b). As can be seen, the model with fewer labeled data points is poorer than the model with a greater number of data points. However, the model with fewer labeled data points can be improved by including unlabeled data points and using a semi-supervised technique. The resulting model is shown in Figure 5.7(c).

[Figure 5.7. Plots of Band 5 against Band 4. (a) Results with 100 labeled samples, supervised. (b) Results with 20 labeled samples, supervised. (c) Results with 20 labeled and 80 unlabeled samples, semi-supervised.]

5.4.4. Spatial outliers
Outliers have been informally defined as observations in a dataset which appear to be inconsistent with the remainder of that set of data (Barnett and Lewis, 1994), or which
deviate so much from other observations so as to arouse suspicions that they were generated by a different mechanism (Hawkins, 1980). The identification of global outliers can lead to the discovery of unexpected knowledge and has a number of practical applications in areas such as credit card fraud, athlete performance analysis, voting irregularity, and severe weather prediction. This section focuses on spatial outliers, i.e., observations which appear to be inconsistent with their neighborhoods. Detecting spatial outliers is useful in many applications of geographic information systems and spatial databases, including transportation, ecology, public safety, public health, climatology, and location-based services.

A spatial outlier is a spatially referenced object whose non-spatial attribute values differ significantly from those of other spatially referenced objects in its spatial neighborhood. Informally, a spatial outlier is a local instability (in values of non-spatial attributes) or a spatially referenced object whose non-spatial attributes are extreme relative to its neighbors, even though the attributes may not be significantly different from the entire population. For example, a new house in an old neighborhood of a growing metropolitan area is a spatial outlier based on the non-spatial attribute house age.

Illustrative examples
We use an example to illustrate the differences between global and spatial outlier detection methods. In Figure 5.8(a), the X-axis is the location of data points in one-dimensional space; the Y-axis is the attribute value for each data point. Global outlier detection methods ignore the spatial location of each data point and fit the distribution model to the values of the non-spatial attribute. The outlier detected using this approach is the data point G, which has an extremely high attribute value of 7.9, exceeding the threshold of $\mu + 2\sigma = 4.49 + 2 \times 1.61 = 7.71$, as shown in Figure 5.8(b). This test assumes a normal distribution for attribute values. On the other hand, S is a spatial outlier whose observed value is significantly different from those of its neighbors P and Q.

[Figure 5.8. (a) Attribute values plotted against location. (b) Number of occurrences of attribute values, with $\mu - 2\sigma$ and $\mu + 2\sigma$ marked.]

Tests for detecting spatial outliers
Tests to detect spatial outliers separate spatial attributes from non-spatial attributes.
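The contrast drawn in the illustrative example can be sketched in a few lines of Python. The attribute values below are hypothetical (they are not the data of Figure 5.8), the neighborhood of a location is assumed to be its immediate left and right neighbors on the line, and the neighborhood comparison is standardized and thresholded at two standard deviations purely for illustration.

```python
import numpy as np

# Hypothetical attribute values at 20 equally spaced locations on a line.
# Index 4 is a local spike in a low-valued region; index 14 is the global maximum.
values = np.array([1.0, 1.2, 1.1, 1.3, 4.0, 1.2, 1.4, 1.6, 2.0, 2.6,
                   3.4, 4.4, 5.6, 6.8, 7.9, 6.9, 5.7, 4.5, 3.5, 2.7])

def global_outliers(f, k=2.0):
    """Indices whose value lies more than k standard deviations from the
    global mean; spatial location is ignored."""
    mu, sigma = f.mean(), f.std()
    return np.flatnonzero(np.abs(f - mu) > k * sigma)

def neighbor_differences(f):
    """Value at each location minus the average of its immediate neighbors."""
    diffs = np.empty_like(f)
    for i in range(len(f)):
        nbrs = [f[j] for j in (i - 1, i + 1) if 0 <= j < len(f)]
        diffs[i] = f[i] - np.mean(nbrs)
    return diffs

def spatial_outliers(f, k=2.0):
    """Indices whose standardized neighbor difference exceeds k in magnitude."""
    s = neighbor_differences(f)
    z = (s - s.mean()) / s.std()
    return np.flatnonzero(np.abs(z) > k)

print("global outliers: ", global_outliers(values))
print("spatial outliers:", spatial_outliers(values))
```

In this hypothetical dataset the global test flags only the overall maximum, whose neighbors are also high, while the neighborhood comparison flags only the local spike, whose value is unremarkable for the population as a whole.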
Spatial attributes are used to characterize location, neighborhood, and distance. Non-spatial attribute dimensions are used to compare a spatially referenced object to its neighbors. The spatial statistics literature provides two kinds of bi-partite multidimensional tests, namely graphical tests and quantitative tests. Graphical tests, which are based on the visualization of spatial data, highlight spatial outliers. Example methods include variogram clouds and Moran scatterplots. Quantitative methods provide a precise test to distinguish spatial outliers from the remainder of data. Scatterplots (Anselin, 1994) are a representative technique from the quantitative family.

A variogram cloud (Cressie, 1993) displays data points related by neighborhood relationships. For each pair of locations, the square root of the absolute difference between attribute values at the locations is plotted against the Euclidean distance between the locations. In datasets exhibiting strong spatial dependence, the variance in the attribute differences will increase with increasing distance between locations. Locations that are near to one another, but with large attribute differences, might indicate a spatial outlier, even though the values at both locations may appear to be reasonable when examining the dataset non-spatially. Figure 5.9(a) shows a variogram cloud for the example dataset shown in Figure 5.8(a). This plot shows that two pairs, (P, S) and (Q, S), on the left-hand side lie above the main group of pairs and are possibly related to spatial outliers. The point S may be identified as a spatial outlier since it occurs in both pairs (Q, S) and (P, S). However, graphical tests of spatial outlier detection are limited by the lack of precise criteria to distinguish spatial outliers. In addition, a variogram cloud requires non-trivial post-processing of highlighted pairs to separate spatial outliers from their neighbors, particularly when multiple outliers are present or density varies greatly.

Figure 5.9 Variogram cloud and Moran scatterplot to detect spatial outliers. (a) Variogram cloud, plotted against pairwise distance. (b) Moran scatterplot, plotted against the Z-score of attribute values.

A Moran scatterplot (Anselin, 1995) is a plot of a normalized attribute value ($Z[f(i)] = (f(i) - \mu_f)/\sigma_f$) against the neighborhood average of normalized attribute values ($W \cdot Z$), where $W$ is the row-normalized (i.e., $\sum_j W_{ij} = 1$) neighborhood matrix (i.e., $W_{ij} > 0$ iff neighbor$(i, j)$). The upper left and lower right quadrants of Figure 5.9(b) indicate a spatial association of dissimilar values: low values surrounded by high value neighbors (e.g., points P and Q),
and high values surrounded by low values (e.g., point S). Thus we can identify points (nodes) that are surrounded by unusually high or low value neighbors. These points can be treated as spatial outliers.

A scatterplot (Anselin, 1994) shows attribute values on the X-axis and the average of the attribute values in the neighborhood on the Y-axis. A least squares regression line is used to identify spatial outliers. A scatter sloping upward to the right indicates a positive spatial autocorrelation (adjacent values tend to be similar); a scatter sloping upward to the left indicates a negative spatial autocorrelation. The residual is defined as the vertical distance (Y-axis) between a point P with location $(X_p, Y_p)$ and the regression line $Y = mX + b$, that is, residual $\epsilon = Y_p - (mX_p + b)$. Cases with standardized residuals, $\epsilon_{standard} = (\epsilon - \mu_\epsilon)/\sigma_\epsilon$, greater than 3.0 or less than −3.0 are flagged as possible spatial outliers, where $\mu_\epsilon$ and $\sigma_\epsilon$ are the mean and standard deviation of the distribution of the error term $\epsilon$. In Figure 5.10(a), a scatterplot shows the attribute values plotted against the average of the attribute values in neighboring areas for the dataset in Figure 5.8(a). The point S turns out to be the farthest from the regression line and may be identified as a spatial outlier.

A location (sensor) is compared to its neighborhood using the function $S(x) = [f(x) - E_{y \in N(x)}(f(y))]$, where $f(x)$ is the attribute value for a location $x$, $N(x)$ is the set of neighbors of $x$, and $E_{y \in N(x)}(f(y))$ is the average attribute value for the neighbors of $x$. The statistic function $S(x)$ denotes the difference between the attribute value of a sensor located at $x$ and the average attribute value of $x$'s neighbors.

Spatial statistic $S(x)$ is normally distributed if the attribute value $f(x)$ is normally distributed. A popular test for detecting spatial outliers for normally distributed $f(x)$ can be described as follows: spatial statistic $Z_{s(x)} = |(S(x) - \mu_s)/\sigma_s| > \theta$. For each location $x$ with an attribute value $f(x)$, $S(x)$ is the difference between the attribute value at location $x$ and the average attribute value of $x$'s neighbors, $\mu_s$ is the mean value of $S(x)$, and $\sigma_s$ is the value of the standard deviation of $S(x)$ over all stations. The choice of $\theta$ depends on a specified confidence level. For example, a confidence level of 95 percent will lead to $\theta \approx 2$.

Figure 5.10 Scatterplot and spatial statistic $Z_{s(x)}$ to detect spatial outliers.

Figure 5.10(b) shows the visualization of the spatial statistic method described above. The X-axis is the location of data points in one-dimensional space; the Y-axis is the value of spatial statistic $Z_{s(x)}$ for each
data point. We can easily observe that point S has a $Z_{s(x)}$ value exceeding 3, and will be detected as a spatial outlier. Note that the two neighboring points P and Q of S have $Z_{s(x)}$ values close to −2 due to the presence of spatial outliers in their neighborhoods.

Designing computationally efficient techniques to find spatial outliers is important. One efficient method is to compute the global statistical parameters using a spatial join (Shekhar et al., 2003). In this method, the algorithm computes the algebraic aggregate functions in a single scan of a spatial self-join from a spatial dataset using a neighborhood relationship. The computed values from the algebraic aggregate functions can be used to validate the outlier measure of a dataset.

A drawback in most of the techniques to detect multiple spatial outliers is that some of the data points are misclassified, i.e., either some of the true spatial outliers are ignored or some points are wrongly identified as spatial outliers. This misclassification occurs because most algorithms tend not to take into account the effect of an outlier in the neighborhood of another outlier. To overcome this problem, iterative algorithms and a median-based non-iterative algorithm can be used. In the iterative algorithms (Kou et al., 2003), only one outlier is detected in each iteration, and then its attribute value is modified in subsequent iterations so that it does not have a negative impact in detecting a new outlier. The median-based algorithm (Kou et al., 2003) reduces the impact of the presence of data points with extremely high or low attribute values.

5.4.5. Spatial co-location rules
Boolean spatial features are geographic object types which are either present or absent at different locations in a two-dimensional or three-dimensional metric space, e.g., the surface of the Earth. Examples of Boolean spatial features include plant species, animal species, road types, cancers, crime, and business types. Co-location patterns represent the subsets of the Boolean spatial features whose instances are often located in close geographic proximity. Examples include symbiotic species, e.g., Nile crocodile and Egyptian plover in ecology, and frontage roads and highways in metropolitan road maps.
Figure 5.11 (a) Illustration of point spatial co-location patterns. Shapes represent different
spatial feature types. Spatial features in sets {+, ,} and (o, ) tend to be located
together. (b) Co-location between roads and rivers. (Courtesy: Architecture Technology
Corporation).
Co-location rules are models to infer the presence of Boolean spatial features in the neighborhood of instances of other Boolean spatial features. For example, Nile Crocodiles → Egyptian Plover predicts the presence of Egyptian Plover birds in areas with Nile Crocodiles. Figure 5.11(a) shows a dataset consisting of instances of several Boolean spatial features, each represented by a distinct shape. A careful review reveals two co-location patterns, i.e., {+, } and {, }.

Co-location rule discovery is a process to identify co-location patterns from large spatial datasets with a large number of Boolean features. The spatial co-location rule discovery problem looks similar to, but, in fact, is very different from the association rule mining problem (Agrawal and Srikant, 1994) because of the lack of transactions. In market basket datasets, transactions represent sets of item types bought together by customers. The support of an association is defined to be the fraction of transactions containing the association. Association rules are derived from all the associations with support values larger than a user-given threshold. The purpose of mining association rules is to identify frequent item sets for planning store layouts or marketing campaigns. In the spatial co-location rule mining problem, transactions are often not explicit. The transactions in market basket analysis are independent of each other. Transactions are disjoint in the sense of not sharing instances of item types. In contrast, the instances of Boolean spatial features are embedded in a continuous space and share a variety of spatial relationships (e.g., neighbor) with each other.

Co-location rule approaches
Approaches to discovering co-location rules can be categorized into two classes, namely spatial statistics and data mining approaches. Spatial statistics-based approaches use measures of spatial correlation to characterize the relationship between different types of spatial features. Measures of spatial correlation include the cross K-function with Monte Carlo simulation (Cressie, 1993),
mean nearest-neighbor distance, and spatial regression models. Computing spatial correlation measures for all possible co-location patterns can be computationally expensive due to the exponential number of candidate subsets given a large collection of spatial Boolean features.

Data mining approaches can be further divided into a clustering-based map overlay approach and association rule-based approaches. A clustering-based map overlay approach treats every spatial attribute as a map layer. The spatial clusters (regions) of point-data in each layer are candidates for mining associations. Given $X$ and $Y$ as sets of layers, a clustered spatial association rule is defined as $X \Rightarrow Y$ (CS, CC%), for $X \cap Y = \emptyset$, where CS is the clustered support, defined as the ratio of the area of the cluster (region) that satisfies both $X$ and $Y$ to the total area of the study region $S$. CC% is the clustered confidence, which can be interpreted as the percentage of area of clusters (regions) of $X$ that intersect with the area of clusters (regions) of $Y$.

Association rule-based approaches can be divided into transaction- and distance-based approaches. Transaction-based approaches focus on defining transactions over space so that an Apriori-like algorithm can be used. Transactions over space can be defined by a reference-feature centric model. Under this model, transactions are created around instances of one user-specified spatial feature. The association rules are derived using the Apriori (Agrawal et al., 1993) algorithm. The rules formed are related to the reference feature. For example, consider the spatial dataset in Figure 5.12(a) with three feature types, A, B and C, each of which has two instances. The neighborhood relationships between instances are shown as edges. Co-locations (A, B) and (B, C) may be considered to be frequent in this example. Figure 5.12(b) shows transactions created by choosing C as the reference feature. Co-location (A, B) will not be found since it does not involve the reference feature. Generalizing the paradigm of forming rules related to a reference feature to the case where no reference feature is specified is non-trivial. Also, defining transactions around locations of instances of all features may yield duplicate counts for many candidate associations.

[Figure 5.12. Example dataset with instances A1, A2, B1, B2, C1 and C2 of feature types A, B and C.]

A distance-based approach was proposed concurrently by Morimoto (2001) and Shekhar and Huang (2001). Morimoto defined distance-based patterns called k-neighboring class sets, in which instances of objects are grouped together based on
their Euclidean distance from each other. In Morimoto's work, the number of instances for each pattern is used as the prevalence measure, which does not possess an anti-monotone property by nature. Since anti-monotonicity is required for such algorithms, Morimoto used a non-overlapping constraint to get the anti-monotone property for this measure. Also, it is possible that the instances of a k-neighboring class set are different depending on the order in which the classes are added into the class set. This in turn yields different values of support of a given co-location. Figure 5.12(c) shows two possible partitions for the dataset of Figure 5.12(a), along with the supports for co-location (A, B).

The distance-based approach by Shekhar and Huang (2001) eliminates the non-overlapping-instance constraint. Their event-centric model finds subsets of spatial features likely to occur in a neighborhood around instances of given subsets of event types. For example, let us determine the probability of finding at least one instance of feature type B in the neighborhood of an instance of feature type A in Figure 5.12(a). There are two instances of type A and both have some instance(s) of type B in their neighborhoods. The conditional probability for the co-location rule "spatial feature A at location l → spatial feature type B in neighborhood" is 100%. This yields a well-defined prevalence measure (i.e., support) without the need for transactions. Figure 5.12(d) illustrates that the event-centric model will identify both (A, B) and (B, C) as frequent patterns.

Prevalence measures and conditional probability measures, called interest measures, are defined differently in different models, as summarized in Table 5.2. The transaction-based and distance-based k-neighboring class sets materialize transactions and thus can use traditional support and confidence measures. The event-centric approach defined new transaction-free measures, e.g., the participation index (see Shekhar and Huang (2001) for details).

To find co-locations, much of the time is spent in computing joins to identify instances of candidate co-location patterns. To decrease this computation time, a partial-join based approach (Yoo, 2004) or a join-less approach (Yoo, 2006) can be used. In the partial-join based approach, the number of instance joins for identifying candidate co-locations is minimized by transactionizing a spatial
Table 5.4 Classification of algorithms for spatial autoregression model (Celik et al., 2006)
Method used: Maximum likelihood
    Exact: Applying direct sparse matrix algorithms, eigenvalue based 1-D surface partitioning
    Approximate: ML based matrix exponential specification, graph theory approach, Taylor series approximation, Chebyshev polynomial approximation method, semiparametric estimates, characteristic polynomial approach, double bounded likelihood estimator, upper and lower bounds via divide and conquer, spatial autoregression local estimation
Method used: Bayesian
    Exact: None
    Approximate: Bayesian matrix exponential specification, Markov Chain Monte Carlo (MCMC)
Table 5.5 Interest measures of patterns for classical data mining and spatial data mining
Predictive model
    Classical data mining: Classification accuracy
    Spatial data mining: Spatial accuracy
Cluster
    Classical data mining: Low coupling and high cohesion in feature space
    Spatial data mining: Spatial continuity, unusual density, boundary
Outlier
    Classical data mining: Different from population or neighbors in feature space
    Spatial data mining: Significant attribute discontinuity in geographic space
Association
    Classical data mining: Subset prevalence, $\Pr[B \in T \mid A \in T, T: \text{a transaction}]$, correlation
    Spatial data mining: Spatial pattern prevalence, $\Pr[B \in N(A) \mid N: \text{neighborhood}]$, cross K-function
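The spatial association measure listed in Table 5.5, $\Pr[B \in N(A)]$, can be read as the fraction of instances of feature type A that have at least one instance of feature type B in their neighborhood. The following minimal Python sketch computes it under two assumptions that are not taken from the chapter: the point coordinates are hypothetical, and the neighborhood of an instance is taken to be all instances within a fixed distance `radius`.

```python
from math import dist

# Hypothetical instances: feature type -> list of (x, y) locations.
instances = {
    "A": [(1.0, 1.0), (5.0, 5.0)],
    "B": [(1.5, 1.2), (5.4, 4.6)],
    "C": [(2.2, 1.0), (9.0, 9.0)],
}

def colocation_probability(instances, a, b, radius):
    """Fraction of type-a instances with at least one type-b instance
    within `radius` (an assumed distance-based neighborhood)."""
    hits = sum(
        any(dist(p, q) <= radius for q in instances[b])
        for p in instances[a]
    )
    return hits / len(instances[a])

print(colocation_probability(instances, "A", "B", radius=1.0))
print(colocation_probability(instances, "B", "C", radius=1.0))
```

No transactions are materialized here: the measure is computed directly from the neighbor relationship between instances, which is the point of the transaction-free formulation.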
Figure 5.13 (a) The actual locations of nests. (b) Pixels with actual nests. (c) Location predicted by a model. (d) Location predicted by another model. Prediction (d) is spatially more accurate than (c). Legend: A = actual nest in pixel; P = predicted nest in pixel.
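One way to make the idea behind Figure 5.13 concrete is to compare two predictions that both miss every actual nest pixel but differ in how far off they are. The sketch below is only an illustration: the pixel coordinates are hypothetical, and the distance-based score (average distance from each actual nest to the nearest predicted pixel) is an assumed measure, not one defined in this section.

```python
from math import dist

def exact_hits(actual, predicted):
    """Number of actual nest pixels that were predicted exactly
    (the quantity behind classical classification accuracy)."""
    return len(set(actual) & set(predicted))

def mean_nearest_prediction_distance(actual, predicted):
    """Average distance from each actual nest to its nearest predicted pixel
    (an assumed spatial accuracy score; lower is better)."""
    return sum(min(dist(a, p) for p in predicted) for a in actual) / len(actual)

# Hypothetical pixel coordinates (row, column).
actual = [(0, 0), (2, 3), (4, 1)]
pred_far = [(0, 3), (4, 4), (2, 0)]   # every prediction is far from a nest
pred_near = [(0, 1), (2, 2), (3, 1)]  # every prediction is adjacent to a nest

for name, pred in (("far", pred_far), ("near", pred_near)):
    print(name, exact_hits(actual, pred),
          mean_nearest_prediction_distance(actual, pred))
```

Both hypothetical predictions score zero exact hits, so a purely pixel-wise accuracy cannot separate them, whereas the distance-based score favors the prediction that places nests next to their actual locations.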
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht, Netherlands: Kluwer.
Anselin, L. (1994). Exploratory spatial data analysis and geographic information systems. In: Painho, M. (ed.), New Tools for Spatial Analysis, pp. 45–54.
Anselin, L. (1995). Local indicators of spatial association: LISA. Geographical Analysis, 27(2): 93–115.
Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data. 3rd edn. New York: John Wiley.
Besag, J.E. (1974). Spatial interaction and statistical analysis of lattice systems. Journal of Royal Statistical Society, Ser. B, 36: 192–236.
Besag, J.E. (1977). Comments on Ripley's paper. Journal of the Royal Statistical Society.
Bolstad, P. (2002). GIS Fundamentals: A First Text on GIS. Eider Press.
Celik et al. (2006). NORTHSTAR: A Parameter Estimation Method for the Spatial Auto-regression Model.
Civilis, S.P.A. and Jensen, C.S. (2005). Techniques for efficient road-network-based tracking of moving objects. IEEE Transaction on Knowledge and Data Engineering, 17(5).
Cressie, N.A. (1993). Statistics for Spatial Data. New York: Wiley.
Han, J., Kamber, M. and Tung, A. (2001). Spatial clustering methods in data mining: a survey. In: Miller, H. and Han, J. (eds), Geographic Data Mining and Knowledge Discovery. New York: Taylor and Francis.
Hawkins, D. (1980). Identification of Outliers. New York: Chapman and Hall.
Jain, A. and Dubes, R. (1988). Algorithms for Clustering Data. New York: Prentice Hall.
Jhung, Y. and Swain, P.H. (1996). Bayesian contextual classification based on modified M-estimates and Markov random fields. IEEE Transaction on Pattern Analysis and Machine Intelligence, 34(1): 67–75.
Kolaczyk, G.S., Eric, D. and Ju Junchang, J. (2005). Multiscale, Multigranular Statistical Image Segmentation.
Kou, Y., Lu, C.T. and Chen, D. (2003). Algorithms for spatial outlier detection. IEEE International Conference on Data Mining.
Li, S. (1995). Markov random field modeling. Computer Vision. Berlin: Springer Verlag.
Morimoto, Y. (2001). Mining frequent neighboring class sets in spatial databases. In: Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Mamoulis, N., Cao, H. and Cheung, D.W. (2005). Mining frequent spatio-temporal sequential patterns. Fifth IEEE International Conference on Data Mining.
Penninga, F. (2005). 3D Topographic Data Modelling: Why Rigidity Is Preferable to Pragmatism.
Quinlan, J. (1993). C4.5: Programs for Machine Learning. New York: Kaufmann Publishers.
Roddick, J.-F. and Spiliopoulou, M. (1999). A bibliography of temporal, spatial and spatio-temporal data mining research. SIGKDD Explorations, 1(1): 34–38.
Rogers, J.P., Shine Celik, J.A. and Shekhar, S. (2006). Discovering Emerging Spatio-Temporal Co-occurrence Patterns: A Summary of Results.
Shekhar, S. and Chawla, S. (2003). A tour of spatial databases. New York: Prentice Hall.
Shekhar, S. and Huang, Y. (2001). Co-location rules mining: a summary of results. Proc. of Spatio-temporal Symposium on Databases.
Shekhar, S., Lu, C.T. and Zhang, P. (2003). A unified approach to detecting spatial outliers. GeoInformatica, 7(2).
Shekhar, S., Schrater, P.R., Vatsavai, R.R., Wu, W. and Chawla, S. (2002). Spatial contextual classification and prediction models for mining geospatial data. IEEE Transaction on Multimedia, 4(2).
Shekhar, S., Srivastava, J., Mane, A. and Murray, C. (2005). Spatial clustering of chimpanzee locations for neighborhood identification. Fifth IEEE International Conference on Data Mining, pp. 773–740.
Solberg, A.H., Taxt, T. and Jain, A. (1996). A Markov random field model for classification of multisource satellite imagery. IEEE Transaction on Geoscience and Remote Sensing, 34(1): 100–113.
Taylor, G.H. (xxxx). Impacts of El Nino on Southern Oscillation on the Pacific Northwest.
Tobler, W.R. (1979). In: Gale and Olsson (eds), Cellular Geography, Philosophy in Geography. Dordrecht: Reidel.
Vatsavai, T.E.B. and Shekhar, S. (2005). A semi-supervised learning method for remote
Figure 6.1 Spatial patterns. (a) Positive spatial autocorrelation where cells with similar
values (gray tones) are nearby forming a patch. (b) Spatial randomness. (c) Negative spatial
autocorrelation where nearby cells have dissimilar values showing spatial repulsion.
Figure 6.2 Sources of spatial structures. (a) Spatial dependence, where the spatial distribution of environmental factors (here soil types A and B) constrains the seeds' spatial distribution. The gray polygons represent different soil types, where black seeds (circles) can grow only on soil type A (light gray polygon) and white seeds only on soil type B (dark gray polygon). (b) Spatial autocorrelation, on top of the spatial dependence described in (a), where the seeds are dispersed by the trees.
the processes (Wagner and Fortin, 2005). These legacies of the spatial patterns on
the processes (Peterson, 2002) can either promote the spread of disturbances and
disease or impede animal movement (e.g., fragmentation due to roads). The result of
the sequence of processes and feedbacks is included in the observed data (Haining, this
volume). The question is then: at which scale should we spatially analyze the data
when it is the scale itself that we want to determine?

Spatial statistics will save us, right? No!

Spatial statistics were developed to quantify the degree of spatial aggregation (join
count, Ripley's K), spatial autocorrelation (Moran's I, Geary's c) or spatial variance
(semi-variance; see Atkinson and Lloyd, in this volume) over a study area where the
mean and variance of the function describing the process are constant with distance and
direction between locations. Thus, spatial statistics can quantify patterns but cannot
identify their origin.

So what are spatial statistics good for? To answer this fundamental question, we
summarize first how the most commonly used spatial statistics estimate spatial patterns
and spatial autocorrelation. We stress how spatial analyses of larger areas where there is
more than one process impair the direct use of spatial statistics and parametric statistics.
We present the statistical issues and the recent developments aiming to address them.
Then, we conclude by commenting on some unresolved challenges in the field of spatial
statistics.

statistics quantify the degree of self-similarity of a variable as a function of distance.
These spatial statistics assume that, within the study area, the parameters of the function
defining the underlying process, such as the mean and the variance, are constant
regardless of the distance and direction between the sampling locations. This property
of the random function is known as spatial stationarity (Cressie, 1993). Then the
goal of spatial statistics is to test the null hypothesis of absence of spatial pattern.
For each spatial statistic, the spatial pattern is either spatial aggregation or segregation
(Ripley's K; join count statistics) or spatial autocorrelation (Moran's I and Geary's c).
The null hypothesis implies that nearby locations (or attributes, measures) do not
affect one another such that there is independence and spatial randomness (Figure 6.1(b)).
The alternatives are that there is clustering and thus positive spatial autocorrelation
(Figure 6.1(a)) or repulsion and negative spatial autocorrelation (Figure 6.1(c)).

The mathematical commonality of the various spatial statistics is that they use the
cross-product between a weighted function relating the degree of distance ($w_{ij}$) among
the sampling locations ($n$) and a function ($Y$) quantifying the degree of similarity among
the values of the variable ($x_{ij}$) at these sampling locations (Dale et al., 2002; Getis,
1991; Getis and Ord, 1992):

\[
\mathrm{Statistic}(d) = \frac{\displaystyle\sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij}(d)\, Y_{ij}(x)}{C(w, d)}
\]
Table 6.1 Similarity functions and significance test procedures according to the
spatial statistics

Point data
  Global spatial statistic: Ripley's K. For each radius t, the statistic sums the
    indicator function $I_t(i, j)$ that counts, at each point, the number of points
    within a circle of radius t (Figure 6.3).
  Significance test: Assess using a confidence envelope based on a randomization
    procedure (complete spatial randomness).

  Global spatial statistic: Join count statistics. The statistics count the number of
    links of matching ($J_{rr}$) and mismatching ($J_{rs}$) categories.
  Significance test: Assess by comparing the observed frequencies of links to those
    expected under the null hypothesis of randomness.

Polygon data
  Global spatial statistic: Join count. The statistics count the number of links of
    matching ($J_{rr}$) and mismatching ($J_{rs}$) categories.
  Significance test: Assess by comparing the observed frequencies of links to those
    expected under the null hypothesis of randomness.

Quantitative data
  Global spatial statistic: Moran's I. The statistic sums the deviation of the values
    at a given distance lag from the mean of the variable,
    $Y_{ij}(x) = (x_i - \bar{x})(x_j - \bar{x})$.
  Significance test: Assess using either a randomization procedure or a normal
    distribution approximation test where the expected value of absence of spatial
    autocorrelation is $E_N(I) = E_R(I) = -(n - 1)^{-1}$.

  Global spatial statistic: Geary's c. The statistic sums the squared deviation of the
    values at a given distance lag, $Y_{ij}(x) = (x_i - x_j)^2$.
  Significance test: Assess using either a randomization procedure or a normal
    distribution approximation test where the expected value of absence of spatial
    autocorrelation is $E_N(c) = E_R(c) = 1$.
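
As an illustration only, the two similarity functions listed for quantitative data can be written as a short Python sketch. The function names, the binary chain weights and the toy transect values are assumptions made for this example; the scaling constants are the conventional ones for Moran's I and Geary's c with a single weight matrix, not a prescription from this chapter.

```python
def cross_product_stats(x, w):
    """Return (Moran's I, Geary's c) for values x and a weight matrix w.

    Both statistics share the cross-product form sum_ij w_ij * Y_ij(x);
    only the similarity function Y_ij and the scaling constant differ.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    ss = sum(d * d for d in dev)          # sum of squared deviations
    w_sum = moran_num = geary_num = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                  # pairs with j == i are excluded
            w_sum += w[i][j]
            moran_num += w[i][j] * dev[i] * dev[j]        # Y_ij for Moran's I
            geary_num += w[i][j] * (x[i] - x[j]) ** 2     # Y_ij for Geary's c
    moran_i = (n / w_sum) * (moran_num / ss)
    geary_c = ((n - 1) / (2.0 * w_sum)) * (geary_num / ss)
    return moran_i, geary_c


def chain_weights(n):
    """Hypothetical link network: each location is joined to its neighbors on a line."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


# A smooth gradient (similar neighbors) versus an alternating pattern.
gradient_i, gradient_c = cross_product_stats([1, 2, 3, 4, 5, 6], chain_weights(6))
alternating_i, alternating_c = cross_product_stats([1, 5, 1, 5, 1, 5], chain_weights(6))
```

With these toy data the gradient gives a positive Moran's I and a Geary's c below 1, while the alternating pattern gives a negative Moran's I and a Geary's c above 1, matching the positive and negative autocorrelation cases described for Figure 6.1.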
Figure 6.3 Search window types to determine the distance weight among sampling
locations according to the data types and spatial statistics. For point data, the geographical
coordinates of objects as well as their attributes (black or gray) need to be surveyed for the
entire study area. Join count statistics first require establishing the link network among the
sampling locations. Here we used a Delaunay tessellation network (Fortin and Dale, 2005;
Okabe et al., 2000) to determine the links. Ripley's K uses circles of radius t at each
point. For categorical data from polygons, join count statistics can be used where the links
are determined using the centroids of the polygons. For quantitative data, spatial
autocorrelation coefficients (Moran's I, Geary's c) can be computed using either a link
network among the sampling locations, a search window from each sampling location or
from each cell (quadrat) of a grid.
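
The circular search window used for Ripley's K can be sketched as follows. This is a minimal, naive estimator written for illustration: the function name, the toy coordinates and the study-area size are assumptions, and no edge correction is applied, so it is not a substitute for the corrected estimators cited in this chapter.

```python
import math


def ripley_k(points, t, area):
    """Naive Ripley's K at radius t: area / (n (n - 1)) times the number of
    ordered pairs (i, j), i != j, whose distance is at most t.
    No edge correction is applied (an assumption of this sketch)."""
    n = len(points)
    pairs_within_t = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= t:
                pairs_within_t += 1
    return area * pairs_within_t / (n * (n - 1))


# Hypothetical data: four points clustered in one corner of a 10 x 10 study area.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
k_hat = ripley_k(points, t=1.0, area=100.0)

# Under complete spatial randomness, K(t) is expected to be close to pi * t^2.
csr_expectation = math.pi * 1.0 ** 2
```

An estimate well above pi t^2, as for this clustered toy pattern, points toward aggregation at that radius; in practice the comparison is made against a randomization envelope, as noted in Table 6.1.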
to them. Because spatial stationarity is assumed, the shape of the search window
is isotropic (Figure 6.3) and the intensity of the spatial pattern is measured as if
it were the same, whatever the direction. In natural systems, this assumption
is often not realistic, as water flow and wind are mostly directional processes. Such
directional processes generate anisotropic patterns for which the characteristics depend
on direction (Figure 6.2). Isotropic search windows are not able to detect anisotropic
patterns and therefore, weights are needed to compute spatial autocorrelation according
to direction as well as distance (Dubin, in this volume; Fortin and Dale, 2005). While
this feature was used early on in geostatistics (Atkinson, in this volume; Journel and
Huijbregts, 1978), it took longer to become common practice in other applications of
spatial statistics (Oden and Sokal, 1986). The use of these directional weights still assumes
that the process can occur over the entire area. It is not always the case, as in studying fish
pools for example, where the spatial patterns we need to consider are only those of the
aquatic network itself. In addition, proximity in an aquatic network (Figure 6.4) cannot
be determined using Euclidean distance as in terrestrial systems (Figure 6.2), but rather
The same Euclidean distance but not the same path length
Figure 6.4 Aquatic network path length does not match the Euclidean distance among
sampling locations.
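
The contrast in Figure 6.4 can be made concrete with a small sketch that compares the straight-line distance between two sampling locations with the shortest path along a network. The node names, coordinates and the Y-shaped stream layout are hypothetical, and Dijkstra's algorithm is used here simply as one standard way to obtain path lengths; it is not claimed to be the procedure used in the studies cited.

```python
import heapq
import math


def network_distance(edges, coords, source, target):
    """Shortest path length between two nodes along the network (Dijkstra).
    Each edge length is the straight-line length of that segment."""
    graph = {}
    for a, b in edges:
        length = math.dist(coords[a], coords[b])
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist_so_far, node = heapq.heappop(queue)
        if node == target:
            return dist_so_far
        if dist_so_far > best.get(node, math.inf):
            continue                      # stale queue entry
        for neighbor, length in graph.get(node, []):
            candidate = dist_so_far + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf                       # target not reachable


# Hypothetical Y-shaped stream: two tributaries (A, B) meeting at a confluence (C).
coords = {"A": (-3.0, 4.0), "B": (3.0, 4.0), "C": (0.0, 0.0)}
edges = [("A", "C"), ("B", "C")]

euclidean = math.dist(coords["A"], coords["B"])
along_network = network_distance(edges, coords, "A", "B")
```

In this toy layout the two sampling locations are 6 units apart in a straight line but 10 units apart along the stream, so a distance weight based on Euclidean distance would overstate their proximity.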
requires a topological basis for proximity (Fortin and Dale, 2005; Okabe et al.,
2000). Okabe and Yamada (2001) used such network weights to account for the particular
topology of spatial networks for computing Ripley's K.

6.3. EFFECTS OF THE EXTENT ON GLOBAL SPATIAL STATISTICS

As we are looking for spatial patterns in natural systems, several decisions will
affect our ability to detect and quantify spatial structures: how the data are gathered
(Dungan et al., 2002; Fortin and Dale, 2005; Legendre et al., 2002), the size of
the study area (the extent), and the size of the sampling units (the grain). Here
we will focus on the change of extent size as it has a direct effect on the spatial
stationarity of the area and the validity of the global spatial statistics. The change
of grain size is also important and it is known as the modifiable areal unit problem
(MAUP). A whole chapter is dedicated to MAUP (Wong, in this volume), and so
we refer the readers to that part of this handbook.

The extent of the study area affects our ability to detect spatial patterns: too
small, and it could not include enough data to characterize the pattern; too large,
and it could cover several patterns from various sources and at different scales as
already mentioned above (Dungan et al., 2002; Fortin and Dale, 2005). Increasing
the extent of the study area implies that more processes and environmental factors
may alter the variable of interest. Usually, however, it is rare that we know in advance
at which extent to study a phenomenon. In the absence of prior knowledge, researchers
should perform a pilot study to determine it (Dungan et al., 2002; Legendre et al.,
2002). Unfortunately, the wealth of available data captured by remote sensing over large
areas is tempting and we often succumb to the temptation. We use all the data
available to us. We go fishing for spatial patterns. The problem is that the larger the
extent, the more likely it is that several environmental factors and processes operate
on the variable under investigation, resulting in spatial nonstationarity with the spatial
patterns of several scales intermingled, or that some processes have greater effects
in some sub-regions than in others. The consequence of the resulting estimation by
global spatial statistics of spatial autocorrelation at various distances is that the
average values of spatial autocorrelation may not reflect any spatial pattern as the
spatial structures may be cancelling out each other's signals.

Even when the extent of the study area is appropriate for the phenomena under
study, our ability to determine adequately the spatial pattern can be altered due to sampling
issues, statistical issues, or a combination of both. One sampling issue is the mismatch
between the location of the extent and the process under study: if the actual location
of the study area is a few meters north or south, it can cause the detected spatial
pattern to vary (Plante et al., 2004). From a statistical point of view, the number of
neighboring points at the edge of the study area is always smaller than at the center
(as illustrated in Figure 6.3 by the sampling locations, at the centroid of patches or at
the centroid of quadrats marked by squares). This edge effect is known and several edge
correction algorithms have been proposed to adjust either for the edge, the corner or both
(Goreaud and Pélissier, 1999; Haase, 1995; Wiegand and Moloney, 2004). Similarly,
rectangular study areas will have pairs of locations at the larger distance classes only in
one direction (Fortin, 1999). To have a more comparable number of pairs of sampling
locations to estimate spatial autocorrelation, it is recommended to use distance classes
no larger than half or two thirds of the smallest side of the study area (Fortin and
Dale, 2005) or to use equifrequent classes where the number of pairs is kept constant
rather than the thresholds of Euclidean distances for succeeding classes (Sokal and
Wartenberg, 1983).

6.4. LOCAL SPATIAL STATISTICS: ONE STEP IN A GOOD DIRECTION

The previous section presented some of the most common sampling and statistical
issues that affect the reliability of global spatial statistics in estimating spatial
autocorrelation and how to minimize them within the context of global analyses over the
entire study area. Another approach to deal with these issues is to measure spatial
autocorrelation locally using local spatial statistics (Table 6.2). Local indicators of
spatial autocorrelation (or spatial association, called LISA, Anselin, 1995) measure the
degree of spatial autocorrelation using, for example, Moran's I algorithm for sampling
locations based only on the neighborhood around a given sampling location. The
neighborhood search window can be based either on a link network or on distance classes
as in the global Moran's I approach. Several variants of LISA have been developed in
the same spirit of measuring local spatial association rather than autocorrelation such
as the local Getis and the local Ord statistics (Boots, 2002; Fotheringham et al., 2000;
Getis and Ord, 1996; Ord and Getis, 1995, 2001). One of the advantages of these
local spatial statistics is that the values of spatial autocorrelation (or spatial association)
can be mapped at each sampling location allowing the identification of sub-regions
within the study area having positive (called hot spots) or negative (called cold spots)
autocorrelation values (Wulder and Boots, 1998). This is very useful when large
study areas are analyzed to determine how homogeneous (or not) a region is. One
drawback, however, is that the significance test for each sampling location is based on
the global estimate of spatial autocorrelation for the entire study area and that assumes
spatial stationarity. In the absence of spatial stationarity, the advantage of using local
spatial statistics over larger areas is cancelled by the lack of significance test. This is why
recently researchers have been developing new procedures to assess local significance
that account for the global estimate of spatial autocorrelation (Ord and Getis, 2001; Kabos
and Csillag, 2002). Even with these newer methods to test significance, one cannot
apply a Bonferroni correction to adjust for the multiple tests for each coefficient as
for the global spatial statistics (Fortin and Dale, 2005) because the tests may be highly
correlated, and there are usually too many sampling locations so often no coefficients
would appear significant. However, the mapping of a local spatial coefficient value
at each sampling location has been found a very informative tool for exploring the
characteristics of spatial data (Fotheringham, 1997; Fotheringham and Brunsdon, 1999;
Pearson, 2002; Sokal et al., 1998). In the same spirit of analyzing locally spatial
pattern and the underlying factor or process responsible for it, geographically weighted
regression can be used (see Fotheringham, in this volume).

6.5. SPATIAL AUTOCORRELATION IMPLICATIONS FOR PARAMETRIC AND RANDOMIZATION
SIGNIFICANCE TESTING

One important feature of spatial dependence in data is that positive spatial autocorrelation
makes parametric statistical tests too liberal, in that they produce more apparently
significant results than the data actually justify. A simple intuitive explanation is
that because of the lack of independence, at least some of the information of sample i
is contained in adjacent samples and so instead of having the information
of n independent samples, we have the information appropriate to fewer samples,
n′, called the effective sample size (cf. Cressie, 1993). It is tempting to suggest
Table 6.3 Correction procedures for the presence of spatial autocorrelation for
parametric tests

General concepts                               Correction procedure
Parametric tests                               Cressie (1993)
Univariate tests
  No general solution: model & Monte Carlo     Mizon (1995); Dale and Fortin (2002)
Bivariate tests
  Correlation                                  Modified t-test (Clifford et al., 1989;
                                               corrected by Dutilleul, 1993)
  Linear regression                            Alpargu and Dutilleul (2003)
  Partial correlation                          Alpargu and Dutilleul (2006)
  2 × 2 contingency table                      Cerioli (1997)
  R × C contingency table                      Cerioli (2002)
  Cochran-Armitage                             Cerioli (2003)
Multivariate
  Following Dutilleul-Cerioli approach         Speculation!
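
To give a feel for how much information positive autocorrelation removes, the sketch below computes an effective sample size (written n' in the comments) for a one-dimensional transect. It rests on a deliberately simple assumption that is not taken from the procedures in Table 6.3: the data follow a first-order autoregressive structure with lag-1 autocorrelation rho, for which the large-sample approximation n' = n (1 - rho) / (1 + rho) applies to the mean. The function names and the toy transect are hypothetical.

```python
def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a sequence of equally spaced samples."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)


def effective_sample_size(n, rho):
    """Approximate effective sample size n' = n (1 - rho) / (1 + rho).

    Assumes a first-order autoregressive structure; this is a textbook
    approximation for the mean, not the modified t-test of Table 6.3."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)


# Hypothetical transect showing a smooth gradient (strong positive autocorrelation).
transect = [1, 2, 3, 4, 5, 6]
rho = lag1_autocorrelation(transect)
n_eff = effective_sample_size(len(transect), rho)
```

For this toy transect rho is 0.5, so six samples carry roughly the information of two independent ones under the stated assumption; with rho equal to zero the effective and nominal sample sizes coincide.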
that, based on the work of Cressie and others (see below), we should be able to
use the autocorrelation structure of the data in order to calculate the correct effective
sample size for testing. For univariate tests, this approach does not seem to work well,
and Dale and Fortin (2002) suggest the approach of modeling the data by refining
a general ARMA (Auto-Regressive Moving Average) model followed by the Monte
Carlo generation of artificial data sets with similar autocorrelation structure for
comparison. For bivariate data, the effective sample size method seems to work well for a
broad range of statistics (see Table 6.3), and we speculate that it will work for multivariate
data as well.

To avoid having to deal with the estimation of the effective sample size, the use of
randomization tests also seems attractive. Randomization tests (also called permutation,
resampling or computer intensive tests) are convenient when the goal of the study is
to assess the significance of the sample itself. When the goal is to make inferences
about the sampling population, a Monte Carlo procedure should be used instead (Good,
2000). (Permutation re-orders the original data, whereas a Monte Carlo procedure
produces new data of similar structure.) In either case, the presence of spatial
autocorrelation in the data impairs the fundamental assumption of randomization tests,
which is that each labeling (attributes, values) is randomly exchangeable
(Figure 6.5(a, b)). Depending on the type of spatial autocorrelation, modified
randomization procedures (or simply restricted randomization tests) can be used where
the data are randomized with some specific spatial restriction. For example, in
Figure 6.5(a-c), the data show marked regional differences along the south-west to
north-east diagonal. With such a spatial structure, a complete randomization test cannot
be used, as illustrated in Figure 6.5(a), and a restricted one is more appropriate. One
way to account for this type of spatial structure is to have the study area partitioned
into two regions (Figure 6.5(d)) and then the randomization is applied in each region
separately. When the data show spatial dependence due to underlying environmental
factors, restricted randomization procedures that generate a comparable degree of
spatial autocorrelation as that observed in the data can be helpful
[Figure 6.5, panels (a) to (d): maps of sampling locations labeled with quantitative values ranging from 2 to 7.]
Figure 6.5 Randomization procedure. (a) Sampling locations with the quantitative values
of a given variable. (b) Complete spatial randomness of the values of the variable over
the sampling locations. (c) Same study area as in (a) where the dashed line delineates
two sub-regions having different mean values. (d) Restricted randomization within each
region.
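
The restricted randomization of Figure 6.5(d) can be sketched in a few lines of Python. Everything named here (the function names, the region labels, the toy values and the toy neighbor statistic) is an assumption made for illustration; the sketch only shows the general idea of permuting values within sub-regions rather than across the whole study area.

```python
import random


def restricted_permutation(values, regions, rng):
    """Shuffle values only among locations that share a region label."""
    permuted = list(values)
    members = {}
    for index, label in enumerate(regions):
        members.setdefault(label, []).append(index)
    for indices in members.values():
        pool = [values[i] for i in indices]
        rng.shuffle(pool)
        for i, value in zip(indices, pool):
            permuted[i] = value
    return permuted


def restricted_p_value(statistic, values, regions, n_perm=999, seed=0):
    """One-sided p-value: share of restricted permutations whose statistic is
    at least as large as the observed one (the observed data count once)."""
    rng = random.Random(seed)
    observed = statistic(values)
    hits = 1
    for _ in range(n_perm):
        if statistic(restricted_permutation(values, regions, rng)) >= observed:
            hits += 1
    return hits / (n_perm + 1)


def neighbor_product(v):
    """Toy similarity statistic: summed products of deviations for adjacent pairs."""
    mean = sum(v) / len(v)
    return sum((a - mean) * (b - mean) for a, b in zip(v, v[1:]))


# Hypothetical transect of 12 locations split into two sub-regions with different means.
values = [5, 6, 7, 6, 7, 5, 2, 3, 4, 3, 2, 4]
regions = ["NE"] * 6 + ["SW"] * 6

shuffled = restricted_permutation(values, regions, random.Random(1))
p_value = restricted_p_value(neighbor_product, values, regions)
```

Because values never cross the regional boundary, the difference in regional means is preserved in every permutation, so the reference distribution reflects only the arrangement of values within each sub-region.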
(Fortin et al., 2003). To assess significance with more complex spatial patterns in
which there is more than one spatial scale, Goovaerts and Jacquez (2004) proposed
a typology of increasing levels of spatial restrictions, which they called neutral models,
to simulate more spatially realistic reference distributions.

6.6. HOW MANY SPATIAL SCALES?

A good practice to analyze larger regions involves assessing first whether the spatial
patterns of the data involve more than one spatial scale, and then relating each scale to a
key factor or process. This was easy to say but not so easy to do until recently. Two new
approaches have been proposed to use spatial scales as spatial predictors in regression
or canonical analysis models (Borcard and Legendre, 2002; Keitt and Urban, 2005).
Borcard and Legendre (2002) determined spatial predictors using principal coordinates
of neighbor matrices (PCNM) that decomposed spatial scales into orthogonal
spatial predictors based on the eigenvectors of the positive eigenvalues of the principal
coordinates. The advantage of this method lies in the fact that neighborhoods, i.e.,
spatial scales, can be determined using the Euclidean distances among irregularly
spaced sampling locations. Keitt and Urban (2005) used the wavelet coefficient of the
wavelet transform at each decomposition level as spatial predictor in a multiple
regression model. Unlike the PCNM approach, the wavelet decomposition requires that the
data are surveyed in a contiguous way as is the case with remotely sensed and
GIS raster data. These new approaches have a lot of potential to determine the
relative importance of environmental factors and processes in explaining the patterns
of data.

6.7. NEW ERA OF SPATIAL ANALYSIS: CATEGORICAL DATA

Spatial analysis of data requires a priori knowledge about the data and the underlying
processes. It requires as well a good understanding of the possibilities and limitations
of the various spatial statistics available (Figure 6.6; see also Csillag and Boots,
2005). The issues presented in this chapter deal mostly with the context of spatial
analysis of quantitative data. Over larger study areas, however, it is rare that quantitative
data are available and it is more likely that we need to rely only on qualitative
data. The spatial analysis of categorical data often requires that the questions be
revised (Figure 6.6), as well as the type of spatial statistical tools. GIS packages
offer a series of simple spatial descriptions of qualitative data (e.g., area, number of
patches) and several landscape metrics are available to refine the spatial characterization
of categorical data (Gustafson, 1998). More work is still needed, however, to be able to
determine the significance of these metrics so that they can be compared through time
[Figure 6.6 (fragment): decision diagram node "One process" with branches "Yes" and "No".]
and between sites (Fortin et al., 2003; Remmel and Csillag, 2003, 2006). As for
spatial statistics for categorical data per se, recent methods work at the global level,
by assessing spatial variance using a transiogram (Weidong, 2006), and at the local level,
by developing new local measures of spatial association (Boots, 2003). The use of mark
connection functions (Stoyan and Penttinen, 2000) is also a promising area of further
investigation, perhaps where the mosaic of patches is converted into a network of points
with marks which identify connections to first-order neighbors. Finally, there remains
the large problem of incorporating time, creating a spatio-temporal analysis to assess
the changes in spatial characteristics.

REFERENCES

Alpargu, G. and Dutilleul, P. (2003). To be or not to be valid in testing the significance of
the slope in simple quantitative linear models with autocorrelated errors. Journal of
Statistical Computation and Simulation, 73: 165-180.
Alpargu, G. and Dutilleul, P. (2006). Stepwise regression in mixed quantitative linear
models with autocorrelated errors. Communications in Statistics - Simulation and
Computation, 35: 79-104.
Anselin, L. (1995). Local indicators of spatial association - LISA. Geographical Analysis,
27: 93-115.
Atkinson and Lloyd (Chapter 9 Geostatistics).
Bjørnstad, O.N. and Falck, W. (2001). Nonparametric spatial covariance functions:
estimation and testing. Environmental and Ecological Statistics, 8: 53-70.
Boots, B. (2002). Local measures of spatial association. Écoscience, 9: 168-176.
Boots, B. (2003). Developing local measures of spatial association for categorical data.
Journal of Geographical Systems, 5: 139-160.
Borcard, D. and Legendre, P. (2002). All-scale spatial analysis of ecological data by means
of principal coordinates of neighbour matrices. Ecological Modelling, 153: 51-68.
Cerioli, A. (1997). Modified tests of independence in 2 × 2 tables with spatial data.
Biometrics, 53: 619-628.
Cerioli, A. (2002). Testing mutual independence between two discrete-valued spatial
processes: a correction to Pearson chi-squared. Biometrics, 58: 888-897.
Cerioli, A. (2003). The Cochran-Armitage trend test under spatial autocorrelation.
Proceedings of the Conference Complex Models and Computational Methods for
Estimation and Prediction. Treviso, Italy, September 2003.
Cliff, A.D. and Ord, J.K. (1981). Spatial Processes: Models and Applications. London: Pion.
Clifford, P., Richardson, S. and Hémon, D. (1989). Assessing the significance of correlation
between two spatial processes. Biometrics, 45: 123-134.
Cressie, N.A.C. (1993). Statistics for Spatial Data, Revised Edition. New York: Wiley.
Csillag, F. and Boots, B. (2005). A framework for statistical inferential decisions in spatial
pattern analysis. The Canadian Geographer, 49: 172-179.
Dale, M.R.T. and Fortin, M.-J. (2002). Spatial autocorrelation and statistical tests in
ecology. Écoscience, 9: 162-167.
Dale, M.R.T. and Powell, R.D. (2001). A new method for characterizing point patterns in
plant ecology. Journal of Vegetation Science, 12: 597-608.
Dale, M.R.T., Dixon, P., Fortin, M.-J., Legendre, P., Myers, D.E. and Rosenberg, M. (2002).
The conceptual and mathematical relationships among methods for spatial analysis.
Ecography, 25: 558-577.
Delmelle (Chapter 10 Spatial sampling).
Diggle, P.J. (1983). Statistical Analysis of Spatial Point Patterns. London: Academic Press.
Dixon, P.M. (2002). Nearest-neighbor contingency table analysis of spatial segregation for
several species. Écoscience, 9: 142-151.
Dubin (Chapter 8 Spatial Weights).
Dubin, R.A. (1998). Spatial autocorrelation: A primer. Journal of Housing Economics,
7: 304-327.
Dungan, J.L., Perry, J.N., Dale, M.R.T., Legendre, P., Citron-Pousty, S., Fortin, M.-J.,
Jakomulska, A., Miriti, M. and Rosenberg, M.S. (2002). A balanced view of scale in
spatial statistical analysis. Ecography, 25: 626-640.
Dutilleul, P. (1993). Modifying the t test for assessing the correlation between two spatial
processes. Biometrics, 49: 305-314.
Epperson, B.K. (2003). Covariances among join-count spatial autocorrelation measures.
Theoretical Population Biology, 64: 81-87.
Fortin, M.-J. and Dale, M.R.T. (2005). Spatial Analysis. A Guide for Ecologists. Cambridge:
Cambridge University Press.
Fortin, M.-J. and Jacquez, G.M. (2000). Randomization tests and spatially autocorrelated
data. Bulletin of the Ecological Society of America, 81: 201-205.
Fortin, M.-J., Boots, B., Csillag, F. and Remmel, T.K. (2003). On the role of spatial stochastic
models in understanding landscape indices in ecology. Oikos, 102: 203-212.
Fortin, M.-J., Drapeau, P. and Legendre, P. (1989). Spatial autocorrelation and sampling
design. Vegetatio, 83: 209-222.
Fotheringham (Chapter 13 GWR).
Fotheringham, A.S. (1997). Trends in quantitative methods I: stressing the local. Progress in
Human Geography, 21: 88-96.
Fotheringham, A.S. and Brunsdon, C. (1999). Local forms of spatial analysis. Geographical
Analysis, 31: 340-358.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography:
Perspectives on Spatial Data Analysis. London: Sage Publications.
Getis, A. (1991). Spatial interaction and spatial autocorrelation: a cross product approach.
Environment and Planning A, 23: 1269-1277.
Getis, A. and Ord, J.K. (1992). The analysis of spatial association by use of distance
statistics. Geographical Analysis, 24: 189-206.
Getis, A. and Ord, J.K. (1996). Local spatial statistics: an overview. In: Longley, P. and
Batty, M. (eds), Spatial Analysis: Modelling in a GIS Environment, pp. 261-277.
Cambridge: GeoInformation International.
Good, P. (2000). Permutation Tests: A Practical Guide to Resampling Methods for Testing
Hypotheses, 2nd edn. New York: Springer-Verlag.
Goovaerts, P. and Jacquez, G.M. (2004). Accounting for regional background and population
size in the detection of spatial clusters and outliers using geostatistical filtering and
spatial neutral models: the case of lung cancer in Long Island, New York. International
Journal of Health Geographics, 3: 14.
Goreaud, F. and Pélissier, R. (1999). On explicit formulas of edge effect correction for
Ripley's K-function. Journal of Vegetation Science, 10: 433-438.
Green, J.L., Hastings, A., Arzberger, P., Ayala, F.J., Cottingham, K.L., Cuddington, K.,
Davis, F.D., Dunne, J.A., Fortin, M.-J., Gerber, L. and Neubert, M. (2005). Complexity in
ecology and conservation: mathematical, statistical, and computational challenges.
BioScience, 55: 501-510.
Gustafson, E.J. (1998). Quantifying landscape spatial pattern: What is the state of the art?
Ecosystems, 1: 143-156.
Haase, P. (1995). Spatial pattern analysis in ecology based on Ripley's K-function:
Introduction and methods of edge correction. Journal of Vegetation Science, 6: 575-582.
Haining (Chapter 2 Nature of Spatial Data).
Haining, R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge: Cambridge
University Press.
Journel, A.G. and Huijbregts, C. (1978). Mining Geostatistics. London: Academic Press.
Kabos, S. and Csillag, F. (2002). The analysis of spatial association on a regular lattice by
join-count statistics without the assumption of first-order homogeneity. Computers and
Geosciences, 28: 901-910.
Keitt, T.H. and Urban, D.L. (2005). Scale-specific inference using wavelets. Ecology,
86: 2497-2504.
Legendre, P., Dale, M.R.T., Fortin, M.-J., Gurevitch, J., Hohn, M. and Myers, D.E. (2002).
The consequences of spatial structure for the design and analysis of ecological field
surveys. Ecography, 25: 601-615.
Lichstein, J.W., Simons, T.R., Shriner, S.A. and Franzreb, K.E. (2002). Spatial autocorrelation
and autoregressive models in ecology. Ecological Monographs, 72: 445-463.
Mizon, G.E. (1995). A simple message for autocorrelation correctors: don't. Journal of
Econometrics, 69: 267-289.
Oden, N.L. and Sokal, R.R. (1986). Directional autocorrelation: an extension of spatial
correlograms to two dimensions. Systematic Zoology, 35: 608-617.
Okabe, A. and Yamada, I. (2001). The K-function method on a network and its computational
implementation. Geographical Analysis, 33: 270-290.
Okabe, A., Boots, B., Sugihara, K. and Chiu, S.N. (2000). Spatial Tessellations: Concepts and
Applications of Voronoi Diagrams, 2nd edn. Chichester: John Wiley.
Ord, J.K. and Getis, A. (1995). Local spatial autocorrelation statistics: distributional issues
and an application. Geographical Analysis, 27: 286-306.
Ord, J.K. and Getis, A. (2001). Testing for local spatial autocorrelation in the presence of
global autocorrelation. Journal of Regional Science, 41: 411-432.
Pearson, D.M. (2002). The application of local measures of spatial autocorrelation for
describing pattern in north Australian landscapes. Journal of Environmental Management,
64: 85-95.
Perry, J.N., Liebhold, A.M., Rosenberg, M.S., Dungan, J., Miriti, M., Jakomulska, A. and
Citron-Pousty, S. (2002). Illustrations and guidelines for selecting statistical methods for
quantifying spatial pattern in ecological data. Ecography, 25: 578-600.
Peterson, G.D. (2002). Contagious disturbance, ecological memory, and the emergence of
landscape pattern. Ecosystems, 5: 329-338.
Plante, M., Lowell, L., Potvin, F., Boots, B. and Fortin, M.-J. (2004). Studying deer habitat on
Anticosti Island, Québec: Relating animal occurrences and forest map information.
Ecological Modelling, 174: 387-399.
Remmel, T.K. and Csillag, F. (2003). When are two landscape pattern indices significantly
different? Journal of Geographical Systems, 5: 331-351.
Remmel, T.K. and Csillag, F. (2006). Mutual information spectra for comparing categorical
maps. International Journal of Remote Sensing, 27: 1425-1452.
Ripley, B.D. (1981). Spatial Processes. New York: John Wiley.
Sokal, R.R., Oden, N.L. and Thomson, B.A. (1998). Local spatial autocorrelation in biological
variables. Biological Journal of the Linnean Society, 65: 41-62.
Sokal, R.R. and Wartenberg, D.E. (1983). A test of spatial autocorrelation using an
isolation-by-distance model. Genetics, 105: 219-237.
Stoyan, D. and Penttinen, A. (2000). Recent applications of point process methods in forestry
statistics. Statistical Science, 15: 61-78.
Wagner, H.H. and Fortin, M.-J. (2005). Spatial analysis of landscapes: concepts and
statistics. Ecology, 86: 1975-1987.
Weidong, L. (2006). Transiogram: A spatial relationship measure for categorical data.
International Journal of Geographical Information Science, 20: 693-699.
Wiegand, T. and Moloney, K.A. (2004). Rings, circles and null-models for point pattern
analysis in ecology. Oikos, 104: 209-229.
Wong (Chapter 7 MAUP).
Wulder, M. and Boots, B. (1998). Local spatial autocorrelation characteristics of remotely
sensed imagery assessed with the Getis statistic. International Journal of Remote Sensing,
19: 2223-2331.
7
The Modifiable Areal Unit
Problem (MAUP)
David Wong
results due to different spatial configuration or partitioning schemes as the modifiable areal unit problem.
This chapter is intended to provide an overview of the MAUP. Although several overviews of the MAUP exist, they are dated (e.g., Openshaw, 1984; Wong, 1995). I explain the MAUP and its two sub-problems in more detail in section 7.2. While existing literature has already elaborated upon the impacts and scope of the MAUP, I provide an overview of some of its fundamental impacts in section 7.3. In section 7.4, I use simulated data and empirical datasets to illustrate the processes creating the two MAUP sub-problems. In section 7.5, I summarize the research developments pertaining to the MAUP, with emphases upon the most recent decades. Different directions in handling and searching for solutions for the MAUP are reviewed in section 7.6, and this is followed by a concluding remark.
7.2. WHAT IS THE MAUP?
The essence of the MAUP is that there are many ways to draw boundaries to demarcate space into discrete units to form multiple spatial partitioning systems. These units may serve administrative purposes, such as the counties in the U.S., or statistical or data gathering purposes, such as the census enumeration units of tracts, block groups and blocks below the county level. Although these boundaries are often drawn along some physical features (such as rivers or roads) that may serve as physical barriers separating areal units, there are multiple ways to draw those boundaries. Thus multiple datasets of the same area can be created and they will offer different descriptions of the areas and different analytical results.
But the process of drawing boundaries should be interpreted beyond the literal sense. In remote sensing or raster modeling, the basic areal units are pixels or grid cells. Each pixel or cell can be regarded as a spatially discrete unit. These units can be of different sizes or resolutions. Where the edges of the pixels or cells are located is somewhat arbitrary. By shifting the grid system slightly over space or changing the size of the pixels or cells, a new dataset can be created. Thus, numerous raster-based datasets can be created and they will give us different results. Therefore, the MAUP is not limited to polygon or vector data, but also exists in raster data.
There are two dimensions through which we can partition space or draw boundaries. One is to focus on the spatial dimension by using different configurations to partition space and fixing the number of areal units to be derived in the study region. As discussed earlier, there are many ways to partition a region even if the number of areal units is kept constant. In reality, we often encounter this in the form of re-partitioning or rezoning processes. A common example is the rezoning of school districts at the local scale. In some cases, the number of schools or districts does not change. But because of changes in population distribution across the districts and/or in the capacities of school facilities (such as through school renovation or addition of structures), the school district boundaries have to be redrawn to accommodate the change. With new school boundaries, the student compositions of some schools according to the new boundaries may be different from the original ones. Therefore, data tabulated according to the old and new school boundary systems will give different results. Note that the number of districts and the population could be the same before and after the rezoning. Changes in the data occur when the population is spatially regrouped into different sub-units in the region.
Another common example is congressional redistricting. Although redistricting may not
Figure 7.1 (a) 107th Congressional Districts for Georgia; (b) 109th Congressional Districts for Georgia; (c) 109th Congressional Districts around the Atlanta Region, Georgia; (d) Census tracts, block groups and blocks of Washington, DC, 2000 Census.
be a perfect example, since the number of congressional districts is likely different after the redistricting process due to population growth, it provides a good example to illustrate the significance and magnitude of this dimension of the MAUP. Figure 7.1(a) and (b) show two maps of the 107th and 109th congressional districts (CDs) in Georgia. Figure 7.1(a) is the map of the 107th CDs and Figure 7.1(b) is for the 109th CDs. It is obvious that the two partitioning systems have very different spatial patterns, although only two districts were added in the 109th Congress for Georgia. No old district in the 107th Congress in Georgia maintained its territory in the 109th. Another obvious change is that the area around the Atlanta metropolitan area has become much more spatially fragmented to accommodate more CDs. Because of the spatial
complexity within that region, Figure 7.1(c) was provided to show the details. It is impressive how some districts have such an irregular or non-compact shape. For instance, CD 8 essentially has two sectors, very much breaking up CD 11. CD 13 seems to have several pieces scattered around the city of Atlanta and stretched outward narrowly from the city. The case of Atlanta CDs is a possible case of gerrymandering, and is a good example to illustrate how space can be partitioned in a seemingly infinite number of ways (Openshaw, 1996; Fotheringham, 2000).
When the number of areal units is fixed or relatively stable, but boundaries are redrawn to accommodate changes, this is basically a zoning process. Data gathered according to different zoning systems of the same region will give us different depictions of the region and different analytical results when the data are analyzed. The inconsistency of the results based upon data from different zoning systems is known as the zoning problem, one of the two sub-problems of the MAUP.
Another dimension through which we can partition space is the scale dimension. Given a study region, we can partition the region to different levels of detail. For instance, the U.S. is divided into four census regions, and each region is further subdivided into divisions, giving the entire U.S. nine divisions. Under each division are states, which are essentially political and administrative units (Wong and Lee, 2005, p. 8). Under each state, we have counties and then census tracts, block groups and census blocks. Under counties, those census units are enumeration or statistical units created for census data gathering, tabulation and dissemination purposes. But they provide information about the region at a more geographically detailed level than the state or county levels.
Note that when the U.S. is partitioned according to the levels of census geography described above (region-division-state-county-tract-block group-block), they form a geographical hierarchy such that subdivisions at the more detailed level are found only within, but not across, the larger units involved. When other census units, such as metropolitan areas, are involved, the situation will not conform to a geographical hierarchy. Still, the general idea is that the region can be subdivided to different levels of detail or spatial resolution, as in raster data. Figure 7.1(d) offers such an example using Washington, DC, although only tracts, block groups, and blocks are shown here.
Data are available at all of these census geography levels. Census tract and block group data are commonly used in demographic and socioeconomic analyses. But one cannot assume that analysis results from the census tract data will be consistent with the results based on the block group data. This inconsistency due to the use of data at different geographical scales or spatial resolutions is known as the scale problem, the second sub-problem of the MAUP. In the next section, I will use simple examples to illustrate some fundamental inconsistencies of analytical results due to the zoning and scale problems.
7.3. FUNDAMENTAL IMPACTS OF THE MAUP
To date, most of the literature on the MAUP has been focused on the impacts of the problems. Before I provide a review of the literature, I will illustrate some simple effects of the MAUP using the Congressional Districts data of Georgia and the census data of Washington, DC.
In the Georgia example, the boundaries of congressional districts (CDs) changed quite
significantly between the 107th and 109th Congresses. The maps in Figure 7.1 show only the boundary changes without demonstrating the potential impacts on analysis due to this zoning effect. The redistricting of the 109th Congress was based upon the 2000 Census data. The 2000 Census data can also be tabulated according to the boundaries of the 107th Congressional Districts in order to assess how the rezoning affected the characteristics of the CDs. Using simple GIS procedures, some population variables of the 2000 Census were tabulated according to the 107th CD boundaries. Figure 7.2 shows the percent black in each congressional district in Georgia in the 2000 Census, according to the boundaries of the two Congresses.
The two maps show very different spatial patterns of the African-American population. The congressional district in southeast Georgia has a lower black concentration according to the 109th when compared with that in the 107th. On the other hand, CDs along the southern side of the city of Atlanta became more populated by blacks when the boundaries changed from the 107th to the 109th, while whites tended to be more numerous in CDs surrounding the outskirts of Atlanta and the northeast part of the state.
When one examines the legends of the two maps in Figure 7.2, it is easy to note that: (1) the different visual patterns of the two maps are not due to using different classification values; and (2) data tabulated according to the two sets of CDs have different statistical distributions, such as different minimum and maximum values. Table 7.1 shows some of the statistics in detail. Numerically, the means from the two Congresses are different, although they are quite close. The 109th CDs have a smaller range than the 107th CDs, but the standard deviation is slightly larger. When the correlation of percent white and percent black is evaluated for the two Congresses, the correlation for the 107th
Figure 7.2 Percent blacks in 2000 Census according to the boundaries of the 107th and
109th Congressional Districts (CDs).
Table 7.1 Selected statistics for the variable percent black for the 107th and 109th CDs
Variable: percent black    Mean     Minimum    Maximum    Standard deviation
107th CDs                  30.19    3.21       62.62      17.60
109th CDs                  28.70    3.40       56.10      18.70
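The kind of contrast summarized in Table 7.1 can be reproduced in miniature. The following Python sketch does not use the Georgia data: it assumes a hypothetical strip of 12 blocks with invented population counts, and two invented zoning schemes (`zoning_a` and `zoning_b`) that each group the same blocks into three zones. The helper `percent_by_zone` is likewise only illustrative.

```python
import numpy as np

# Hypothetical block-level counts along a strip of 12 blocks (illustrative only).
total = np.full(12, 100)
black = np.array([10, 10, 20, 30, 60, 80, 90, 70, 40, 20, 10, 10])

def percent_by_zone(zone_ids):
    """Tabulate percent black per zone for a given assignment of blocks to zones."""
    zone_ids = np.asarray(zone_ids)
    zone_black = np.bincount(zone_ids, weights=black)
    zone_total = np.bincount(zone_ids, weights=total)
    return 100.0 * zone_black / zone_total

# Two zoning schemes with the same number of zones but different boundaries.
zoning_a = [0] * 4 + [1] * 4 + [2] * 4
zoning_b = [0] * 2 + [1] * 8 + [2] * 2

pct_a = percent_by_zone(zoning_a)
pct_b = percent_by_zone(zoning_b)

# The underlying population is identical, so the overall percentage is the same,
# but the zone-level summary statistics differ between the two schemes.
overall = 100.0 * black.sum() / total.sum()
for name, pct in (("zoning A", pct_a), ("zoning B", pct_b)):
    print(f"{name}: mean={pct.mean():.2f} min={pct.min():.2f} "
          f"max={pct.max():.2f} sd={pct.std(ddof=1):.2f} (overall {overall:.2f})")
```

With these invented counts, the zone percentages are 17.5, 75.0 and 20.0 under the first scheme and 10.0, 51.25 and 10.0 under the second, even though the overall percentage (37.5) is unchanged. Only the boundaries differ, which is the zoning effect in its simplest form.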
Congress was 0.987 (p < 0.001) and for the 109th Congress it was 0.9898 (p < 0.001), too close to be distinguished. Although statistically the two means of percent black for the two Congresses are not significantly different, the numerical differences in the statistics do raise some concerns about the consistency of analysis results using data tabulated according to different spatial partitioning systems. This inconsistency attributable to zonal differences is part of the impact of the zoning effect.
The most effective way to illustrate the scale effect of the MAUP is to use data at different levels of a geographical hierarchy. Figure 7.1(d) shows three levels of the census geography of the Washington, DC area. The lowest level, census block, has limited socioeconomic variables. Therefore, only census tract and block group data are used here. Figure 7.3 shows the variable per capita income (PCI) for blacks at the two census levels. The overall income distributions depicted by the two maps are quite similar: higher levels in the northwest and lower levels in the southeast. But the block group map provides refined details that are otherwise concealed at the census tract level. Some of the block groups in the western part of the region have relatively low PCI values. Because their neighboring block groups had reasonable PCI levels for blacks, the overall tract level PCI values are medium to high. Similarly, there is one small block group on the southeastern side that had a moderately high value. But because all neighboring block groups had lower PCI values, the aggregated value for that tract was relatively low. When a small number of low value areas are surrounded by a large number of high value areas, the scale effect tends to inflate the low value areas. On the other hand, when a small number of high value areas are surrounded by a large number of low value areas, the scale effect tends to deflate the high value areas. To summarize, a general characteristic of the scale effect is to smooth out extreme values so that the range of the values is narrower. To verify this general impact, Table 7.2 shows selected statistics of the variable at the census tract and block group levels. Although the means for the two levels are not dramatically different, their maximum values and standard deviations are quite different, supporting the argument that more aggregated data (tract) tend to have less variation, since the aggregation process over scale smooths the variability.
If one follows the logic that more spatially aggregated data are less variable, and this logic is extended to analyze correlation between variables, it is not difficult to come to the conclusion that data at the higher aggregation levels will likely have higher correlation than more spatially disaggregated data. By picking two variables, per capita income for blacks and median house value, we can evaluate their correlation at the tract and block group levels. At the tract level, Pearson's correlation coefficient for the two variables is 0.6806 (p < 0.001). At the block group level, the correlation is only 0.3867 (p < 0.001). Apparently, the correlation at the block group level was much lower than that at the census tract level. This impact of the scale effect has long been recognized in the literature (Gehlke and
Figure 7.3 Per capita income of blacks, census tract and block group levels.
Table 7.2 Selected statistics for the variable per capita income for blacks at the census tract and block group levels
Variable: per capita income for blacks    Mean     Minimum    Maximum    Standard deviation
Census tracts                             21879    0          104731     15053
Block groups                              23390    0          217910     20073
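The smoothing seen in Table 7.2, and the accompanying rise in correlation, can be imitated with simulated data. The Python sketch below is not the Washington, DC analysis. It assumes a hypothetical 20 x 20 grid of fine units carrying two variables (`x` and `y`) that share a smooth spatial gradient but have independent local noise, and it aggregates them by averaging 4 x 4 groups of cells. All names, grid sizes and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n, block = 20, 4  # 20 x 20 fine units, averaged in 4 x 4 groups -> 5 x 5 coarse units

# Two hypothetical variables: a shared smooth gradient plus independent local noise.
rows, cols = np.indices((n, n))
trend = 10.0 * (rows + cols) / (2 * (n - 1))
x = trend + rng.normal(0.0, 5.0, size=(n, n))
y = trend + rng.normal(0.0, 5.0, size=(n, n))

def aggregate_mean(grid, block):
    """Average a square grid over non-overlapping block x block groups of cells."""
    k = grid.shape[0] // block
    return grid.reshape(k, block, k, block).mean(axis=(1, 3))

def describe(a, b):
    """Return (standard deviation of a, range of a, Pearson correlation of a and b)."""
    r = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return a.std(ddof=1), a.max() - a.min(), r

x_coarse, y_coarse = aggregate_mean(x, block), aggregate_mean(y, block)
sd_fine, range_fine, r_fine = describe(x, y)
sd_coarse, range_coarse, r_coarse = describe(x_coarse, y_coarse)

print(f"fine   (n={x.size}): sd={sd_fine:.2f} range={range_fine:.2f} r={r_fine:.3f}")
print(f"coarse (n={x_coarse.size}): sd={sd_coarse:.2f} range={range_coarse:.2f} r={r_coarse:.3f}")
```

In this setup, averaging cancels much of the independent noise while leaving the shared gradient largely intact, so the coarse units show a narrower range and a smaller standard deviation, and the two variables appear more strongly correlated. Real data need not behave this neatly; the sketch only isolates the mechanism described in the text.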
Biehl, 1934; Openshaw and Taylor, 1979; Robinson, 1956). Fotheringham and Wong (1991) provided a more detailed statistical explanation of the spatial scale-variant nature of the correlation coefficient.
7.4. THE MAUP PROCESSES
7.4.1. The scale effect
The above analyses have demonstrated that the MAUP is relevant to even simple mapping. Maps are often used to explore and visualize spatial patterns. The Georgia example shows that to a large extent, the spatial pattern is a function of the partitioning system. Adopting different partitioning systems can generate different patterns for the same area, despite using the same variable. The impacts on mapping for the scale effect, in the specific example of Washington, DC, are not very dramatic. The overall pattern is quite persistent across different scales. However, it is dangerous to assume that the scale effect has minimal impacts on mapping. In fact, many experiments and studies have shown that using data from different scale levels can portray very different spatial patterns.
The impacts of the MAUP on mapping are quite obvious, but its impacts on statistical analysis seem to be quite difficult to comprehend and generalize. That is why most of the literature on the MAUP has been on assessing its impacts on different subject areas (population, urban, vegetation, soil, etc.) and on different techniques (general statistics, spatial statistics, and mathematical models). But the simple correlation analysis above using Washington, DC census tract and block group level data offers some insights on the processes related to the scale effect and its potential impacts. As smaller areal units (such as block groups) are aggregated to form larger units (such as tracts), original values of the smaller units with some level of variability are summarized or replaced by a representative value, which, in most cases, is a measure of central tendency such as the mean or median. Extreme values among the smaller units are now removed and therefore more aggregated data become less varied, or more similar. Thus, the correlations among variables tend to be higher with higher levels of spatial aggregation. This nature of the scale effect was best exemplified by the work of Openshaw and Taylor (1979), which shows that the correlation coefficient could span a wide range, from a moderate level for relatively disaggregated data to a very high correlation level for highly aggregated data. Sometimes, slightly negative relationships at the individual or disaggregated level can turn into moderate positive correlations when data are aggregated into larger areal units (Fotheringham and Wong, 1991).
Although these correlation analyses are quite straightforward, their results and general patterns have significant implications for conducting general statistical and spatial analyses on data that may be tabulated and disseminated at different spatial aggregation levels. With most multivariate statistics, the relationships between variables are often summarized by the correlation matrix, or the variance-covariance matrix, and these serve as the foundation for analysis (Griffith and Amrhein, 1997). Data at higher levels of aggregation tend to inflate correlation as compared to the disaggregated levels. Therefore, we can expect that analyses using more aggregated data will likely show stronger relationships than analyses using more disaggregated data. To some extent, the process of the scale effect and its general impacts are quite predictable. Also, because the variance-covariance matrix is the core of almost all multivariate analyses, the impacts of the MAUP on this matrix are propagated to various multivariate statistical techniques
(e.g., Perle (1977) on factor analysis; Hunt and Boots (1996) on principal component analysis).
7.4.2. The zoning effect
The process and general impacts of the zoning effect seem to be more difficult to assess and comprehend. There are several variables acting both independently and together to determine the impacts of the zoning effect. To illustrate the roles of some of these variables, I have created two artificial landscapes (Figure 7.4a and b). Both landscapes have the same number of areal units (100) and the same set of values. For Figure 7.4(a), similar values (Variable 1) tend to locate close to each other, exhibiting strong positive spatial autocorrelation, a situation quite common in reality (Odland, 1988; Griffith, 1987). Figure 7.4(b) was created by randomly shuffling the original 100 values and assigning them to different cells to create Variable 2. As a result, the pattern is somewhat random. In addition, I have created two zoning patterns: the first pattern, Configuration 1 in Figure 7.4, follows closely the patterns of Variable 1 in Figure 7.4(a); the second pattern, Configuration 2 in Figure 7.4, cuts through different zones of Variable 1.
When Configuration 1 is applied to Variable 1, we expect that the general spatial trend will likely be preserved, while this
Figure 7.4 100 units with (a) positive spatial autocorrelation and (b) random pattern; and two hypothetical zoning systems: Configuration 1 and Configuration 2. Legend classes for Variable 1 and Variable 2: 2–23, 24–40, 41–57, 58–75, 76–96.
[Figure 7.5. Legend classes of the aggregated maps. Variable 1, Configuration 1: 15, 15–37, 37–57.76, 57.76–84.64. Variable 1, Configuration 2: 42.8, 42.8–47.28, 47.28–49.56, 49.56–54.76. Variable 2, Configuration 1: 40.68, 40.68–49.56, 49.56–50.56, 50.56–53.5. Variable 2, Configuration 2: 43.24, 43.24–45.12, 45.12–46.44, 46.44–59.6.]
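An experiment in the spirit of Figures 7.4 and 7.5 can be sketched in a few lines of Python. The landscapes below are not the author's: `v1` is an assumed 10 x 10 surface whose values rise steadily from top to bottom (strong positive spatial autocorrelation), `v2` holds the same 100 values in shuffled order, and `config1` and `config2` are two hypothetical four-zone systems, horizontal bands that follow the gradient of `v1` and vertical bands that cut across it.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the shuffle is repeatable
rows, cols = np.indices((10, 10))

# Variable 1: values 1..100 laid out row by row, so neighbors hold similar values.
# Variable 2: the same 100 values, randomly reassigned to cells.
v1 = 10 * rows + cols + 1
v2 = rng.permutation(v1.ravel()).reshape(10, 10)

# Two hypothetical four-zone configurations built from the same band widths.
edges = [3, 5, 7]
config1 = np.digitize(rows, edges)  # horizontal bands, aligned with Variable 1
config2 = np.digitize(cols, edges)  # vertical bands, cutting across Variable 1

def zone_means(values, zones):
    """Average the cell values falling in each zone."""
    sums = np.bincount(zones.ravel(), weights=values.ravel())
    counts = np.bincount(zones.ravel())
    return sums / counts

for vname, v in (("V1", v1), ("V2", v2)):
    for cname, c in (("C1", config1), ("C2", config2)):
        m = zone_means(v, c)
        print(f"{vname}-{cname}: zone means={np.round(m, 2)} "
              f"range={m.max() - m.min():.2f} sd={m.std(ddof=1):.2f}")
```

For the assumed autocorrelated surface, the aligned bands give zone means of 15.5, 40.5, 60.5 and 85.5, whereas the bands that cut across the gradient give 47.0, 49.5, 51.5 and 54.0, so most of the original variation disappears. For the shuffled variable, both configurations tend to produce zone means near the overall average, so the choice of zoning matters much less.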
will not be the case when Configuration 2 is applied to Variable 1. On the other hand, because the spatial distribution of Variable 2 is somewhat random, imposing Configuration 1 or 2 is unlikely to create major differences. Figure 7.5 shows the results; apart from Variable 1-Configuration 1, we cannot identify any pattern. Note that the ranges of values in other situations are much smaller than that in Variable 1-Configuration 1.
Table 7.3 shows the details for some of the statistics. The first row in Table 7.3 (V1 and V2) lists the statistics of the original values. Assuming that we aggregate the original 100 areal units into four larger units by taking the averages of the original values, the first batch of rows shows the results from the averaging process. The only situation that can preserve some of the statistics (standard deviation and to some extent maximum) of the original values reasonably well is V1-C1, when the spatial configuration coincides with the spatial pattern. When Configuration 2 was applied to Variable 1, the minimum value was inflated, but the standard deviation was greatly suppressed. For Variable 2, we see no obvious differences in statistics when different configurations were applied, as the values are spatially random. In other words, the zoning effect will be minimal if the phenomenon exhibits a somewhat random pattern. But if the phenomenon exhibits strong positive spatial autocorrelation, then we should expect some significant impacts due to the zoning effect.
Besides the spatial distribution of the data, another major factor in determining the impacts of the MAUP is the spatial aggregation mechanism, or the process used to derive a representative value for the aggregated units. The above example used averaging as the process, i.e., the average value of the original data will be used for the aggregated unit. But there are other possible choices for the representative values, such
Table 7.3 Selected statistics for using the two hypothetical configurations (1 and 2) to aggregate Variable 1 and Variable 2 in Figure 5
Variables (V1 and V2) and configurations (C1 and C2)    Mean     Minimum    Maximum    Standard deviation
V1 and V2                                               49.00    2.00       96.00      27.00
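The role of the aggregation mechanism can also be sketched in Python. The example below does not reproduce the rows of Table 7.3: it assumes a hypothetical 10 x 10 autocorrelated surface (`values`) and four hypothetical zones (`zones`), and the helper `aggregate` is an illustrative name. It applies different representative-value rules to the same zones.

```python
import numpy as np

rows, cols = np.indices((10, 10))
values = 10 * rows + cols + 1         # hypothetical autocorrelated surface, 1..100
zones = np.digitize(rows, [3, 5, 7])  # four hypothetical zones (horizontal bands)

def aggregate(values, zones, reducer):
    """Apply `reducer` (e.g. np.mean, np.min) to the cell values of each zone."""
    return np.array([reducer(values[zones == z]) for z in np.unique(zones)])

# Same data, same zones; only the rule for the representative value changes.
rules = (("mean", np.mean), ("median", np.median),
         ("minimum", np.min), ("maximum", np.max))
for name, reducer in rules:
    rep = aggregate(values, zones, reducer)
    print(f"{name:8s} zone values={rep} mean={rep.mean():.2f} "
          f"sd={rep.std(ddof=1):.2f}")
```

With this assumed surface, the zone values are 15.5, 40.5, 60.5 and 85.5 when the mean is used, but 1, 31, 51 and 71 when the minimum is used and 30, 50, 70 and 100 when the maximum is used. The zoning system is identical in every case, so the differences come entirely from the aggregation rule.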
as median, minimum, maximum and others. The second batch of rows in Table 7.3 shows the aggregation results when the minimum values are taken as the representative values. Again, applying Configuration 1 to Variable 1 best preserves the original information, but still the results are quite different from using the averaging process and the original values. Therefore, how values of sub-units are aggregated to larger units will also affect the magnitude of the MAUP. Although our discussion focuses on the zoning effect, both the spatial distribution of the data and the aggregation mechanism also help to explain the scale effect.
7.5. STAGES OF THE MAUP RESEARCH
In the following section, I attempt to provide an account of MAUP research over the past several decades. To facilitate the discussion, I divide the history into two periods characterized by the major types of MAUP research appearing during those periods: discovering and assessing the impacts of the problem; and conceptualizing and formulating solutions. One needs to recognize that this division is somewhat artificial and not exclusive in nature, since the labels simply reflect the dominant types of research during those periods.
7.5.1. Discovery and impact assessment
The impacts of the MAUP have been documented thoroughly. Given that changing the correlation among variables is a typical and fundamental impact of the MAUP, it is not surprising to find that most statistical analyses are subject to the MAUP. In addition, non-statistical mathematical models or quantitative methods are also likely impacted by the MAUP. Although Openshaw and Taylor coined the term, many researchers prior to them had documented some aspects of the MAUP. The earliest seems to be the work by Gehlke and Biehl (1934), who reported patterns of correlation coefficient changes when census tracts were grouped differently. Another early work by Robinson (1956) moved a step forward by arguing that a weighting scheme was necessary to correct the correlation coefficient to
account for different numbers of observations among areal units. While not targeted at the MAUP specifically, Moellering and Tobler (1972) offered a better understanding of the smoothing process of the scale effect by explaining how variance changes over scale levels. Sawicki (1973), and later Clark and Avery (1976), made some of the earliest attempts to assess the MAUP effects on general statistical analyses. Perle (1977) explicitly linked the MAUP to the issue of ecological fallacy (Wong, 2007), although the potential problems of using ecological correlation to infer individual behavior were well documented (Robinson, 1950). Parallel to these developments, some British geographers, including Openshaw, focused on a related issue of developing optimal zonal systems, partly for regionalization purposes and partly to deal with the MAUP. As Batty (1976) adopted the information approach to handle spatial aggregation, others aimed at designing the best zonal systems to support spatial interaction modeling (Masser and Brown, 1975; Openshaw, 1977a, b, 1978a, b). Creating zones or regions is often needed in regional analysis, and these zones or regions provide the basis for location-allocation models. Goodchild (1979) first recognized the MAUP effect on location-allocation modeling. Mathematical modelers occasionally picked up this topic (Fotheringham et al., 1995; Hodgson et al., 1997; Murray and Gottsegen, 1997; Horner and Murray, 2002), but these studies were limited to assessing the impacts of the MAUP.
After Openshaw and Taylor coined the term MAUP in 1979, the next major concerted effort in addressing the MAUP started around 1989, partly due to the research initiative of the National Center for Geographic Information Analysis (NCGIA) on data accuracy. In between, there were intermittent developments in identifying different aspects of the MAUP. Batty continued his entropy-based approach to deal with the aggregation problem in the context of developing gravity-based models (Batty and Sikdar, 1982a-d). Putman and Chung (1989) also joined the British geographers to address zonal design issues for spatial interaction models. Blair and Miller (1983) demonstrated the impacts of the MAUP on input-output models.
The formation of the NCGIA and the launching of the spatial data accuracy research initiative gave MAUP research a boost from 1989 onward. Fotheringham (1989) called for the recognition of scale sensitivity issues in spatial analysis, as well as the need to perform multi-scale analyses. In the same volume, Tobler (1989) argued that the MAUP is a spatial problem and therefore the solution has to be spatial in nature. Subsequently, he proposed a migration modeling framework that was not sensitive to scale changes, probably the first scale-independent spatial analytical technique to be introduced. Unrelated to the development of the NCGIA, Arbia (1989) published an in-depth monograph addressing the MAUP.
7.5.2. From conceptualization to problem solving
With the NCGIA research initiative on spatial data accuracy as the platform, a new wave of research activities on the MAUP began in the early 1990s, starting with the frequently cited paper by Fotheringham and Wong (1991), which systematically addressed the impacts of the MAUP on correlation analysis and regression models. While researchers were still interested in, and to some extent obsessed with, the impacts of the MAUP, the community had gradually moved toward finding solutions to the MAUP. This search for solutions was in parallel to the effort of several researchers who had provided
evidence that the MAUP effects may not be as pervasive as some others claimed (e.g., Fotheringham and Wong, 1991). Amrhein and Flowerdew (1989) showed that the MAUP has limited impact on Poisson regressions. Trying to identify when the MAUP will be significant, Amrhein (1995) and Amrhein and Reynolds (1996) conducted a series of simulations, controlling for various statistical properties of the data, including various levels of spatial autocorrelation. They concluded that the MAUP effects may not be significant given certain levels of aspatial and spatial correlation among variables, but their relationships are extremely complex. While most impact analyses of the MAUP focused on statistical or mathematical modeling, some analyses were more narrowly focused on index formulations, particularly using indices to measure segregation (Wong, 1997; Wong et al., 1999). Besides conceptualizing the scale effect on measuring segregation, this line of research also shows that spatial measures are likely more sensitive to changing scale than aspatial measures (Wong, 2004).
A coordinated effort during this phase of the MAUP research was the publishing of a special issue of Geographical Systems (Wong and Amrhein, 1996). In this special issue, some researchers still focused on the MAUP effects (e.g., Okabe and Tagashira, 1996; Hunt and Boots, 1996), but others delved deeper into the sources of the MAUP (e.g., Amrhein and Reynolds), including the change-of-support concept in geostatistics (Cressie, 1996). A clear direction was to develop solutions. Holt et al. (1996) argued that the source of the scale effect was the changes in correlation between variables and thus they proposed a framework to model the changes of correlation over scale by taking into account spatial autocorrelation implicitly. Unfortunately, the computational method was too complex to offer a practical solution to the problem. Creating optimal zoning was firmly believed to be a potential solution to the MAUP in the past (Openshaw, 1977a), and this direction was still of interest at this stage of the research (Openshaw and Schmidt, 1996).
Most of the research on the MAUP mentioned above focused on aggregating polygon feature data, a popular operation in manipulating vector format data in GIS and frequently used in the handling of socioeconomic phenomena. However, the impacts of the MAUP are also present in physical geography, environmental modeling and, in general, the analysis of raster format data. Outside of human geography, some landscape ecologists and physical geographers started developing an appreciation of the MAUP (Jelinski and Wu, 1996), and a series of studies followed this direction. While Arbia et al. (1996) might have been the first to link the scale effect in raster or remote sensing data analysis to the MAUP explicitly, the scale effect or scale dependency issue was definitely not new to remote sensing scientists (e.g., Bian and Walsh, 1993) since remote sensing data are often available and can be tabulated easily to multiple scale levels (Bian, 1997). Part of the issue, which has historically been a problem in remote sensing analysis, is to select the resolution appropriate for the analysis (e.g., Townshend and Justice, 1988). Lam and Quattrochi (1992) reviewed several concepts related to scale and resolution, attempting to address the issue of choosing the optimal scale or resolution to analyze a particular phenomenon. Some researchers also recognized that the scale effect is essentially a change-of-support problem in geostatistics (Atkinson and Curran, 1995). The edited volume by Quattrochi and Goodchild (1997) collected papers partly focusing on the impacts of the MAUP on remote sensing, and also on modeling the scale effect and developing solutions (e.g., Bian, 1997; Xia and
Clarke, 1997). Still, no clear solutions have been identified.

Outside of the geographical literature, the MAUP attracted additional attention after the appearance of King's monograph (1997), which focused on ecological inference issues across social science disciplines, but also addressed the related MAUP. He made a bold claim that an error-bound approach can solve the scale effect, which is part of the MAUP and is conceptually related to the ecological fallacy problem. His claim triggered reactions from the geographic realm, and some of these reactions were aired through a series of coordinated comments (Sui, 2000), although the focus was still on the ecological fallacy issue. But geographers' responses (Fotheringham, 2000; Anselin, 2000; O'Loughlin, 2000) were not too optimistic that King's solutions can solve the ecological fallacy issue and specifically the MAUP. On the other hand, Johnston and Pattie (2001) rebuffed the claim that geographers have not spent adequate effort in dealing with the ecological fallacy by citing previous research on entropy maximization, which offers promising results in dealing with the ecological inference problem.

7.6. POTENTIAL SOLUTIONS

The recent exchanges between geographers and King raise doubt that King's solutions can solve the MAUP. Even though the early phase of the MAUP research was fascinated by the pervasiveness of the MAUP effects and overwhelmed by impact-analysis type of studies, researchers have never stopped searching for solutions since the very beginning. Robinson (1956) suggested simplistic weighting methods to overcome some of the MAUP effects on correlation analysis. Tobler (1989) argued that because the MAUP is a spatial problem, solutions have to be spatial in nature. Thus, he called for the development of scale-insensitive or frame-independent spatial analytical techniques to deal with the MAUP and he employed a population migration model that was relatively insensitive to scale changes (Tobler, 1989). Tobler's migration model is one of the very few analytical tools that are relatively scale-insensitive. Another one that has demonstrated some level of stability in correlation over different scale levels is location-specific correlation analysis (Wong, 2001). But all of these potentially scale-insensitive tools have limited applications.

A popular spatial solution to the MAUP even before Openshaw and Taylor coined the term was to create optimal zoning systems (Openshaw, 1977a, b, 1978a, b; Openshaw and Baxter, 1977; Openshaw and Rao, 1995; Openshaw and Schmidt, 1996). Given that most aggregation problems involve multiple variables, derivations of zonal systems have to be based upon multiple variables and multiple objectives. In general, the principle is to create zonal systems to minimize the intra-zonal variances and to maximize inter-zonal variances. But often there is no unique solution and therefore, heuristic processes seem to be quite promising (Bong and Wang, 2004).

Recently, the edited volume by Tate and Atkinson (2001) pointed to three directions of MAUP research: impacts of the scale effects, the potential of fractal analysis in dealing with the scale issue and the use of geostatistics, specifically kriging and related methods such as variograms to handle and model the scale effect. Although the intended coverage of the volume included both vector and raster data, the impact assessment tended to focus more on vector data while the modeling and solutions were geared more toward raster data. Fractals have a strong relationship historically with the scale effect as remote sensing data can be tabulated and analyzed at multiple scale levels and fractal geometry
is a powerful mathematical tool to handle multiscale phenomena (Lam and Quattrochi, 1992; Lam and De Cola, 1993; Pecknold et al., 1997; Quattrochi et al., 1997). The volume by Tate and Atkinson (2001) includes several papers on using fractals to handle the scale problem. But so far, although fractal analysis has been demonstrated to be effective in describing and modeling phenomena at multiple scales, it has not yet been proven to be a viable solution to the MAUP, or more specifically the scale effect.

Tate and Atkinson (2001) also suggested geostatistical analysis as a potential solution to the scale problem. Geostatistical tools, especially variograms, can identify the geographical range of spatial autocorrelation. This is an important piece of information to understand and model the scale effect. They claimed that geostatistical tools are not used to rescale the data themselves, but to rescale statistics describing the data (Atkinson and Tate, 2000). This is an interesting idea, but has not been fully validated or operationalized. More recently, following the introduction of Geographically Weighted Regression (GWR), the potential for using GWR to depict spatial heterogeneity related to the MAUP was alluded to (Fotheringham et al., 2000). Because a major source of the scale effect is spatial heterogeneity and GWR can model local variability reasonably well, it is believed that GWR may be more robust than other global models and less sensitive to the scale effect (Fotheringham et al., 2002, pp. 144–158). Still GWR cannot really be regarded as a solution to the scale effect or the MAUP.

Somewhat similar to the geostatistical approach to rescale statistics over multiple scale levels was the direction taken by a group of social statisticians (Holt et al., 1996; Steel and Holt, 1996; Tranmer and Steel, 1998). They realized that the scale effect can be kept to a minimum when the aggregated areas have a high degree of internal homogeneity (low variance), and the magnitude of the scale effect will be partly a function of the internal homogeneity. As a result, one may model the scale effect or statistics describing the data at different scale levels as long as we can establish the rules of aggregation and how the scale effect is related to the level of internal homogeneity. Since the foundation of most classical statistics is the variance-covariance matrix, this group of researchers proposed using the correlation at the individual level to estimate the correlation at the aggregated level and thus to estimate the variance-covariance matrix at the aggregate level. The statistical derivations involved were very sophisticated and the computation was very demanding. As a result, this has not been a practical solution to the MAUP.

Although tremendous efforts have been spent to deal with the scale problem, to many researchers, the zoning problem seems to be easier to handle. Flowerdew and Green (1989, 1992) treated the zoning problem in the same way as resolving incompatible zonal systems. The general approach is to use spatial interpolation methods to transform data gathered according to one zonal pattern to another pattern. Fisher and Langford (1995, 1996) have evaluated the reliability of this technique in handling the zoning problem. A related technique, dasymetric mapping, was also shown to be effective in handling incompatible zonal patterns from a cartographic perspective (Fisher and Langford, 1996; Mennis, 2003). An older smoothing or interpolation technique, the smooth pycnophylactic interpolation introduced by Tobler (1979), has also been revisited and is believed to be a solution candidate for the MAUP, specifically in addressing the problem from the change-of-support perspective (Gotway and Young, 2002).

To summarize, the MAUP effects can possibly be tackled by sophisticated models and computationally intensive techniques,
while their practical and operational potentials are yet to be affirmed. Relatively simple techniques can handle the zoning problem, but not the scale problem. Thus, without generally feasible methods to handle the MAUP, the old call for recognizing the MAUP is still the most affordable approach to deal with this long-term stubborn problem (Fotheringham, 1989). Given the advances in GIS technology and computational tools, and the availability of digital data at various scales, repeating the same analysis but using different scales or partitioning schemes is within reach of most researchers. This approach is probably the minimum standard in handling the MAUP given where we are on this topic.

Taking one step further, using segregation indices as examples, Wong (2003) disaggregated segregation at different geographical levels to demonstrate that one can document the sources of the MAUP effects. This accounting framework is to identify and quantify the amount of the MAUP effects contributed by different locations at different scale levels. This detailed mapping of the MAUP effects by scale and space is not just informative, but also sheds light on where the MAUP effects may be the most acute in the geographic hierarchy and highlights locations that deserve more attention.

7.7. CONCLUDING REMARK

Many methodological or technical problems can be found in the geographical literature. Some have broad impacts and are very complex, while some are confined to certain areas and are more manageable. Two very stubborn but pervasive problems in statistical analysis of spatial data are spatial autocorrelation and the MAUP. The past two decades of research in spatial statistics and spatial econometrics have moved the field forward to the stage that some very promising and operational modeling techniques are available to handle spatial autocorrelation quite effectively (e.g., Griffith, 2003). For the MAUP, we have accumulated pieces of knowledge and developed some comprehensive understanding and conceptualizations of the problems. But a systematic research agenda seems to be needed in order to bring significant advancements along this direction. Assessing the impacts of the MAUP should be a topic confined to the past, and the future should focus on developing operational solutions.

REFERENCES

Amrhein, C.G. (1995). Searching for the elusive aggregation effect: evidence from statistical simulations. Environment and Planning A, 27: 105–119.
Amrhein, C.G. and Flowerdew, R. (1989). The effect of data aggregation on a Poisson regression model of Canadian migration. In: Goodchild, M.F. and Gopal, S. (eds), Accuracy of Spatial Databases, pp. 229–238. London: Taylor and Francis.
Amrhein, C.G. and Reynolds, H. (1996). Using spatial statistics to assess aggregation effects. Geographical Systems, 3(2/3): 143–158.
Anselin, L. (2000). The alchemy of statistics, or creating data where no data exist. Annals, Association of American Geographers, 90(3): 586–592.
Arbia, G. (1989). Spatial Data Configuration in Statistical Analysis of Regional Economic and Related Problems. Dordrecht: Kluwer.
Arbia, G., Benedetti, R. and Espa, G. (1996). Effects of the MAUP on image classification. Geographical Systems, 3(2/3): 123–141.
Atkinson, P.M. and Curran, P.J. (1995). Defining an optimal size of support for remote sensing investigations. IEEE Transactions on Geosciences and Remote Sensing, 33(3): 768–776.
Atkinson, P.M. and Tate, N.J. (2000). Spatial scale problems and geostatistical solutions: a review. The Professional Geographer, 52(4): 607–623.
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis, 8: 1–21.
Batty, M. and Sikdar, P.K. (1982a). Spatial aggregation in gravity models: 1. An information-theoretic framework. Environment and Planning A, 14: 377–405.
Batty, M. and Sikdar, P.K. (1982b). Spatial aggregation in gravity models: 2. One-dimensional population density models. Environment and Planning A, 14: 525–553.
Batty, M. and Sikdar, P.K. (1982c). Spatial aggregation in gravity models: 3. Two-dimensional trip distribution and location models. Environment and Planning A, 14: 629–658.
Batty, M. and Sikdar, P.K. (1982d). Spatial aggregation in gravity models: 4. Generalisations and large-scale applications. Environment and Planning A, 14: 795–822.
Blair, P. and Miller, R.E. (1983). Spatial aggregation in multiregional input–output models. Environment and Planning A, 15: 187–206.
Bian, L. (1997). Multiscale nature of spatial data in scaling up environment models. In: Quattrochi, D.A. and Goodchild, M.F. (eds), Scale in Remote Sensing and GIS, pp. 13–27. Lewis Publishers.
Bian, L. and Walsh, S. (1993). Scale dependencies of vegetation and topography in a mountainous environment of Montana. The Professional Geographer, 45(1): 1–11.
Bong, C.W. and Wang, Y.C. (2004). A multiobjective hybrid metaheuristic approach for GIS-based spatial zoning model. Journal of Mathematical Modelling and Algorithms, 3: 245–261.
Clark, W.A.V. and Avery, K.L. (1976). The effects of data aggregation in statistical analysis. Geographical Analysis, 8: 428–438.
Cressie, N. (1996). Change of support and the modifiable areal unit problem. Geographical Systems, 3(2/3): 159–180.
Fisher, P.F. and Langford, M. (1995). Modelling the errors in areal interpolation between zonal systems by Monte Carlo simulation. Environment and Planning A, 27(2): 211–224.
Fisher, P.F. and Langford, M. (1996). Modeling sensitivity to accuracy in classified imagery: a study of areal interpolation by dasymetric mapping. The Professional Geographer, 48(3): 299–309.
Flowerdew, R. and Green, M. (1989). Statistical methods for inference between incompatible zonal systems. In: Goodchild, M. and Gopal, S. (eds), The Accuracy of Spatial Data Bases, pp. 239–247. London: Taylor and Francis.
Flowerdew, R. and Green, M. (1992). Developments in areal interpolating methods and GIS. Annals of Regional Science, 26: 67–78.
Fotheringham, A.S. (1989). Scale-independent spatial analysis. In: Goodchild, M. and Gopal, S. (eds), Accuracy of Spatial Databases, pp. 221–228. London: Taylor and Francis.
Fotheringham, A.S. (2000). A bluffer's guide to a solution to the ecological inference problem. Annals, Association of American Geographers, 90(3): 582–586.
Fotheringham, A.S., Brunsdon, C. and Charlton, M.E. (2000). Quantitative Geography: Perspectives on Spatial Analysis. London: Sage.
Fotheringham, A.S., Brunsdon, C. and Charlton, M.E. (2002). Geographically Weighted Regression. England: Wiley & Sons.
Fotheringham, A.S., Densham, P.J. and Curtis, A. (1995). The zone definition problem in location-allocation modeling. Geographical Analysis, 27: 60–77.
Fotheringham, A.S. and Wong, D.W.S. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23: 1025–1044.
Gehlke, C.E. and Biehl, K. (1934). Certain effects of grouping upon the size of the correlation coefficient in census tract material. Journal of the American Statistical Association, 29: 169–170.
Gotway, C.A. and Young, L.J. (2002). Combining incompatible spatial data. Journal of the American Statistical Association, 97: 632–648.
Griffith, D.A. (1987). Spatial Autocorrelation: a Primer. Washington, D.C.: Association of American Geographers.
Griffith, D.A. (2003). Spatial Autocorrelation and Spatial Filtering. Berlin: Springer-Verlag.
Griffith, D.A. and Amrhein, C.G. (1997). Multivariate Statistical Analysis for Geographers. Upper Saddle River, NJ: Prentice Hall.
Goodchild, M.F. (1979). Aggregation problem in location-allocation. Geographical Analysis, 11: 240–255.
Hodgson, M.J., Shmulevitz, F. and Körkel, M. (1997). Aggregation error effects on the discrete-space p-median model: the case of Edmonton, Canada. Canadian Geographer, 41: 415–429.
Holt, D., Steel, D.G. and Tranmer, M. (1996). Area homogeneity and the Modifiable Areal Unit Problem. Geographical Systems, 3(2/3): 181–200.
Horner, M.W. and Murray, A.T. (2002). Excess commuting and the modifiable areal unit problem. Urban Studies, 39: 131–139.
Hunt, L. and Boots, B.N. (1996). MAUP effects in the principal axis factoring technique. Geographical Systems, 3(2/3): 101–122.
Jelinski, D.E. and Wu, J. (1996). The modifiable areal unit problem and implications for landscape ecology. Landscape Ecology, 11: 129–140.
Johnston, R. and Pattie, C. (2001). On geographers and ecological inference. Annals, Association of American Geographers, 91(2): 281–282.
King, G. (1997). A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton: Princeton University Press.
Lam, N.S.-N. and De Cola, L. (eds) (1993). Fractals in Geography. Englewood Cliffs, NJ: Prentice-Hall.
Lam, N.S.-N. and Quattrochi, D.A. (1992). On the issues of scale, resolution, and fractal analysis in the mapping sciences. The Professional Geographer, 44(1): 88–98.
Masser, I. and Brown, P.J.B. (1975). Hierarchical aggregation procedures for interaction data. Environment and Planning A, 7: 509–523.
Mennis, J. (2003). Generating surface models of population using dasymetric mapping. The Professional Geographer, 55(1): 31–42.
Moellering, H. and Tobler, W.R. (1972). Geographical variances. Geographical Analysis, 4: 34–42.
Murray, A. and Gottsegen, J. (1997). The influence of data aggregation on the stability of p-median location model solutions. Geographical Analysis, 29: 200–213.
Odland, J. (1988). Spatial Autocorrelation. London: Sage.
Okabe, A. and Tagashira, N. (1996). Spatial aggregation bias in a regression model containing a distance variable. Geographical Systems, 3(2/3): 77–99.
O'Loughlin, J. (2000). Can King's ecological inference method answer a social scientific puzzle: who voted for the Nazi party in Weimar Germany? Annals, Association of American Geographers, 90(3): 592–601.
Openshaw, S. (1977a). Geographical solution to scale and aggregation problems in region-building, partitioning and spatial modeling. Transactions of the Institute of British Geographers, 2: 459–472.
Openshaw, S. (1977b). Optimal zoning systems for spatial interaction models. Environment and Planning A, 9: 169–184.
Openshaw, S. (1978a). An optimal zoning approach to the study of spatially aggregated data. In: Masser, I. and Brown, P.J.B. (eds), Spatial Representation and Spatial Interaction, pp. 93–113. Leiden: Martinus Nijhoff.
Openshaw, S. (1978b). Empirical-study of some zone-design criteria. Environment and Planning A, 10: 781–794.
Openshaw, S. (1984). The Modifiable Areal Unit Problem. CATMOG, 38. Norwich, England: Geobooks.
Openshaw, S. (1996). Developing GIS-relevant zone-based spatial analysis methods. In: Longley, P. and Batty, M. (eds), Spatial Analysis: Modelling in a GIS Environment, pp. 55–73. Cambridge, U.K.: GeoInformation International.
Openshaw, S. and Baxter, R.S. (1977). Algorithm 3: procedure to generate pseudo-random aggregations of n-zones into m-zones, where m is less than n. Environment and Planning A, 9: 1423–1428.
Openshaw, S. and Rao, L. (1995). Algorithms for reengineering 1991 census geography. Environment and Planning A, 27: 425–446.
Openshaw, S. and Schmidt, J. (1996). Parallel simulated annealing and genetic algorithms for re-engineering zoning systems. Geographical Systems, 3(2/3): 201–220.
Openshaw, S. and Taylor, P.J. (1979). A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In: Wrigley, N. (ed.), Statistical Applications in the Spatial Sciences, pp. 127–144. London: Pion.
Pecknold, S., Lovejoy, S., Schertzer, D. and Hooge, C. (1997). Multifractals and resolution dependence of remotely sensed data: GSI to GIS. In: Quattrochi, D.A. and Goodchild, M.F. (eds), Scale in Remote Sensing and GIS, pp. 361–394. Lewis Publishers.
Perle, E.D. (1977). Scale changes and impacts on factorial ecology structures. Environment and Planning A, 9: 549–558.
Putman, S.H. and Chung, S.H. (1989). Effects of spatial system-design on spatial interaction models. 1. The spatial system definition problem. Environment and Planning A, 21: 27–46.
Quattrochi, D.A. and Goodchild, M.F. (eds) (1997). Scale in Remote Sensing and GIS. Lewis Publishers.
Quattrochi, D.A., Lam, N.S.-N., Qiu, H-L. and Zhao, W. (1997). ICAMS: A geographic information system for the characterization and modeling of multiscale remote sensing data. In: Quattrochi, D.A. and Goodchild, M.F. (eds), Scale in Remote Sensing and GIS, pp. 295–308. Lewis Publishers.
Robinson, A.H. (1956). The necessity of weighting values in correlation analysis of areal data. Annals, the Association of American Geographers, 46: 233–236.
Robinson, W.S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15: 351–357.
Sawicki, D.S. (1973). Studies of aggregated areal data: problems of statistical inference. Land Economics, 49: 109–114.
Steel, D.G. and Holt, D. (1996). Rules for random aggregation. Environment and Planning A, 28: 957–978.
Sui, D. (2000). New directions in ecological inference: an introduction. Annals, Association of American Geographers, 90(3): 579–582.
Tate, N.J. and Atkinson, P.M. (eds) (2001). Modelling Scale in Geographical Information Sciences. London: Wiley & Sons.
Tobler, W. (1979). Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association, 74: 519–536.
Tobler, W. (1989). Frame independent spatial analysis. In: Goodchild, M. and Gopal, S. (eds), The Accuracy of Spatial Data Bases, pp. 115–122. London: Taylor and Francis.
Townshend, J.R.G. and Justice, C.O. (1988). Selecting the spatial resolution of satellite sensors required for global monitoring of land transformation. International Journal of Remote Sensing, 9: 187–236.
Tranmer, M. and Steel, D.G. (1998). Using census data to investigate the causes of the ecological fallacy. Environment and Planning A, 30: 817–831.
Wong, D.W.S. (1995). Aggregation effects in geo-referenced data. In: Arlinghaus, S.L. and Griffith, D.A. (eds), Practical Handbook of Spatial Statistics, pp. 83–106. Boca Raton, FL: CRC Press.
Wong, D.W.S. (1997). Spatial dependency of segregation indices. The Canadian Geographer, 41(2): 128–136.
Wong, D.W.S. (2001). Location-specific cumulative distribution function (LSCDF): an alternative to spatial correlation analysis. Geographical Analysis, 33(1): 76–93.
Wong, D.W.S. (2003). Spatial decomposition of segregation indices: a framework toward measuring segregation at multiple levels. Geographical Analysis, 35(3): 179–194.
Wong, D.W.S. (2004). Comparing traditional and spatial segregation measures: a spatial scale perspective. Urban Geography, 25(1): 66–82.
Wong, D.W.S. (2007). Ecological fallacy. In: Warf, B. (ed.), Encyclopedia of Human Geography, pp. 117–118. Sage Publications.
Wong, D.W.S. and Amrhein, C.G. (1996). Research on the MAUP: old wine in a new bottle or real breakthrough? Geographical Systems, 3(2/3): 73–77.
Wong, D.W.S., Lasus, H. and Falk, R.F. (1999). Exploring the variability of segregation index D with scale and zonal systems: an analysis of thirty US cities. Environment and Planning A, 31: 507–522.
Wong, D.W.S. and Lee, J. (2005). Statistical Analysis of Geographic Information. New York: Wiley and Sons.
Xia, Z-G. and Clarke, K.C. (1997). Approaches to scaling of geo-spatial data. In: Quattrochi, D.A. and Goodchild, M.F. (eds), Scale in Remote Sensing and GIS, pp. 309–360. Lewis Publishers.
8
Spatial Weights
Robin Dubin
row normalizing will change the weights so that they sum to one. Thus pairs with the same separation distance can have different weights, depending on the number of nearby observations.

In the remainder of this chapter, I will explore weight matrices for the following cases: regular lattice data for points, regular lattice data for areas, irregularly located data for points, and irregularly located data for areas.

Consider the data presented in Figure 8.1. This is a map of 25 regions arranged on a regular lattice. The borders of the regions are shown with solid lines, the centroids are shown with heavy black points, and the lattice itself is shown with dashed lines. Each region is identified by a number between 1 and 25.

The most natural way to represent the spatial relationships with areal data is through the concept of contiguity. That is, regions will be considered to be related if their boundaries share common points. There are three types of contiguity that are commonly considered: rook contiguity, bishop contiguity, and queen contiguity. Contiguity is determined by imagining that the regions form a chess board; neighbors are determined by the regions that the appropriate chess piece could reach.

8.1.1. Rook contiguity

With rook contiguity, the neighbors are due north, south, east and west. Region 7's neighbors are regions 2, 6, 8 and 12 and
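[Figure 8.1. Map of the 25 regions on a regular lattice. Region numbers by lattice position, with lattice row 5 at the top and lattice column 1 at the left:]

     5  10  15  20  25
     4   9  14  19  24
     3   8  13  18  23
     2   7  12  17  22
     1   6  11  16  21

[Figure 8.2. The same lattice, with the rook neighbors of region 7 marked with stars.]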
are indicated with stars in Figure 8.2. The weight matrix for these data will have 25 rows and columns. The first 10 rows and columns of the unstandardized weight matrix are shown in Figure 8.3.

This symmetric matrix has zeros on its main diagonal. A one indicates that regions i and j are neighbors. Regions in the interior of the study area will have four ones in their rows. For example, the seventh row of the weight matrix contains four ones, because region 7 has four neighbors (only three are shown in Figure 8.3 because the fourth neighbor is region 12). Regions on the periphery will have fewer neighbors. For example, the first row (representing region 1) has only two ones. These are in the second and sixth cells, indicating that region 1 has only two neighbors: region 2 and region 6.

To obtain the row normalized version of this weight matrix, divide each row by the number of neighbors (ones). Thus in rows with 4 neighbors, the entries will be 0.25, and in rows with only two, the entries will be 0.5. The normalized matrix is therefore no longer symmetric: the weight of region 2 in row 1 is 0.5, while the weight of region 1 in row 2 is 1/3, because region 2 has three neighbors. This is a common occurrence: row normalizing often makes symmetric weight matrices asymmetric.

8.1.2. Bishop contiguity

In bishop contiguity, region i's neighbors are located at its corners. Figure 8.4 shows the neighbors for region 7 under this scheme. The neighbors are regions 1, 3, 11 and 13.

Again, regions in the interior will have four neighbors, while those on the periphery will have fewer. Figure 8.5 shows the first 10 rows
[Figure 8.3. First 10 rows and columns of the unstandardized weight matrix for rook contiguity]
0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
0.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00
0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00
1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00
0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00
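The construction of the rook matrix and its row-normalized version can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses NumPy, the function names `rook_weights` and `row_normalize` are hypothetical, and the numbering follows the lattice of Figure 8.1 (regions run up each lattice column in turn).

```python
import numpy as np

def rook_weights(n_rows=5, n_cols=5):
    """Binary rook contiguity matrix for a regular lattice.

    Assumed numbering (as in Figure 8.1): region = col * n_rows + row + 1,
    with row and col counted from 0 at the bottom-left corner.
    """
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for col in range(n_cols):
        for row in range(n_rows):
            i = col * n_rows + row
            # neighbors due north, south, east and west
            for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r, c = row + d_row, col + d_col
                if 0 <= r < n_rows and 0 <= c < n_cols:
                    W[i, c * n_rows + r] = 1.0
    return W

def row_normalize(W):
    """Divide each row by its number of neighbors so that rows sum to one."""
    return W / W.sum(axis=1, keepdims=True)

W = rook_weights()
W_std = row_normalize(W)

neighbors_of_7 = (np.flatnonzero(W[6]) + 1).tolist()  # regions are 1-based
print(neighbors_of_7)             # [2, 6, 8, 12]
print(np.array_equal(W, W.T))     # True: symmetric before row normalizing
print(W_std[0, 1], W_std[1, 0])   # 0.5 versus 1/3: no longer symmetric
```

The last line shows the asymmetry directly: region 1 has two neighbors, so region 2 receives a weight of 0.5 in row 1, while region 2 has three neighbors, so region 1 receives only 1/3 in row 2.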
[Figure 8.4. The lattice of Figure 8.1, with the bishop neighbors of region 7 marked.]
[Figure 8.5. First 10 rows and columns of the unstandardized weight matrix for bishop contiguity]
0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00
0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
[Figure 8.6. The lattice of Figure 8.1, with the queen neighbors of region 7 marked.]
the neighbors for region 7 under queen contiguity.

The weight matrix for queen contiguity is the sum of the weight matrices for rook and bishop contiguity. The first 10 rows and columns of the unstandardized weight matrix are shown in Figure 8.7. Comparing Figure 8.7 to Figures 8.3 and 8.5 shows that Figure 8.7 can be obtained by summing the other two weight matrices.
[Figure 8.7. First 10 rows and columns of the unstandardized weight matrix for queen contiguity]
0.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00
0.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00
0.00 0.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00
0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 1.00
1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
1.00 1.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00
0.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 1.00
0.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00 1.00 0.00
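The relationship between the three contiguity schemes can be checked numerically. The sketch below is an illustration only: it assumes NumPy and the column-wise numbering of Figure 8.1, and the names `contiguity`, `ROOK` and `BISHOP` are hypothetical helpers, not part of the chapter.

```python
import numpy as np

# (row, col) steps allowed for each chess piece, one square at a time
ROOK = ((1, 0), (-1, 0), (0, 1), (0, -1))
BISHOP = ((1, 1), (1, -1), (-1, 1), (-1, -1))

def contiguity(offsets, n_rows=5, n_cols=5):
    """Binary contiguity matrix for the given neighbor offsets.

    Assumed numbering (as in Figure 8.1): regions run up each lattice
    column in turn, so region = col * n_rows + row + 1.
    """
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    for col in range(n_cols):
        for row in range(n_rows):
            for d_row, d_col in offsets:
                r, c = row + d_row, col + d_col
                if 0 <= r < n_rows and 0 <= c < n_cols:
                    W[col * n_rows + row, c * n_rows + r] = 1
    return W

rook = contiguity(ROOK)
bishop = contiguity(BISHOP)
queen = contiguity(ROOK + BISHOP)   # queen moves are rook plus bishop moves

print(np.array_equal(queen, rook + bishop))       # True
print((np.flatnonzero(queen[0]) + 1).tolist())    # [2, 6, 7]
print((np.flatnonzero(bishop[6]) + 1).tolist())   # [1, 3, 11, 13]
```

Because the rook and bishop neighbor sets never overlap, the sum contains only zeros and ones, which is why the queen matrix can be read as a plain sum.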
For example, the first row now has three ones, in positions 2, 6 and 7, showing that these three regions are neighbors of region 1.

Y = X\beta + u    (8.1)

u = \lambda W u + \varepsilon    (8.2)

The variance-covariance matrix for the data is given by the following formula:

V = \sigma^2 [(I - \lambda W)'(I - \lambda W)]^{-1}    (8.3)

K_{ij} = V_{ij} / \sqrt{V_{ii} V_{jj}}    (8.4)
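As a rough numerical illustration of equations (8.3) and (8.4), the following Python sketch computes a correlation matrix from a row-normalized bishop weight matrix. Several things here are assumptions rather than the chapter's own settings: the use of NumPy, the helper names `bishop_weights` and `correlation_matrix`, the choice of row normalization, and in particular the value `lam = 0.5`, since the autocorrelation parameter behind the chapter's figures is not stated in this passage.

```python
import numpy as np

def bishop_weights(n_rows=5, n_cols=5):
    """Binary bishop contiguity matrix, numbered as in Figure 8.1 (assumed)."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for col in range(n_cols):
        for row in range(n_rows):
            for d_row, d_col in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
                r, c = row + d_row, col + d_col
                if 0 <= r < n_rows and 0 <= c < n_cols:
                    W[col * n_rows + row, c * n_rows + r] = 1.0
    return W

def correlation_matrix(W, lam, sigma2=1.0):
    """Correlation matrix K implied by the weight matrix W.

    V = sigma2 * [(I - lam W)'(I - lam W)]^{-1}   (equation 8.3)
    K_ij = V_ij / sqrt(V_ii * V_jj)               (equation 8.4)
    """
    n = W.shape[0]
    A = np.eye(n) - lam * W
    V = sigma2 * np.linalg.inv(A.T @ A)
    d = np.sqrt(np.diag(V))
    return V / np.outer(d, d)

W = bishop_weights()
W = W / W.sum(axis=1, keepdims=True)   # row normalize (assumed)
K = correlation_matrix(W, lam=0.5)     # lam = 0.5 is an illustrative guess

row7 = K[6]                            # correlations of region 7 with all regions
print(np.round(row7, 2))
```

Note that `sigma2` cancels in equation (8.4), so the correlations depend only on W and the autocorrelation parameter. With a different value of `lam` the magnitudes change, but regions that cannot be reached by bishop's moves keep a correlation of zero.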
[Figure 8.8. Correlation matrix for bishop contiguity, with regions 1 to 25 on both axes. The legend classes include r < 0.2 and 0.8 ≤ r.]

[Figure 8.9. Correlation matrix for rook contiguity, with the same axes and legend.]
In what follows, I will use region 7 for illustration purposes. Figure 8.11 shows the seventh row of the correlation matrix.

Recall that under bishop contiguity the neighbors of region seven are regions 1, 3, 11 and 13. One might then reasonably expect that the correlation between all of the direct neighbors and region 7 would be the same. However, Figure 8.11 shows that the correlations are 0.76, 0.59, 0.59 and 0.48, respectively. The correlations differ because these regions have different numbers of neighbors themselves, as shown in Table 8.1. The general rule is that the
[Figure 8.10. Correlation matrix for queen contiguity, with regions 1 to 25 on both axes. The legend classes include r < 0.2 and 0.8 ≤ r.]
[Figure 8.11. Seventh row of the correlation matrix for bishop contiguity]
Region:         1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
Correlation: 0.76 0.00 0.59 0.00 0.22 0.00 1.00 0.00 0.31 0.00 0.59 0.00 0.48 0.00 0.20 0.00 0.31 0.00 0.20 0.00 0.22 0.00 0.20 0.00 0.14
Table 8.1 Neighbors of region 7 using bishop contiguity

Neighbors   Correlation with region 7   Number of neighbors of neighbors (excluding region 7)
1           0.76                        0
3           0.59                        1
11          0.59                        1
13          0.48                        3

more connected a region is, the lower its correlation with another region. This makes sense because the more neighbors a region has, the greater the different influences on it.

Although sensible, this connectedness property will increase the severity of edge effects. Edge effects occur when the spatial processes continue outside of the study area. Regions with fewer neighbors are assigned higher correlations; however, these regions occur on the boundaries of the study area, where they will be influenced by regions not included in the study.

Also of note in the correlation matrix is the large number of zeros. These are shown as dots in Figure 8.8 and can be seen explicitly in Figure 8.11. This occurs because there are regions which are impossible to reach from region i using bishop's moves. For example, as Figure 8.12 shows, it is impossible to reach regions 8 or 10 from region 7, and so Figure 8.11 shows zeros for these cells.

Figure 8.12 also shows that, although not direct neighbors, regions 9 and 5 can be reached from region 7. It takes two
[Figure 8.12. The lattice of Figure 8.1, showing bishop's moves from region 7.]
moves to reach region 9 and three to reach region 5. Therefore, we should see non-zero correlations in these cells, with a larger correlation in cell 9 than in cell 5, and this is confirmed by Figure 8.11.

8.2.2. Correlation matrix for rook contiguity

An examination of Figure 8.9 reveals a much different pattern of correlations for the rook's case. The 7th row of the correlation matrix is shown in Figure 8.13.

The same principle of greater connectivity leading to smaller correlations continues to be demonstrated by the rook contiguity, as shown in Table 8.2.

Table 8.2 Neighbors of region 7 using rook contiguity

Neighbors   Correlation with region 7   Number of neighbors of neighbors (excluding region 7)
2           0.51                        2
6           0.51                        2
8           0.44                        3
12          0.44                        3

In the rook's case, it is possible to get from one region to any other region, although many moves may be required. This means that there are no unrelated regions, as in the bishop's case, and hence no zeros in the correlation matrix. There are, however, many small values, and these are shown as dots in Figure 8.9. The greater the number of moves required, the smaller the correlation. For example, starting from region 7, three moves are required to get to region 10 but only two to get to region 9. Figure 8.13 shows that the correlation between regions 7 and 9 is larger than that between regions 7 and 10. Likewise, it is also possible to get to region 25, but this requires 6 moves, and the correlation here is very small (0.02).

Finally, it is of interest to examine regions 9 and 13. Both are two moves away from region 7. However, two of region 13's neighbors are also neighbors of region 7. Region 9 has only one neighbor in common with region 7. This means that region 13 should have the stronger relationship with region 7, and this is borne out by Figure 8.13.

8.2.3. Correlation matrix for queen contiguity

Although the weight matrix for the queen's case is the sum of the rook's and the bishop's case, the same cannot be said for the correlation matrix. However, the same principles noted above apply: more connected neighbors have lower correlations and shared neighbors increase the correlations. The seventh row of the correlation matrix is shown in Figure 8.14.

Clearly, in the queen's case, it is possible to reach one region from any other region, and so there are no isolated regions. However, there are again some very small correlations, which appear as dots in Figure 8.10. Table 8.3 shows the relationship between
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0.40 0.51 0.31 0.14 0.07 0.51 1.00 0.44 0.16 0.07 0.31 0.44 0.24 0.10 0.05 0.14 0.16 0.10 0.05 0.03 0.07 0.07 0.05 0.03 0.02
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0.52 0.49 0.40 0.18 0.11 0.49 1.00 0.39 0.19 0.11 0.40 0.39 0.32 0.15 0.09 0.18 0.19 0.15 0.10 0.07 0.11 0.11 0.09 0.07 0.05
connectedness and the correlations. The queen's case is somewhat more complex than the rook and the bishop because now there are more neighbors.

Table 8.3 Neighbors of region 7 using queen contiguity

Neighbors   Correlation with region 7   Number of neighbors' neighbors   Number of shared neighbors
1           0.52                        2                                2
2           0.49                        4                                4
3           0.40                        4                                2
6           0.49                        4                                4
8           0.39                        7                                4
11          0.40                        4                                2
12          0.39                        7                                4
13          0.32                        7                                2

Examination of Table 8.3 shows that, as before, the correlations with region 7 fall as its neighboring regions have more neighbors themselves. For example, region 1 has the smallest number of neighbors (two) and the highest correlation (0.52). However, we see that the relationship is more complex than before, because regions 2, 3, 6, and 11 all have four neighbors but their correlations differ. The answer can be found in the last column of Table 8.3, which shows the number of shared neighbors. That is, these are the number of regions that are neighbors to both region 7 and the region in the first column. Thus, regions 3 and 11 have the same correlation because both the number of neighbors and the number of shared neighbors are the same. Similarly, region 6 has a higher correlation because all of its four neighbors are also neighbors to region 7.

8.3. CORRELOGRAMS

The final tool that I will use to analyze differences between these weighting schemes is the correlogram. A correlogram shows how the correlations change as the distance between the regions increases. Thus, the correlation is graphed on the vertical axis, and separation distance is graphed on the horizontal axis. In general, we expect that the correlations will fall as separation distance increases. Figure 8.15 shows the correlograms for the three cases under consideration.

Since the data is on a regular lattice, and hence the centroids of the regions are evenly spaced, one might think that the correlations would be the same for any given separation distance. However, Figure 8.15 shows that this is not the case: there is a range of correlations for each separation distance, although this range gets smaller as the separation distance increases. The range of correlations comes from the dependence of the correlations on the connectedness of the neighbors and the number of shared neighbors as discussed above.

The correlations for both the rook's and the queen's cases do tend to fall with separation distance. For the bishop's case, there is a tendency for the correlations to decline with separation distance, but this decline is not monotonic. Relatively large correlations are interrupted by the zero correlations of the isolated regions, as discussed previously.
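The two connectedness measures that drive these correlations, the number of neighbors of each neighbor and the number of neighbors shared with region 7, can be recomputed directly from the lattice. This is a minimal sketch under the assumption that regions are numbered up each column of the 5 x 5 lattice; the function names are hypothetical.

```python
N_SIDE = 5  # 5 x 5 lattice, regions numbered up each column

# Queen contiguity: all eight surrounding cells (edges and corners).
QUEEN = [(dc, dr) for dc in (-1, 0, 1) for dr in (-1, 0, 1) if (dc, dr) != (0, 0)]

def queen_neighbors(reg):
    """Set of regions contiguous to `reg` under the queen criterion."""
    col, row = divmod(reg - 1, N_SIDE)  # 0-based column and row
    found = set()
    for dc, dr in QUEEN:
        c, r = col + dc, row + dr
        if 0 <= c < N_SIDE and 0 <= r < N_SIDE:
            found.add(c * N_SIDE + r + 1)
    return found

def connectedness_rows(focus=7):
    """(neighbor, its neighbors excluding `focus`, neighbors shared with `focus`)."""
    rows = []
    for nb in sorted(queen_neighbors(focus)):
        others = queen_neighbors(nb) - {focus}
        shared = others & queen_neighbors(focus)
        rows.append((nb, len(others), len(shared)))
    return rows

for nb, n_other, n_shared in connectedness_rows():
    print(nb, n_other, n_shared)
```

With this numbering the printed counts agree with the last two columns of Table 8.3, for example seven neighbors and two shared neighbors for region 13.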
[Figure 8.15: correlograms for rook, bishop, and queen contiguity (one panel each); correlation on the vertical axis against separation distance, 0 to 6, on the horizontal axis]

[Figure 8.16: locations of the ten example points, on axes running from 1 to 11]
8.4. REGULAR LATTICE POINT DATA

This is point data that is located at the intersection points of a regular grid. The data of the previous section can be used here by simply considering the centroids of the regions to be the data points. This means that applicable weighting schemes include rook, bishop and queen contiguity. Also, weighting schemes that are used primarily for irregularly located point data can be used here as well. These will be discussed in the next section.

8.5. IRREGULARLY LOCATED POINT DATA

The discussion thus far has pertained to data located at regular intervals along a grid. However, spatial data is not always located so conveniently, and it is to this case that we now turn. In what follows, I will use ten points, located as shown in Figure 8.16, for purposes of illustration. The coordinates of these points are given in Table 8.4.

Table 8.4 Coordinates for irregularly located point example data

Observation   X Coordinate   Y Coordinate
1             2              9
2             2              10
3             2              7
4             2              3
5             9              9
6             9              10
7             9.75           9
8             9              7.75
9             7.5            9
10            11             1

I have chosen to use a small number of points to keep the weight and correlation matrices small. Cluster 1 consists of observations 1 through 4. This cluster
is somewhat dispersed, with observation 4 having the weakest link. Cluster 2 consists of observations 5 through 9. Cluster 2 is much tighter than Cluster 1. Observation 10 is an isolated point and not part of any cluster.

The example data makes it clear that this type of data is very different from the regular lattice data, in which no clusters could appear. There are a number of weighting schemes that can be used for this type of data; the analyst must be skillful in choosing the weighting scheme that best represents the spatial interactions in the data.

Since the coordinates of the data points are known, the distances separating each pair of observations can be calculated. These distances can be stored in an N × N distance matrix. The distance matrix for the example data is shown in Table 8.5. All of the weighting schemes discussed in this chapter will be functions of separation distance.

The relatively small numbers in the shaded upper left portion of Table 8.5 reveal Cluster 1. The very small numbers in the shaded lower right portion reveal Cluster 2. The large numbers in the last row and column show that observation 10 is isolated from the rest of the data.

In what follows, I explore the properties of five different weighting schemes, which can be characterized as discrete or continuous. A discrete weighting scheme will have a non-normalized weight matrix consisting of ones and zeros, with the ones indicating the interactions. In the continuous weighting schemes, the cells will consist of numbers which indicate the strength of the interactions. Each of these weighting schemes has a parameter, the value of which must either be determined by the researcher or estimated. The weighting schemes are summarized in Table 8.6 and described below. Note that in most of the presented matrices, the
Table 8.5 Distance matrix for irregularly located point example data
0.00 1.00 2.00 6.00 7.00 7.07 7.75 7.11 5.50 12.04
1.00 0.00 3.00 7.00 7.07 7.00 7.81 7.35 5.59 12.73
2.00 3.00 0.00 4.00 7.28 7.62 8.00 7.04 5.85 10.82
6.00 7.00 4.00 0.00 9.22 9.90 9.80 8.46 8.14 9.22
7.00 7.07 7.28 9.22 0.00 1.00 0.75 1.25 1.50 8.25
7.07 7.00 7.62 9.90 1.00 0.00 1.25 2.25 1.80 9.22
7.75 7.81 8.00 9.80 0.75 1.25 0.00 1.46 2.25 8.10
7.11 7.35 7.04 8.46 1.25 2.25 1.46 0.00 1.95 7.04
5.50 5.59 5.85 8.14 1.50 1.80 2.25 1.95 0.00 8.73
12.04 12.73 10.82 9.22 8.25 9.22 8.10 7.04 8.73 0.00
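As an illustration of how such a distance matrix can be produced, the sketch below recomputes the Euclidean distances from the coordinates listed in Table 8.4. The variable and function names are illustrative only, and Euclidean distance is assumed because it reproduces the tabulated values.

```python
import math

# Coordinates of the ten example observations, copied from Table 8.4.
POINTS = [(2, 9), (2, 10), (2, 7), (2, 3), (9, 9),
          (9, 10), (9.75, 9), (9, 7.75), (7.5, 9), (11, 1)]

def distance_matrix(points):
    """Return the N x N matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

D = distance_matrix(POINTS)
for row in D:
    print(" ".join(f"{d:5.2f}" for d in row))
```

The matrix is symmetric with zeros on the diagonal, and rounding to two decimals gives the entries shown in Table 8.5.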
elements representing the pairs in the two clusters will be shaded. If the text refers to specific cells, these will be highlighted instead.

8.5.1. Nearest neighbors

A nearest neighbor weight matrix is defined so that:

$$W_{ij} = \begin{cases} 1 & \text{if } j \text{ is } i\text{'s nearest neighbor} \\ 0 & \text{otherwise} \end{cases}$$

A nearest neighbor is the observation that is the closest to observation i. Nearest neighbors can be generalized to include any number of neighbors. For example, if the number of nearest neighbors is set to five, then the non-normalized W will have five ones in each row, indicating the five closest observations to i. The number of neighbors (NN) is the parameter of this weighting scheme. Table 8.7 shows the weight matrix when NN is set to 1.

An examination of this table shows that the weight matrix is not symmetric. For example, (3, 1) = 1 but (1, 3) = 0. This is because observation 1 is observation 3's nearest neighbor, but the reverse is not true (observation 2 is 1's nearest neighbor). Also note that this weighting scheme gives observation 4 the same relationship with observation 3 that 3 has with 1, even though 3 is much closer to 1 than 4 is to 3. For one nearest neighbor, the unstandardized weight matrix will have one 1 in each row; in general, the number of ones per row will be equal to NN.

Figure 8.17 shows the correlation matrix for 1 nearest neighbor. The two clusters show up clearly. Note that observation 10 appears to be part of Cluster 2, even though it is distant from the other points in that cluster. Figure 8.18 provides a closer look at Cluster 2.

Let a doublet be a pair such that each is the other's nearest neighbor, and a singlet be a pair in which one member is the other's nearest neighbor, but not the reverse. The only doublet shown in Figure 8.18 is the pair (5, 7), and this pair has the highest correlation shown, at 0.92. Observations 6, 8 and 9 have only singlet connections to observation 5, and their correlations are lower at 0.83. Observation 10 has a singlet connection to 8, but this correlation is even lower at 0.76. This is because 8 (the nearest neighbor) is less well connected to the rest of the cluster than is point 5.

It may seem that there is a contradiction between the correlation patterns in the
Table 8.7 Weight matrix for example data with one nearest neighbor
0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
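A nearest neighbor weight matrix of this kind can be built by ranking the other observations by distance from each point. The sketch below is one possible implementation, not the author's code; it uses the Table 8.4 coordinates, breaks any distance ties by observation order (an assumption), and row-standardizes by dividing by NN.

```python
import math

# Coordinates of the ten example observations, copied from Table 8.4.
POINTS = [(2, 9), (2, 10), (2, 7), (2, 3), (9, 9),
          (9, 10), (9.75, 9), (9, 7.75), (7.5, 9), (11, 1)]

def knn_weights(points, nn=1):
    """Row-standardized nearest neighbor weights: 1/nn for each of i's nn closest points."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(points):
        # sorted() is stable, so equal distances keep observation order.
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in ranked[:nn]:
            W[i][j] = 1.0 / nn
    return W

W1 = knn_weights(POINTS, nn=1)
nearest = [row.index(1.0) + 1 for row in W1]   # 1-based nearest neighbor of each row
print(nearest)

W3 = knn_weights(POINTS, nn=3)
print([j + 1 for j, w in enumerate(W3[0]) if w > 0])
```

For NN = 1 the position of the single 1 in each row matches Table 8.7; note that row 3 points to observation 1 while row 1 points to observation 2, which is the asymmetry described in the text.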
[Figure 8.17: correlation matrix for one nearest neighbor]

[Diagram of correlations between selected points; the legend distinguishes doublet, singlet, and not-neighbor pairs]
Figure 8.18 Correlations between selected points for cluster two, one nearest neighbor.
[Figure: correlation matrix, number of neighbors = 2]

[Diagram of correlations between selected points; the legend distinguishes doublet, singlet, and not-neighbor pairs]
Figure 8.20 Correlations between selected points for cluster two, two nearest neighbors.
[Diagram of correlations between selected points; the legend distinguishes doublet, singlet, and not-neighbor pairs]
Figure 8.22 Correlations between selected points for cluster two, three nearest neighbors.
have been replaced by solid and dashed lines, because there are more neighbors. Thus, most of the points in this cluster are related, which leads to the diffused pattern of correlations shown in Figure 8.21.

8.5.4. Correlograms for nearest neighbors

The correlograms for nearest neighbors are shown in Figure 8.23. Examination of this figure shows that the correlations do not decline monotonically with separation distance. For one nearest neighbor, although there is a slight diminution with distance, the basic pattern is that the correlations are either very strong or zero. Furthermore, the strong correlations are interspersed with the zeros. This means that some pairs can be highly correlated, while others, that are closer together, are not. However, all of the small separation distance pairs are highly correlated and all of the large separation distance pairs are not related to each other.

Increasing the number of neighbors to two reduces the size of the large correlations, and hence diminishes the diminution with separation distance. The same pattern of large correlations interspersed with zeros persists.

Finally, when the number of neighbors is increased to three, all points are related at least weakly (keep in mind that there are only 10 observations). The upper end of the strong (greater than 0.5) correlations has been reduced further. There is still a separation distance range in which strong correlations are interspersed with weak ones, and so there is no monotonic relationship between separation distance and correlation. It remains true, however, that the strongest correlations are associated with the smallest separation distances and the largest separation distances have the smallest correlations.

All of these results show that the number of neighbors is an important parameter for this weighting scheme; the number of neighbors
[Figure 8.23: correlograms for nearest neighbors; correlation against separation distance, 0 to 14]
has a sizeable impact on the weight matrix and the associated correlation matrix. It is traditional for researchers to pick the number of neighbors a priori. These results show that this should be done with care.

where N(k) is an N × N matrix such that:

$$N(k)_{ij} = \begin{cases} 1 & \text{if } j \text{ is } i\text{'s } k\text{th nearest neighbor} \\ 0 & \text{otherwise,} \end{cases}$$

and then estimate α to find the optimal degree of influence.1 In the example, NN is set to 5.

This weighting scheme falls into the continuous category, because the unstandardized weight matrix does not consist of ones and zeros, as in nearest neighbors. The standardized weight matrices are shown in Tables 8.10 and 8.11, for α = 0.1 and α = 0.5. These weight matrices differ from the nearest neighbor weight matrices, particularly as α becomes larger. For example, a comparison of Tables 8.7 and 8.10 shows that α = 0.1 produces a weight matrix that is similar to that for one nearest neighbor. However, comparing Tables 8.9 and 8.11 shows that the α = 0.5 case is quite different from three nearest neighbors. This is because, in the nearest neighbors weighting scheme, all of the neighbors are assigned equal weight. In the P&G weighting scheme, the first nearest neighbor always has the greatest weight. As the value of α increases, the more distant neighbors are given greater weight, but the weights are always less than for the first nearest neighbor. For example, in Table 8.9 (three nearest neighbors) W has 0.33 in the second, third and ninth elements of the first row, while in Table 8.11 (α = 0.5) the corresponding elements are 0.52, 0.26, and 0.13.

Figure 8.24 shows the correlation matrices for α = 0.1 and 0.5. Examination of this figure shows that α = 0.1 corresponds well to one nearest neighbor. Not surprisingly, however, α = 0.5 differs from three nearest neighbors in that there is less bleeding of the clusters with P&G.
[Figure 8.24: correlation matrices for α = 0.1 and α = 0.5]
Figure 8.25 shows the correlograms for the P&G model. These should be compared to the nearest neighbors correlograms in Figure 8.23. Not surprisingly, the first panels of these two figures agree quite closely, while the second panels do not. In the P&G model, the attenuation of the strong correlations with separation is more pronounced than in nearest neighbors. Additionally, the range of the strong correlations does not become as compressed when α increases as it does when the number of neighbors increases. Finally, comparing the two panels in Figure 8.25, the bleeding of the clusters is shown by the zero correlations in the first panel (α = 0.1) becoming positive in the second panel (α = 0.5).

neighbors model. That is, it changes the weighting on the neighbors so that more distant (in terms of neighbors) neighbors have less weight. Further, the rate of decline of the weight is estimated. Thus it should be considered to be an important weighting scheme in its own right. The features of this weighting scheme may make sense in some situations. For example, if the data looks like cluster one, then it makes sense to weight the third neighbor less than the second. However, if the data looks like cluster two, it makes less sense. Note also that the researcher must choose the maximum number of neighbors, NN. As in all choices of parameters, the researcher must use his judgment as to what is best.
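To make the P&G weights concrete, the sketch below assumes that the kth nearest neighbor receives an unstandardized weight proportional to α raised to the power k, and that each row is then standardized to sum to one. That functional form is an assumption on my part; it is chosen because, with α = 0.5 and NN = 5, it reproduces the first-row values 0.52, 0.26 and 0.13 quoted in the text. Coordinates are those of Table 8.4, and the function name is hypothetical.

```python
import math

# Coordinates of the ten example observations, copied from Table 8.4.
POINTS = [(2, 9), (2, 10), (2, 7), (2, 3), (9, 9),
          (9, 10), (9.75, 9), (9, 7.75), (7.5, 9), (11, 1)]

def pg_weights(points, alpha, nn=5):
    """Row-standardized weights that decline geometrically with neighbor rank.

    Assumed form: the k-th nearest neighbor gets alpha**k before standardization.
    (Using alpha**(k-1) instead would give the same standardized rows.)
    """
    n = len(points)
    raw = [alpha ** k for k in range(1, nn + 1)]
    total = sum(raw)
    W = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(points):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for k, j in enumerate(ranked[:nn]):
            W[i][j] = raw[k] / total
    return W

W_half = pg_weights(POINTS, alpha=0.5)
first_row = [round(W_half[0][j], 2) for j in (1, 2, 8)]  # observations 2, 3 and 9
print(first_row)

W_tenth = pg_weights(POINTS, alpha=0.1)
print(round(W_tenth[0][1], 2))  # weight on observation 1's nearest neighbor
```

With α = 0.1 almost all of each row's weight falls on the first nearest neighbor, which is why that case resembles the one nearest neighbor matrix.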
[Figure 8.25: correlograms for the P&G model, panels Alpha = 0.1 and Alpha = 0.5; correlation against separation distance, 0 to 14]
where L is the distance limit. The parameter L is usually chosen by the researcher and its value can have a profound effect on both the weight and correlation matrices. The unstandardized version of W is symmetric because Dij = Dji. However, the standardized W is usually not symmetric because the number of points within the distance limit will vary by observation. This is a discrete weighting scheme because the unstandardized W consists entirely of ones and zeros.

Tables 8.12 and 8.13 show standardized weight matrices for distance limits of 1 and 3, respectively. When L = 1, W is very sparse, because there are very few pairs with a separation distance of one or less. The asymmetry is illustrated by the shaded cells in Table 8.12. Observation 5 has two other points located within 1 distance unit (points 6 and 7), and so these weights are standardized to 0.5. However, points 6 and 7 only have one neighbor each (point 5), and so the weight for point 5 in rows 6 and 7 is 1. Table 8.13 shows that, when the distance limit is increased to 3, the weight matrix becomes symmetric. This distance limit reveals the two clusters, putting observations 1 through 3 in Cluster 1 and 5 through 9 in Cluster 2. Points 4 and 10 have no neighbors; these rows contain only zeros.

Figure 8.26 shows correlation matrices for the Limit Model, for L = 1 and L = 3. When L = 1, the correlation matrix is very sparse, and the correlations that are present are very high. Observations 1 and 5 are very dominant. The two clusters are apparent, but include too few observations. Expanding the distance limit to 3 reveals the two clusters more accurately, although observations 4 and 10 remain excluded from either. Although L = 3 seems to make the most sense for this data, the correlation matrix does not resemble those of the previously discussed weighting schemes. Also note that L is not required to be an integer.

Figure 8.27 shows the correlograms for L = 1 and L = 3. For the small distance limits shown here, there is no intermingling (with respect to distance) of correlated and uncorrelated points, as in nearest neighbors. Pairs are either correlated or not, and when they are, the correlation is high. The strictly positive correlations end at the distance limit. However, when the distance limit is larger (not shown), there is a range of separation distances in which positive and zero correlations are interspersed. This range occurs beyond the distance limit and shows the neighbors of neighbors effect.
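The behavior described for the distance limit scheme can be reproduced with a few lines of code. This sketch assumes that a pair counts as neighbors when its separation distance is less than or equal to L (the inclusive comparison is an assumption, chosen because points 5 and 6 are exactly one unit apart and are treated as neighbors at L = 1), and that rows without neighbors are left as zeros. Names are illustrative.

```python
import math

# Coordinates of the ten example observations, copied from Table 8.4.
POINTS = [(2, 9), (2, 10), (2, 7), (2, 3), (9, 9),
          (9, 10), (9.75, 9), (9, 7.75), (7.5, 9), (11, 1)]

def limit_weights(points, limit):
    """Row-standardized weights: equal weight on every other point within `limit`."""
    n = len(points)
    W = []
    for i in range(n):
        row = [1.0 if i != j and math.dist(points[i], points[j]) <= limit else 0.0
               for j in range(n)]
        count = sum(row)
        # Rows with no neighbors inside the limit stay all zeros.
        W.append([w / count for w in row] if count else row)
    return W

W1 = limit_weights(POINTS, 1)
print(W1[4][5], W1[4][6], W1[5][4], W1[6][4])   # rows 5, 6 and 7 (1-based)

W3 = limit_weights(POINTS, 3)
symmetric = all(W3[i][j] == W3[j][i] for i in range(10) for j in range(10))
isolated = [i + 1 for i, row in enumerate(W3) if sum(row) == 0]
print(symmetric, isolated)
```

For L = 1 the row for observation 5 splits its weight between points 6 and 7 while their rows put all weight on point 5; for L = 3 the matrix is symmetric and observations 4 and 10 have empty rows.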
[Figure 8.26: correlation matrices for the Limit Model, panels Limit = 1 and Limit = 3]

[Figure 8.27: correlograms for the Limit Model, panels Distance limit = 1 and Distance limit = 3; correlation against separation distance, 0 to 14]

[Figure: correlation matrices, panels Exponent = 1 and Exponent = 3]
(not shown), all points are correlated with all other points, and the correlations are roughly the same. When P = 1, we begin to see some stronger correlations associated with points 1 and 5, but non-zero correlations still exist between all pairs. When P = 3, the clusters appear very clearly, and points 1 and 5 appear to be influential. Note that points 4 and 10 are always included in the clusters.

Figure 8.29 shows the correlograms for P = 1 and P = 3. Examination of this figure shows that the correlations decline monotonically when P = 1. At P = 3, an intermixing of large and small correlations occurs when separation distance is in the range of 5 to 9.

8.6.3. Negative exponential model

This is another continuous weighting scheme. Here the weights decline exponentially with separation distance.

$$W_{ij} = \exp(-D_{ij}/A)$$

where A is a parameter that is commonly chosen by the researcher.

Tables 8.16 and 8.17 show the standardized weight matrices for A = 0.5 and A = 2. When A = 0.5, the weights within the clusters are reasonably large, and the largest weights are associated with points 1 and 5. The weights for pairs outside of the clusters are all zero, except for point 10. When A = 2, the weights associated with points 1 and 5 get smaller, but there is an indeterminate effect on the weights on the other pairs in the cluster: some get smaller and some get larger. The weights on the pairs outside of the cluster become larger.

Figure 8.30 shows the correlation matrices for A = 0.5 and A = 2. When A = 0.25 (not shown), the clusters are clearly indicated, and correlations associated with points 1 and 5 are very large, indicating their centrality. When A is increased to 0.5, point 5 loses some of its centrality, remaining highly correlated only with point 7. When A is set to 1 (not shown), point 5 appears no different from the other points in Cluster 2, and the centrality of point 1 becomes weaker. Finally, when A = 2, all of the points become correlated, although the highest correlations remain in the clusters. Note that point 10 is always included in Cluster 2, regardless of the value of A.
[Figure 8.29: correlograms, panels P = 1 and P = 3; correlation against separation distance, 0 to 14]
Table 8.16 Standardized weight matrix for negative exponential model, A = 0.5
0.00 0.88 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.98 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.87 0.12 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00
0.02 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.28 0.46 0.17 0.10 0.00
0.00 0.00 0.00 0.00 0.53 0.00 0.32 0.04 0.11 0.00
0.00 0.00 0.00 0.00 0.60 0.22 0.00 0.15 0.03 0.00
0.00 0.00 0.00 0.00 0.49 0.07 0.32 0.00 0.12 0.00
0.00 0.00 0.00 0.00 0.46 0.25 0.10 0.19 0.00 0.00
0.00 0.00 0.00 0.01 0.07 0.01 0.10 0.79 0.03 0.00
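The entries of this table can be approximated by applying a negative exponential to the distances and standardizing each row. The sketch below assumes the weight for a pair is exp(-D/A) with a zero diagonal, followed by division by the row sum; the Table 8.4 coordinates are reused and the function name is illustrative.

```python
import math

# Coordinates of the ten example observations, copied from Table 8.4.
POINTS = [(2, 9), (2, 10), (2, 7), (2, 3), (9, 9),
          (9, 10), (9.75, 9), (9, 7.75), (7.5, 9), (11, 1)]

def negexp_weights(points, a):
    """Row-standardized negative exponential weights, exp(-D_ij / a), zero diagonal."""
    n = len(points)
    W = []
    for i in range(n):
        row = [math.exp(-math.dist(points[i], points[j]) / a) if i != j else 0.0
               for j in range(n)]
        total = sum(row)
        W.append([w / total for w in row])
    return W

W = negexp_weights(POINTS, 0.5)
row1 = [round(w, 2) for w in W[0]]
row5 = [round(w, 2) for w in W[4]]
print(row1)
print(row5)
```

Rounded to two decimals, the first and fifth rows agree with the corresponding rows of Table 8.16; increasing A flattens each row by shifting weight toward more distant points.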
Figure 8.31 shows correlograms for A = 0.5 and A = 2. When A = 0.5, the pattern is familiar: high correlations at small separation distances, a range between 5 and 9 where strong correlations are interspersed with zero correlations, and zero correlations at separation distances greater than 9. When A = 2, previously strong correlations are reduced somewhat, and the previously zero correlations become stronger.
Figure 8.30 Correlation matrices for negative exponential model, selected values of A.
8.7. IRREGULARLY LOCATED AREAS

All of the weighting schemes described in the previous section can be used for areas, if they are applied to the centroids of the regions. Other weighting schemes have been suggested for areas. For example, Cliff and Ord (1981) suggest using weights based on centroid separation distance and the length of the shared boundary.

$$W_{ij} = \frac{\beta_{i(j)}^{\,b}}{D_{ij}^{\,a}}$$

where β_i(j) is the proportion of the perimeter of area i that is shared with area j, and a and b are parameters. Dacey (1968) suggested taking the relative size of each area into consideration, and proposed the following weights:

$$W_{ij} = d_{ij}\,\alpha_i\,\beta_{i(j)}$$

where d_ij is one if the areas are contiguous and zero otherwise, and α_i is the fraction of the study area that is contained in area i. Many other weighting schemes have been
[Figure 8.31: correlograms for the negative exponential model, one panel per selected value of A; correlation against separation distance, 0 to 14]
proposed. These will not be explored further in this chapter.

8.8. DISCUSSION

The weight matrix is a powerful tool for representing spatial relationships. There are many choices for the form that this matrix can take; only a few have been described in this chapter. The researcher will always have to specify a family of schemes (e.g., nearest neighbor, limit) and will often have to choose at least one parameter to complete the specification. Despite this, it is standard to treat the weight matrix as exogenous, which means that both the family and parameter value are known by the researcher. While this makes the estimation of other parameters easier, it is not very satisfying, since it implies that the researcher knows a great deal about the spatial interactions in the data. Furthermore, the remaining parameter estimates can be biased, since they will be conditional upon the specification of W. Given the impact of the choice of family and parameters upon the analysis, it is incumbent upon the researcher to choose carefully.

Estimation of W is appealing, although difficult. Maximum likelihood methods can be used, at the cost of assuming normality. Bhattacharjee and Jensen-Butler (2006) have
These possible outcomes define the discrete distribution function of the die. Rolling the die leads to a particular outcome, called a realization. For continuous variables, a continuous function (the probability density function, pdf, or cumulative distribution function, cdf) replaces the discrete definition of the distribution function. The cdf defines the probability of the outcome being less than a selected value (Goovaerts, 1997). See Isaaks and Srivastava (1989) for a discussion of RVs in a geostatistical context.

In defining a RF it is important to consider how the RV will be allowed to vary through space x. One simple possibility is to allocate to every position x in space its own cdf, with each independent of all other cdfs. A problem is that this model requires a large number of parameters: one set (e.g., mean and variance of the Gaussian model) for each possible location. Moreover, such a possibility is unlikely to be realistic in practice; we know that places close together tend to have similar characteristics. Therefore, this model is too loosely controlled and does not make use of our practical knowledge of spatially varying phenomena. For these reasons, we place some restrictions on the RF model. The most common set of restrictions is referred to as stationarity constraints, meaning that particular parameters are invariant with x. In the strictest sense, the mean and variance parameters can be held constant for all locations x. However, under this model each point is independent and identically distributed (iid), meaning that spatial inference is severely limited (we now have too tight a control over the possibilities).

In geostatistics, it is common to define a stationary mean parameter. Various alternative models have been proposed in which the mean is allowed to vary through space. Such a non-stationary mean parameter is generally referred to as a trend (see Goovaerts, 1997, and section 9.4.1 below). For the present purpose, a stationary mean provides a basic starting point. A second important restriction, which is not as restrictive as defining a stationary variance, is to define a stationary spatial covariance function (representing second-order stationarity) or semi-variogram (representing intrinsic stationarity, a weaker form of stationarity). Although much of the computation in geostatistics is based on the spatial covariance, the equations are often written in terms of semi-variograms and, thus, we shall focus on the semi-variogram from this point onwards.

The semi-variogram defines the relations between points and, thus, facilitates spatial statistical inference. It is usually estimated from empirical data as a plot of half the average squared difference between pairs of values (the semivariance) against the vector separation or lag. Then a mathematical model is commonly fitted to the empirical semi-variogram plot for use in geostatistical operations. Various methods may be employed in the fitting, although weighted least squares is a common basic starting point. Several important considerations should be taken into account during model fitting (see McBratney and Webster, 1986). Once the parameters are estimated (either with or without the uncertainty of estimation accounted for) the RF is defined and geostatistical operations can proceed. Variogram estimation and model fitting are described in section 9.2.

The mean and semi-variogram are, thus, the parameters that define the RF model, and that need to be estimated, effectively replacing the mean and variance of the RV model. It should be pointed out that the variogram may itself be comprised of several further parameters. For example, the spherical model is an example of a transitive variogram model (i.e., for which a positive finite maximum value is defined).
GEOSTATISTICS AND SPATIAL INTERPOLATION 161
The spherical model has two parameters: the sill c and the non-linear parameter a, usually referred to as the range. The sill defines the maximum value of semivariance while the range defines the lag at which the sill is reached.

Geostatistical operations include spatial prediction, spatial simulation, regularization and spatial optimization. In spatial prediction or kriging, the objective is to predict the value of z(x0) at some unobserved location x0 given a sample of data z(xi), i = 1, 2, ..., n usually defined on point supports (the space on which each observation is defined) or quasi-points. The RF model helps because it is useful to base the prediction of z(x0) on a model that captures our knowledge of the underlying processes or form. In environmental science (in the broadest sense) process knowledge is often limited and the RF model provides a useful stochastic framework that builds on some general principles.

The RF model is useful for several reasons, but prime among them are:

1. The dependence of the prediction z(x0) on the data z(xi), i = 1, 2, ..., n is estimated by the semi-variogram. In a general sense, the closer z(x0) and a given data point the more similar the two values are likely to be. The semi-variogram quantifies this spatial dependence. Critically, this means in a linear weighting of proximate data to be used in spatial prediction the weights can be determined automatically through linear algebra. This process is referred to as kriging.

2. In kriging, the relations between the sample data themselves are accounted for so that, at a given separation, a cluster of data points will contribute less to the prediction than a dispersed set (Journel and Huijbregts, 1978).

3. The cdf of the predicted value (i.e., the set of possible values from which one realization is drawn) can be conditioned on the sample data. In particular, the variance of the conditional cdf (ccdf) is likely to be less than that of the original cdf. In general terms, this means that the range of possible values for the unknown value is restricted to be close to the neighbouring data by an amount determined by the spatial proximity of the prediction location to the neighbours. Such information can be used to extend the process of spatial prediction (in which the mean of the posterior or conditional cdf is drawn) to spatial simulation (in which a value is drawn from the ccdf randomly).

Geostatistics, as described above, has been used widely to characterize spatial variation (using the semi-variogram or other function) in relatively small data sets and to predict unobserved values using kriging informed by the modelled semi-variogram. In such circumstances, the decision to adopt a stationary model of the mean and semi-variogram makes sense. In fact, it is necessary for statistical inference. However, very large spatially-extensive and spatially-detailed data sets are increasingly readily available. Examples include digital elevation data and image-based data sets provided primarily through remote sensing (Atkinson, 2005). Researchers and practitioners are increasingly overwhelmed by the magnitudes of the datasets available for analysis. This has led to a realization that the biggest problem facing spatial analysts today is one of data richness rather than data sparsity. In these circumstances, a stationary RF model is not only inappropriate, but wasteful of data. More suitable solutions can be found by allowing the previously stationary parameters to vary across the region of interest (Atkinson, 2001).

The present chapter provides an introduction to linear geostatistics, but with a particular focus on models that include non-stationary parameters, particularly of (a) the mean and (b) the semi-variogram. The next
section describes the process of fitting the RF model parameterized by a spatial covariance or semi-variogram, while section 9.3 describes geostatistical prediction (kriging). Section 9.4 considers non-stationary models and section 9.5 discusses a range of issues related to the use of geostatistics within GIS.

9.2. CHARACTERIZING SPATIAL VARIATION

9.2.1. Estimating the experimental semi-variogram

Much of the effort and time associated with geostatistical analysis is expended in analysis of the spatial structure of a variable. One simple way of examining spatial structure is through estimating the semi-variogram cloud. The semi-variogram cloud is a plot of the semivariances for paired data against the distances separating the paired data points in a given direction. The semivariance is half the squared difference between values at two locations, and can be thought of as a measure of dissimilarity. Thus, the semi-variogram cloud shows how dissimilar paired data points are as a function of their separation distance and direction (termed spatial lag, h). If data are spatially structured then pairs separated by small lags will tend to be less dissimilar than pairs separated by large lags.

A core idea in geostatistics is that the spatial structure in a variable should be characterized and used for spatial prediction and simulation. The objective of geostatistical prediction is to find optimal weights to assign to observations located around the prediction location. If information is available on how dissimilar two observations are likely to be for a given lag then this information can be used to determine these weights. The most commonly used approach is based on the estimated semi-variogram. The experimental semi-variogram is estimated by calculating the squared differences between all the available paired observations and obtaining half the average for all observations separated by a given lag (or within a lag tolerance where the observations are not on a regular grid). So, while the semi-variogram cloud provides semivariances as a function of a set of actual lags, the experimental semi-variogram provides only a set of average semivariances at a set of discrete lags. Examination of the semi-variogram cloud provides a means of identifying heterogeneities in spatial variation within a variable (Webster and Oliver, 2000) that are obscured through the summation over lags that occurs with the experimental semi-variogram. Therefore, examination of the semi-variogram cloud is a sensible step prior to estimation of the experimental semi-variogram.

The experimental semi-variogram, $\hat{\gamma}(h)$, can be estimated from $p(h)$ paired observations, $z(x_\alpha)$, $z(x_\alpha + h)$, $\alpha = 1, 2, \ldots, p(h)$, using:

$$\hat{\gamma}(h) = \frac{1}{2p(h)} \sum_{\alpha=1}^{p(h)} \{z(x_\alpha) - z(x_\alpha + h)\}^2 \qquad (9.1)$$

The semi-variogram can be estimated for different directions to enable the identification of directional variation (termed anisotropy). Where a variable is preferentially sampled in areas with large or small values of the property of interest, the histogram will be unrepresentative and often a declustering algorithm is necessary to correct this. For example, values in areas or cells with more data may be given smaller weights than values in sparsely sampled areas (Deutsch and Journel, 1998). Preferential sampling
of a variable also impacts on the form of the experimental semi-variogram. Richmond (2002) shows that clustering can, in some cases, alter drastically the form of the semi-variogram. Two methods of declustering for weighting paired data in estimation of the experimental semi-variogram are given by Richmond (2002).

In the presence of large-scale, low-frequency variation (e.g., that would be fitted well by a trend model), the form of the semi-variogram will be affected. If the semi-variogram increases more rapidly than a quadratic polynomial for large lags then a RF which is non-stationary in the mean should be adopted (Armstrong, 1998). This topic is explored in greater depth in section 9.4.1.

9.2.2. Fitting a semi-variogram model

A mathematical model may be fitted to the experimental semi-variogram and the coefficients of this model can be used for a range of geostatistical operations such as spatial prediction (kriging) and conditional simulation. A model is usually selected from one of a set of so-called authorized models. McBratney and Webster (1986) provide a review of some of the most widely used authorized models. There are two principal classes of semi-variogram model. Transitive (bounded) models have a sill (finite variance), and indicate a second-order stationary process. Unbounded models do not reach an upper bound; they are intrinsically stationary only (McBratney and Webster, 1986). Figure 9.1 shows the parameters of a bounded semi-variogram model (the spherical model as defined below). The nugget effect, $c_0$, represents unresolved variation (a mixture of spatial variation at a scale finer than the sample spacing and measurement error). The sill, $c$, represents the spatially correlated variation. The total sill, $c_0 + c$, is the a priori variance. The range, $a$, represents the scale of spatial variation (Atkinson and Tate, 2000). For example, if a measured property varies markedly over small distances then the property can be said to exhibit short range spatial variation.

Some of the most commonly used authorized models are detailed below. The nugget effect model, defined above, is given by:

$$\gamma(h) = \begin{cases} 0 & \text{for } h = 0 \\ c_0 & \text{for } |h| > 0 \end{cases} \qquad (9.2)$$

Three of the most frequently used bounded models are the spherical model, the exponential model and the Gaussian model and these are defined in turn. The spherical model is perhaps the most widely used semi-variogram model. Its form corresponds closely with what is often observed in many real world studies: almost linear growth in semivariance with separation and then stabilization (Armstrong, 1998). It is given by:

$$\gamma(h) = \begin{cases} c\,[1.5(h/a) - 0.5(h/a)^3] & \text{if } h \le a \\ c & \text{if } h > a \end{cases} \qquad (9.3)$$

where $c$ is the sill of the spherical model and $a$ is the non-linear parameter, known as the range.

The exponential model is given by:

$$\gamma(h) = c\left[1 - \exp\left(-\frac{h}{d}\right)\right] \qquad (9.4)$$

where $d$ is the non-linear distance parameter. The exponential model reaches the sill asymptotically and the practical range is $3d$
(i.e., the separation at which approximately 95% of the sill is reached).

[Figure 9.1: parameters of a bounded semi-variogram model, showing the nugget ($c_0$), sill ($c$) and range ($a$) against lag ($h$)]

The Gaussian model is given by:

$$\gamma(h) = c\left[1 - \exp\left(-\frac{h^2}{d^2}\right)\right] \qquad (9.5)$$

The Gaussian model does not reach a sill at a finite separation and the practical range is $\sqrt{3}\,d$ (Journel and Huijbregts, 1978). Semi-variograms with parabolic behaviour at the origin, as represented by the Gaussian model here, are indicative of very regular spatial variation (Journel and Huijbregts, 1978). Authorized models may be used in positive linear combination where a single model is insufficient to represent well the form of the semi-variogram.

Where the semi-variogram appears to increase indefinitely with separation the most widely used model is the power model:

$$\gamma(h) = m h^{\omega} \qquad (9.6)$$

where $\omega$ is a power, $0 < \omega < 2$, with a positive slope, $m$ (Deutsch and Journel, 1998). The linear model is a special case of the power model.

One of the advantages of kriging is that it is often fairly straightforward to model anisotropic structure using the semi-variogram. Two primary forms of anisotropy have been outlined in the geostatistical literature. If the sills for all directions are not significantly different and the same structural components (for example, spherical or Gaussian) are used then anisotropy can be accounted for by a linear transformation of the co-ordinates: this is called geometric or affine anisotropy (Webster and Oliver, 1990). Where the sill changes with direction but the range is similar for all directions the anisotropy is called zonal (Isaaks and Srivastava, 1989). However, the modelling of zonal anisotropy is much more problematic than the modelling of geometric anisotropy. In practice, a mixture of geometric and zonal anisotropy has been found to be common (Isaaks and Srivastava, 1989).

There are various approaches for fitting models to semi-variograms. Some geostatisticians
[Figure: semi-variogram of precipitation, semivariance (mm2) against lag (m), with fitted model 50.34 Nug(0) + 229.939 Sph(14159.8) + 475.979 Sph(154817)]
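The fitted model quoted with the precipitation semi-variogram, 50.34 Nug(0) + 229.939 Sph(14159.8) + 475.979 Sph(154817), is an instance of authorized models used in positive linear combination. The sketch below evaluates it with NumPy. It assumes the legend follows the common "sill Model(range)" notation, so that each number before a model name is a sill in mm2 and each number in brackets is a range in metres; the function names are illustrative only.

```python
import numpy as np

def nugget(h, c0):
    """Nugget effect model, equation (9.2): 0 at h = 0 and c0 for h > 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, c0, 0.0)

def spherical(h, sill, a):
    """Spherical model, equation (9.3): reaches the sill at the range a."""
    h = np.asarray(h, dtype=float)
    rising = sill * (1.5 * (h / a) - 0.5 * (h / a) ** 3)
    return np.where(h <= a, rising, sill)

def nested_precipitation_model(h):
    """Sum of a nugget and two spherical structures (assumed legend reading)."""
    return (nugget(h, 50.34)
            + spherical(h, 229.939, 14159.8)
            + spherical(h, 475.979, 154817.0))

# Evaluate at zero lag, at each range, and beyond the longer range.
lags = np.array([0.0, 14159.8, 154817.0, 200000.0])
gamma = nested_precipitation_model(lags)
print(gamma)
```

Beyond the longer range the model is flat at the total sill, which is the sum of the three coefficients.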
[Figure: directional semi-variograms for 0, 22.5, 45, 67.5, 90, 112.5 and 135 degrees, semivariance (mm2) against lag (m)]
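The estimator of equation (9.1), with an optional direction, can be sketched as follows. This is a minimal illustration rather than the procedure used for the figures: the lag binning, the azimuth convention (degrees clockwise from the y axis, folded to 0 to 180) and the default tolerance of 22.5 degrees are assumptions made here, and the function name is hypothetical.

```python
import numpy as np

def experimental_semivariogram(coords, values, lag_width, n_lags,
                               direction=None, tolerance=22.5):
    """Half the mean squared difference of paired values, per lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)       # each pair counted once
    delta = coords[j] - coords[i]
    dist = np.hypot(delta[:, 0], delta[:, 1])
    sq_diff = (values[j] - values[i]) ** 2

    if direction is not None:
        # Pair azimuth, folded so that opposite directions are equivalent.
        azimuth = np.degrees(np.arctan2(delta[:, 0], delta[:, 1])) % 180.0
        off = np.abs(azimuth - (direction % 180.0))
        off = np.minimum(off, 180.0 - off)
        keep = off <= tolerance
        dist, sq_diff = dist[keep], sq_diff[keep]

    lag_index = np.floor(dist / lag_width).astype(int)
    gamma = np.full(n_lags, np.nan)
    mean_lag = np.full(n_lags, np.nan)
    counts = np.zeros(n_lags, dtype=int)
    for k in range(n_lags):
        in_bin = lag_index == k
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_diff[in_bin].mean()   # equation (9.1)
            mean_lag[k] = dist[in_bin].mean()
    return mean_lag, gamma, counts

# Four observations on a transect along the x axis.
xy = [(0, 0), (1, 0), (2, 0), (3, 0)]
z = [1.0, 2.0, 4.0, 7.0]
lag, gam, n_pairs = experimental_semivariogram(xy, z, lag_width=1.0, n_lags=4)
_, _, n_north = experimental_semivariogram(xy, z, 1.0, 4, direction=0.0)
```

Returning the pair counts alongside the semivariances is convenient because the counts are often used as weights when a model is fitted.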
So, the objective of the kriging system is to find appropriate weights by which the available observations will be multiplied before summing them to obtain the predicted value. These weights are determined using the coefficients of a model fitted to the semi-variogram (or another function such as the covariance function).

The kriging prediction error must have an expected value of 0:

$$E\{\hat{Z}_{OK}(x_0) - Z(x_0)\} = 0 \qquad (9.9)$$

The kriging (or prediction) variance, $\sigma^2_{OK}$, is expressed as:

$$\sigma^2_{OK}(x_0) = E[\{\hat{Z}_{OK}(x_0) - Z(x_0)\}^2] = 2\sum_{\alpha=1}^{n} \lambda_\alpha^{OK}\,\gamma(x_\alpha - x_0) - \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n} \lambda_\alpha^{OK}\lambda_\beta^{OK}\,\gamma(x_\alpha - x_\beta) \qquad (9.10)$$

That is, we seek the values of $\lambda_1, \ldots, \lambda_n$ (the weights) that minimize this expression with the constraint that the weights sum to one (equation (9.8)). This minimization is achieved through Lagrange multipliers. The conditions for the minimization are given by the OK system comprising $n + 1$ equations and $n + 1$ unknowns:

$$\begin{cases} \sum_{\beta=1}^{n} \lambda_\beta^{OK}\,\gamma(x_\alpha - x_\beta) + \psi_{OK} = \gamma(x_\alpha - x_0), & \alpha = 1, \ldots, n \\ \sum_{\beta=1}^{n} \lambda_\beta^{OK} = 1 \end{cases} \qquad (9.11)$$

where $\psi_{OK}$ is a Lagrange multiplier. Knowing $\psi_{OK}$, the kriging variance, an estimator of the prediction variance of OK, can be given as:

$$\sigma^2_{OK} = \sum_{\alpha=1}^{n} \lambda_\alpha^{OK}\,\gamma(x_\alpha - x_0) + \psi_{OK} \qquad (9.12)$$

The kriging variance is a measure of confidence in predictions and is a function of the form of the semi-variogram, the sample configuration and the sample support (Journel and Huijbregts, 1978). The kriging variance is not conditional on the data values locally and this has led some researchers to use alternative approaches such as conditional simulation (discussed in the next section) to build models of spatial uncertainty (Goovaerts, 1997).

There are two varieties of OK: punctual OK and block OK. With punctual OK the predictions cover the same area (the support, v) as the observations. In block OK, the predictions are made to a larger support than the observations. With punctual OK the data are honoured. That is, they are retained in the output map. Block OK predictions are averages over areas (that is, the support has increased). Thus, at $x_0$ the prediction is not the same as an observation and does not need to honour it.

The choice of semi-variogram model affects the kriging weights and, therefore, the predictions. However, if the form of two models is similar at the origin of the semi-variogram then the two sets of results may be similar (Armstrong, 1998). The choice of nugget effect may have marked implications for both the predictions [...] Srivastava, 1989).

A map of precipitation in Britain in January 1999 generated using OK is shown
in Figure 9.4. It was generated using the semi-variogram model given in Figure 9.2 and the 16 nearest neighbours to each grid cell were used in the prediction process. The map is very smooth in appearance; this is a common feature of maps derived using OK.

9.3.2. Cokriging

Where a secondary variable (or variables) is available that is cross-correlated with the primary variable, both variables may be used simultaneously in prediction using cokriging. To apply cokriging, the semi-variograms (that is, auto semi-variograms) of both variables and the cross semi-variogram (describing the spatial dependence between the two variables) are required. The operation of cokriging is based on the linear model of coregionalization (see Webster and Oliver, 2000). For cokriging to be beneficial, the secondary variable should be cheaper to obtain or more readily available to make the most of the technique. If the variables are clearly linearly related then cokriging may estimate more accurately than, for example, OK.

[...] the expected values (i.e., the mean) but are values drawn randomly from the conditional cdf: a function of the available observations and the modelled spatial variation (Dungan, 1999). The simulation is considered conditional if the simulated values honour the observations at their locations (Deutsch and Journel, 1998). As noted above, simulated realizations represent a possible reality whereas kriging does not. Simulation allows the generation of many different possible realizations that may be used as a guide to potential errors in the construction of a map (Journel, 1996) and multiple realizations encapsulate the uncertainty in spatial prediction. Arguably, the most widely used form of conditional simulation is sequential Gaussian simulation (SGS). With sequential simulation, simulated values are conditional on the original data and previously simulated values (Deutsch and Journel, 1998). In SGS the ccdfs are all assumed to be Gaussian. SGS is discussed in detail in several texts (for example, Goovaerts, 1997; Deutsch and Journel, 1998; Chilès and Delfiner, 1999; Deutsch, 2002).
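Returning to ordinary kriging, the system of equation (9.11) and the kriging variance of equation (9.12) can be sketched in a few lines of linear algebra. This is an illustrative punctual OK solver under stated assumptions: a spherical semi-variogram with no nugget, all data used rather than a 16-neighbour search, and hypothetical function and variable names.

```python
import numpy as np

def spherical(h, sill, a):
    """Spherical semi-variogram model (no nugget)."""
    h = np.asarray(h, dtype=float)
    return np.where(h <= a, sill * (1.5 * h / a - 0.5 * (h / a) ** 3), sill)

def ordinary_kriging(coords, values, x0, gamma):
    """Solve the (n + 1) x (n + 1) OK system for one prediction location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    d_data = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d_x0 = np.linalg.norm(coords - np.asarray(x0, dtype=float), axis=1)

    lhs = np.ones((n + 1, n + 1))      # last row and column: unbiasedness
    lhs[:n, :n] = gamma(d_data)        # semivariances between data
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = gamma(d_x0)              # semivariances data to x0

    solution = np.linalg.solve(lhs, rhs)
    weights, psi = solution[:n], solution[n]
    prediction = weights @ values
    variance = weights @ rhs[:n] + psi  # equation (9.12)
    return prediction, variance, weights

def model(h):
    return spherical(h, sill=1.0, a=5.0)

xy = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
z = np.array([10.0, 14.0, 12.0, 20.0])
pred_new, var_new, w_new = ordinary_kriging(xy, z, [1.0, 1.0], model)
pred_at_datum, var_at_datum, _ = ordinary_kriging(xy, z, xy[0], model)
_, _, w_mid = ordinary_kriging(xy[:2], z[:2], [1.0, 0.0], model)
```

Predicting at a data location returns the datum with a kriging variance of zero, which is the sense in which punctual OK honours the data when the model has no nugget.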
[Map legend: precipitation (mm), low 1, high 271; north arrow]
[Plot: semivariance (mm2) against lag (m) for order 0, order 1 and order 2]
Figure 9.5 Semi-variogram of precipitation: raw data (order 0) and residuals from a
polynomial trend of order 1 and 2.
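Residuals of the kind summarized in Figure 9.5 can be obtained by fitting a polynomial trend surface by least squares and subtracting it from the data. The sketch below is an assumption about how such residuals might be computed, not the authors' code; note that order 0 here removes only the mean, which leaves the semi-variogram of the raw data unchanged.

```python
import numpy as np

def trend_residuals(coords, values, order):
    """Residuals from a least-squares polynomial trend of the given order."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    x, y = coords[:, 0], coords[:, 1]
    # All monomials x^p * y^q with p + q <= order (order 0 is the constant).
    columns = [x ** p * y ** q
               for p in range(order + 1)
               for q in range(order + 1 - p)]
    design = np.column_stack(columns)
    coefficients, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coefficients

# A 4 x 4 grid carrying an exactly planar surface.
gx, gy = np.meshgrid(np.arange(4.0), np.arange(4.0))
xy = np.column_stack([gx.ravel(), gy.ravel()])
z = 2.0 + 3.0 * xy[:, 0] - xy[:, 1]

res0 = trend_residuals(xy, z, order=0)
res1 = trend_residuals(xy, z, order=1)
res2 = trend_residuals(xy, z, order=2)
```

With coordinates in metres it is usually sensible to centre and rescale them before building the design matrix, since high powers of large coordinates make the least-squares problem poorly conditioned.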
[Figure: directional semi-variogram, semivariance (mm2) for direction 45 +/- 22.5 degrees against lag (m), with fitted model 82.16 Nug(0) + 229.658 Sph(15581.7) + 337.892 Sph(131143)]
be estimated yet the local trend (or drift) is estimated as a part of the KT procedure which itself requires the semi-variogram. Various approaches for estimating the trend-free semi-variogram are described in the literature and two approaches are summarized above. Figure 9.7 shows the KT predictions made using 16 nearest neighbours with the semi-variogram model given in Figure 9.6; the semi-variogram for the direction with the least evidence of trend. An alternative approach is Intrinsic Random Functions of Order k kriging whereby the generalized covariance is used in place of the semi-variogram (Chilès and Delfiner, 1999).

Making use of secondary variables: KED and SKlm

As well as estimating the form of the trend from the variable of interest, there are various approaches that make use of secondary variables that describe the shape of the mean in the primary variable. If some variable is available that is linearly related to the primary variable and varies smoothly (i.e., there are no marked local changes in values) it could be used to inform spatial prediction of values of the primary variable. Two such approaches are described below.

With SK, the mean is assumed to be constant (there is no systematic change in the mean of the property across the region of study) and known. If the mean is not constant, but we can estimate the mean at locations in the domain of interest, then this locally varying mean can be used to inform prediction. That is, the local mean can be estimated prior to kriging. The locally-varying mean can be estimated in various different ways. One approach, termed simple kriging with locally varying means (SKlm), is to use regression to estimate the value of the primary
[Map legend: precipitation (mm), low 0, high 408; north arrow]
variable at (a) all observation locations and (b) all locations where SKlm predictions will be made. The semi-variogram is then estimated using the residuals from the regression predictions at the data locations. SKlm is conducted using the residuals and the trend is added back after the prediction process is complete (an example is given by Lloyd, 2005).

An alternative approach is kriging with an external drift model (KED). In KED, the secondary data act as a shape function (the external trend) and the function describes the average shape of the primary variable (Wackernagel, 2003). The local mean of the primary variable is derived as a part of the kriging procedure using the secondary information and SK is carried out on the residuals from the local mean. So, the approach differs from SKlm in that the local mean is estimated as part of the kriging procedure and not before it, as is the case with SKlm (Goovaerts, 1997). Lloyd (2002, 2005) illustrates the use of KED in mapping monthly precipitation whereby elevation is used as the external trend.

As noted above, a major problem with KT and KED is that the underlying (trend-free) semi-variogram is assumed known. That is, if the mean changes from place to place the semi-variogram estimated from the raw data will be biased, so it is necessary to remove the local mean and estimate the semi-variogram of the residuals. Since the trend (that is, local mean) is estimated as a part of the KED (and KT) system, which requires the semi-variogram model coefficients as inputs, we are faced with a circular problem. A potential solution is to infer the trend-free semi-variogram from paired data that are largely unaffected by any trend (Goovaerts, 1997; Wackernagel, 2003). Hudson and Wackernagel (1994), in an application concerned with mapping mean monthly temperature in Scotland, achieved this by estimating directional semi-variograms and retaining the semi-variogram for the direction that showed least evidence of trend. That is, temperature values systematically increase or decrease in one direction (there is a trend in the values), but values of temperature are more constant in the perpendicular direction. In such cases, the concern is to characterize spatial variation in the direction for which values of temperature are homogeneous. Hudson and Wackernagel (1994) assumed that the trend-free semi-variogram was isotropic and the semi-variogram for the direction selected was used for kriging.

9.4.2. Non-stationary semi-variogram

In cases where the semi-variogram does not represent well spatial variation across the whole of the region of interest some approach may be necessary to account for the change in spatial variation locally. In the geostatistical literature, there are several approaches presented for estimation of non-stationary semi-variograms. These vary from approaches that estimate and model automatically the semi-variogram in a moving window (this approach is discussed below) to approaches that transform the data so that the transformed data have a stationary semi-variogram. Reviews of some methods are provided by Sampson et al. (2001) and Schabenberger and Gotway (2005).

The estimation and automated modelling of local semi-variograms for kriging is one published approach that accounts for non-stationarity in the semi-variogram (Haas, 1990). This approach is employed here. The WLS semi-variogram model fitting routine presented by Pardo-Igúzquiza (1999) was used to fit models to semi-variograms
[Map legend: structured component (Str. comp.), classes 170-328, 329-531, 532-776, 777-1044 and 1045-1193; north arrow]
[Map legend: range, classes 11360-17396, 17397-20373, 20374-23703, 23704-26995 and 26996-30795; five semi-variogram panels, semivariance (m2) against lag (m), with fitted models 1168 Sph(18841), 973 Sph(21728), 571 Sph(16723), 172 Sph(21669) and 180 Sph(28911)]
Figure 9.9 Range of spherical model for a moving window, showing five selected
semi-variograms with automatically fitted models.
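The idea of fitting a spherical model automatically within each moving window can be sketched as below. This is not the VARFIT routine cited in the text: it assumes a single spherical structure with no nugget, weights each lag by its number of pairs, and searches a user-supplied list of candidate ranges, with the sill obtained in closed form for each candidate. All names are hypothetical.

```python
import numpy as np

def spherical_shape(h, a):
    """Spherical model with unit sill."""
    h = np.asarray(h, dtype=float)
    return np.where(h <= a, 1.5 * h / a - 0.5 * (h / a) ** 3, 1.0)

def fit_spherical_wls(lags, gamma, counts, candidate_ranges):
    """Weighted least squares fit of sill and range (weights = pair counts)."""
    lags = np.asarray(lags, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    counts = np.asarray(counts, dtype=float)
    best = None
    for a in candidate_ranges:
        shape = spherical_shape(lags, a)
        # For a fixed range the model is linear in the sill.
        sill = np.sum(counts * shape * gamma) / np.sum(counts * shape ** 2)
        wss = np.sum(counts * (gamma - sill * shape) ** 2)
        if best is None or wss < best[2]:
            best = (sill, a, wss)
    return best   # (sill, range, weighted sum of squares)

def nearest_neighbours(coords, centre, k):
    """Indices of the k data points closest to the window centre."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords - np.asarray(centre, dtype=float), axis=1)
    return np.argsort(d)[:k]

# Synthetic check: semivariances generated from a known spherical model.
lags = np.arange(1.0, 11.0)
gam = 100.0 * spherical_shape(lags, 5.0)
pairs = np.full(lags.shape, 30)
sill_hat, range_hat, wss = fit_spherical_wls(lags, gam, pairs,
                                             np.arange(1.0, 11.0))
window = nearest_neighbours([[0, 0], [1, 0], [5, 5], [0.5, 0]], [0, 0], k=2)
```

In a moving-window analysis the neighbour selection, experimental semi-variogram and fit would be repeated for each window centre, and the fitted range and sill mapped.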
been used in local downscaling cokriging (Pardo-Igúzquiza and Atkinson, 2007).

9.5.3. The objective of non-stationary modelling

Several other chapters of this book have been concerned with geographically weighted regression (GWR). The non-stationary approaches presented in this chapter differ from GWR in their objective. For GWR the objective is to explore the spatially varying parameters of a local regression model; spatial variation in the estimated parameters is the primary interest. For the non-stationary mean and semi-variogram modelling presented in this chapter the objective is spatial prediction or some other geostatistical operation. Thus, non-stationary modelling will be useful where it leads to an increase in the precision of prediction and where it leads to an increase in the precision of the estimation of the prediction variance.

While the objective is spatial prediction, it is often informative to map the non-stationary parameters (in the sense of GWR). For GWR, the coefficients inform on local relations between variables. For geostatistics, the non-stationary mean and (especially) semi-variogram parameters inform on the nature of local spatial structure. For example, the local sill c is very much related to the magnitude of variation locally. The local sill parameter is related mathematically to the local variance (LV), which itself has been used repeatedly as a texture measure in describing remotely sensed images (e.g., Bocher and McCloy, 2006). The local range parameter is related to the scale of spatial variation locally. The local range has also been mapped and used as a texture measure in the classification of remotely sensed images (e.g., Ramstein and Raffy, 1989; Atkinson and Lewis, 2000). Recently, Lloyd et al. (2005) have used the local range to show that the choice of optimum spatial resolution for a given scene itself varies locally. In the multivariate case, local variation in the parameters of the linear model of co-regionalization contains more information than the parameters mapped through GWR. The latter omits information on the spatial correlation in each variable, as well as the cross-correlation between variables.

One of the reasons that local modelling is so important for remotely sensed images is that remotely sensed scenes rarely lend themselves to description using the RF model directly. Often, scenes comprise several objects arranged on a background (e.g., buildings in a rural area) or a mosaic of objects (e.g., an agricultural scene). In such circumstances, it is unreasonable to expect the RF model parameterized with a global semi-variogram function to capture the full range of variability in the image locally. Non-stationary variogram modelling achieved by fitting within a moving window goes some way to addressing this problem, but probably not far enough. It would be preferable to define the objects of interest and then fit the RF model locally within the boundaries of those objects. For example, Berberoglu et al. (2000) and Lloyd et al. (2004) estimated semi-variograms on a per-field basis; semi-variograms were estimated using values within pre-defined boundaries. The semivariances were then used as inputs, along with spectral values, to maximum likelihood and artificial neural network (ANN) classifiers.

9.5.4. When is local, local enough?

The size of neighbourhood within which the local variogram is estimated, whether defined in terms of a search radius or the
nearest number of data points, represents a compromise between two competing factors. The first is the desire to achieve sufficient data points (i.e., a sufficiently large neighbourhood) to reduce the uncertainty of variogram estimation to a tolerable level. McBratney and Webster (1986) and Webster and Oliver (1992) provide excellent discussions of the number of data required for reliable estimation of the variogram. The second is the desire to reduce the neighbourhood so as to localize sufficiently the variogram parameters. With regard to the latter point, it should be remembered that since the objective is precise spatial prediction, what is actually required is to represent accurately the local variogram within the window used for local kriging. So an extremely localized variogram may be counter-productive. Ultimately, a balance between these factors should be achieved, potentially through calibration of the window size, although this possibility is often too expensive computationally.

9.6. FUTURE TRENDS IN GEOSTATISTICS

The availability of extensive data sets which cover large areas and have a variety of supports poses problems for conventional geostatistics, as this chapter indicates. Much research is being conducted to develop solutions to the kinds of problems that have arisen. Gotway and Young (2002) review a variety of approaches for area to point interpolation while Kyriakidis (2004) outlines one possible framework in the univariate (kriging) case and Pardo-Igúzquiza and Atkinson (2007) a possible solution in the multivariate (cokriging) case. There are various nonstationary geostatistical models, as discussed at some length in this chapter (see section 9.4), and such approaches overcome the problem of nonstationarity of the mean and variogram which is likely to be encountered if the region of concern is large.

Perhaps the biggest change in focus in the application of geostatistics in the last 20 years has been a shift from prediction (kriging) based analyses to those based on conditional simulation (see section 9.3.3). Simulation allows the generation of many equally-probable realizations and the exploration of spatial uncertainty in the property of interest. In cases where extreme values are of interest kriging is problematic because of its smoothing properties. In such cases, conditional simulation is more appropriate (Goovaerts, 1997).

Another research focus has been on the use and development of model-based geostatistics (Diggle and Ribeiro, 2006). The term was coined by Diggle et al. (1998) who introduced a body of approaches that is applicable where Gaussian distributional assumptions, and therefore classical geostatistics, are inappropriate. A Bayesian approach is presented that the authors argue enables uncertainty in the prediction of model parameters to be accounted for properly.

The advances in geostatistical methodology that have been made are limited in their application if extensive expert knowledge is required to apply such models. In the last decade, the range of software packages with extensive geostatistical functionality has grown markedly. Functions for estimating variograms and for kriging and simulation are now commonplace in GIS software. Undoubtedly, with widespread access to often very sophisticated methods, misuse and misunderstanding are apparent (Atkinson, 2005). However, an increasingly well-educated user base will hopefully contribute to more effective use of spatial data in all application areas.
Gotway, C.A. and Young, J.J. (2002). Combining incompatible spatial data. Journal of the American Statistical Association, 97: 632-648.
Haas, T.C. (1990). Lognormal and moving window methods of estimating acid deposition. Journal of the American Statistical Association, 85: 950-963.
Herzfeld, U.C. and Holmlund, P. (1990). Geostatistics in glaciology: implications of a study of Scharffenbergbotnen, Dronning Maud Land, East Antarctica. Annals of Glaciology, 14: 107-110.
Hudson, G. and Wackernagel, H. (1994). Mapping temperature using kriging with external drift: theory and an example from Scotland. International Journal of Climatology, 14: 77-91.
Isaaks, E.H. and Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. New York: Oxford University Press.
Journel, A.G. (1996). Modelling uncertainty and spatial dependence: stochastic imaging. International Journal of Geographical Information Systems, 10: 517-522.
Journel, A.G. and Huijbregts, C.J. (1978). Mining Geostatistics. London: Academic Press.
Kyriakidis, P.C. (2004). A geostatistical framework for area-to-point spatial interpolation. Geographical Analysis, 36: 259-289.
Lloyd, C.D. (2002). Increasing the accuracy of predictions of monthly precipitation in Great Britain using kriging with an external drift. In: Foody, G.M. and Atkinson, P.M. (eds), Uncertainty in Remote Sensing and GIS, pp. 243-267. Chichester: John Wiley and Sons.
Lloyd, C.D. (2005). Assessing the effect of integrating elevation data into the estimation of monthly precipitation in Great Britain. Journal of Hydrology, 308: 128-150.
Lloyd, C.D. and Atkinson, P.M. (2004). Archaeology and geostatistics. Journal of Archaeological Science, 31: 151-165.
Lloyd, C.D., Atkinson, P.M. and Aplin, P. (2005). Characterising local spatial variation in land cover imagery using geostatistical functions and the discrete wavelet transform. In: Renard, P., Demougeot-Renard, H. and Froidevaux, R. (eds), Geostatistics for Environmental Applications: Proceedings of the Fifth European Conference on Geostatistics for Environmental Applications, pp. 391-402. Berlin: Springer.
Lloyd, C.D., Berberoglu, S., Curran, P.J. and Atkinson, P.M. (2004). Per-field mapping of Mediterranean land cover: A comparison of texture measures. International Journal of Remote Sensing, 15: 3943-3965.
McBratney, A.B. and Webster, R. (1986). Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates. Journal of Soil Science, 37: 617-639.
Oliver, M.A. and Webster, R. (1990). Kriging: a method of interpolation for geographical information systems. International Journal of Geographical Information Systems, 4: 313-332.
Pardo-Igúzquiza, E. (1999). VARFIT: a Fortran-77 program for fitting semi-variogram models by weighted least squares. Computers and Geosciences, 25: 251-261.
Pardo-Igúzquiza, E. and Atkinson, P.M. (2007). Automatic modelling of variograms and cross-variograms in downscaling cokriging by numerical convolution-deconvolution. Computers and Geosciences, 33: 1273-1284.
Pebesma, E.J. and Wesseling, C.G. (1998). Gstat, a program for geostatistical modelling, prediction and simulation. Computers and Geosciences, 24: 17-31.
Ramstein, G. and Raffy, M. (1989). Analysis of the structure of radiometric remotely-sensed images. International Journal of Remote Sensing, 10: 1049-1073.
Richmond, A. (2002). Two-point declustering for weighting data pairs in experimental semi-variogram calculations. Computers and Geosciences, 28: 231-241.
Sampson, P.D., Damien, D. and Guttorp, P. (2001). Advances in modelling and inference for environmental processes with non-stationary spatial covariance. In: Monestiez, P., Allard, D. and Froidevaux, R. (eds), GeoENV III: Geostatistics for Environmental Applications, pp. 17-32. Dordrecht: Kluwer Academic Publishers.
Schabenberger, O. and Gotway, C.A. (2005). Statistical Methods for Spatial Data Analysis. Boca Raton: Chapman and Hall/CRC.
Wackernagel, H. (2003). Multivariate Geostatistics. An Introduction with Applications, 3rd edn. Berlin: Springer.
GEOSTATISTICS AND SPATIAL INTERPOLATION 181
Webster, R. and Oliver, M.A. (1990). Statistical Webster, R. and Oliver, M.A. (2000). Geostatistics
Methods in Soil and Land Resource Survey. Oxford for Environmental Scientists. John Wiley and Sons:
University Press: Oxford. Chichester.
Webster, R. and Oliver, M.A. (1992). Sample Webster, R. and McBratney, A.B. (1989). On the Akaike
adequately to estimate variograms of soil information criterion for choosing models for semi-
properties. Journal of Soil Science, 43: variograms of soil properties. Journal of Soil Science,
177192. 40: 493496.
10
Spatial Sampling
Eric Delmelle
\[ x_i = K_i L, \quad y_i = K_i L, \tag{10.1} \]

\[ x_i = x_1 + (i - 1)\Delta, \quad y_j = y_1 + (j - 1)\Delta, \quad \forall\, i, j = 1, \ldots, \sqrt{m}. \tag{10.2} \]
[Figure 10.1: four panels, (a) to (d), each plotting sample locations on x and y axes running from 0 to 1; in panel (d) individual points are labelled a to l.]
Figure 10.1 From left to right, top to bottom: random, centric systematic, systematic random, and systematic unaligned sampling schemes. Sampling size m = 100.
design coincides in frequency with a regular pattern in the landscape (Griffith and Amrhein, 1997; Overton and Stehman, 1993).

The second drawback can be lessened considerably by use of a systematic random method that combines systematic and random procedures (Dalton et al., 1975). One sample point is randomly selected within each cell. However, sample density needs to be high enough to have some clustering of observations or the spatial relationship between observations cannot be built. From Figure 10.1(c), some patches of D remain undersampled, while other regions show evidence of clustered observations. A systematic unaligned scheme prevents this problem from occurring by imposing a stronger

Consider the estimation of the global mean $\bar{z}_D$:

\[ \bar{z}_D = \frac{1}{|D|} \int_D z(s)\, ds. \tag{10.3} \]
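The four schemes of Figure 10.1 and the estimation of the global mean in Equation (10.3) can be illustrated with a short sketch. It assumes a unit-square study area, a k x k grid of cells with m = k * k, and one common formulation of the systematic unaligned scheme (a shared x-offset along each row of cells and a shared y-offset along each column). The function names and the test surface `surface` are hypothetical, chosen only for illustration, and the sample mean is used as a simple estimator of the global mean.

```python
import random

def random_sample(m, rng):
    """m points placed uniformly at random in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(m)]

def centric_systematic(k):
    """One point at the centre of each cell of a k x k grid."""
    d = 1.0 / k
    return [((i + 0.5) * d, (j + 0.5) * d) for i in range(k) for j in range(k)]

def systematic_random(k, rng):
    """One point drawn at random inside each cell of a k x k grid."""
    d = 1.0 / k
    return [((i + rng.random()) * d, (j + rng.random()) * d)
            for i in range(k) for j in range(k)]

def systematic_unaligned(k, rng):
    """One point per cell; the x-offset is shared along each row of cells
    and the y-offset along each column (an assumed, common formulation)."""
    d = 1.0 / k
    x_off = [rng.random() for _ in range(k)]  # one x-offset per row j
    y_off = [rng.random() for _ in range(k)]  # one y-offset per column i
    return [((i + x_off[j]) * d, (j + y_off[i]) * d)
            for i in range(k) for j in range(k)]

def sample_mean(points, z):
    """Sample mean of z over the points, used to estimate the global mean."""
    return sum(z(x, y) for x, y in points) / len(points)

def surface(x, y):
    """Hypothetical attribute surface; its true mean over the unit square is 1."""
    return x + y

rng = random.Random(42)
designs = {
    "random": random_sample(100, rng),
    "centric systematic": centric_systematic(10),
    "systematic random": systematic_random(10, rng),
    "systematic unaligned": systematic_unaligned(10, rng),
}
for name, pts in designs.items():
    print(name, round(sample_mean(pts, surface), 4))
```

Repeating the random designs with different seeds gives a rough feel for how the variance of the estimated mean differs between schemes on a given surface.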
[Figure 10.2: two panels, (a) and (b), each showing six strata labelled A to F on x and y axes running from 0 to 1.]
Figure 10.2 Stratified sampling designs with six strata of different sizes (m = 6 on the right figure and m = 100 to the left).
with W defined as a weight matrix $w(s_{ij})$, m is the number of observations, the mean of the sampled values is denoted by $\bar{z}$ and $z(s_i)$ is the measured attribute value at location $s_i$. The weight $w(s_{ij})$ is a measure of spatial proximity between points $s_i$ and $s_j$; for example:

\[ w(s_{ij}) = \exp\left(-d(s_{ij})^2\right) \tag{10.6} \]

where $d(s_{ij})^2$ is the squared distance between location $s_i$ and point $s_j$. Moran's I is not strictly constrained to the interval [-1, +1]. Spatial autocorrelation generally decreases as the distance between sample points increases. A positive autocorrelation occurs when values taken at nearby samples are more alike than samples collected farther away. When the autocorrelation is a linearly decreasing function of distance, stratified random sampling has a smaller variance than a systematic design (Quenouille, 1949). If the decrease in autocorrelation is not linear, yet concave upwards, systematic sampling is more accurate than stratified random sampling, and a centered systematic design, where each point falls exactly in the middle of each interval, is more efficient than a random systematic sampling configuration (Madow, 1953; Zubrzycki, 1958; Dalenius et al., 1960; Bellhouse, 1977; Iachan, 1985).

10.2.3. Other sampling designs

Nested or hierarchical sampling
Nested or hierarchical sampling designs require the study area D to be partitioned randomly into sample units (or blocks) creating the first level in the hierarchy, and this is then further subdivided into sample units nested within level 1, and so forth (Haining, 2003). These units can be systematically or irregularly arranged. As the process progresses, the distances between observations decrease (Corsten and Stein, 1994). One advantage of a nested sampling design is that it allows for multiple scale analysis and supports quadrat analysis. Spatially nested sampling designs may work well for geographic phenomena that are naturally clustered and for exploring multiple scale effects. Hierarchical sampling is also possible at the discrete level. In such cases, it is desirable to first select randomly one or more counties in a state. Then within these counties we might sample a number of quadrats, or say, townships and finally, within the latter, randomly select some farmsteads (King, 1969).
In the multivariate case, dependent and independent variables are hierarchically organized and are thus not collected at the same sampling frequency (Haining, 2003). The primary variable may exhibit rapid change in spatial structure while the secondary variables are much more homogeneous. A hierarchical sampling design captures such variation by collecting one variable at points nested within larger sampling units so that it can be collected more intensively than another variable.

Clustered sampling
This type of sampling consists of the random selection of groups of sites where sites are spatially close within groups (Cressie, 1991). Clusters of observations are drawn independently with equal probability. In the first stage, when the population is grouped into clusters, the clusters are first sampled (Haining, 2003). Either all of the observations in the clusters, or only a random selection from them, are included. Cluster sampling is especially useful in the discrete case, when a complete list of the members of a population cannot be obtained, yet a complete list of groups (i.e., clusters) of the variable is available. The method is also useful in reducing sampling cost.

10.3. SAMPLING RANDOM FIELDS USING GEOSTATISTICS

Most classical statistical sampling methods make no use of the spatial information provided by nearby samples. Geostatistics describes the spatial continuity that is an essential feature of many natural phenomena.
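To make the notion of spatial dependence concrete, the sketch below computes Moran's I with the exponential distance weights of Equation (10.6). The formula for I itself is not reproduced in this part of the text, so the usual form is assumed here: I = (m / sum of weights) * (sum over i != j of w_ij (z_i - mean)(z_j - mean)) / (sum over i of (z_i - mean)^2), with zero weight on the diagonal. The function name `morans_i` and the example data are hypothetical.

```python
import math

def morans_i(points, values):
    """Moran's I with weights w_ij = exp(-d_ij^2) for i != j (assumed standard form)."""
    m = len(values)
    zbar = sum(values) / m
    dev = [z - zbar for z in values]
    num = 0.0    # sum of w_ij * dev_i * dev_j over all pairs i != j
    w_sum = 0.0  # sum of all off-diagonal weights
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            d2 = (points[i][0] - points[j][0]) ** 2 + (points[i][1] - points[j][1]) ** 2
            w = math.exp(-d2)
            num += w * dev[i] * dev[j]
            w_sum += w
    den = sum(d * d for d in dev)
    return (m / w_sum) * (num / den)

# A 5 x 5 lattice with unit spacing and a smooth west-east gradient.
pts = [(float(i), float(j)) for i in range(5) for j in range(5)]
gradient = [x for x, _ in pts]
print(morans_i(pts, gradient))
```

A smooth gradient yields a positive value, while an alternating (checkerboard) pattern on the same lattice yields a negative one.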
\[ C_M = \begin{bmatrix} \sigma^2 & C_{1,2} & \cdots & C_{1,m} \\ C_{2,1} & \sigma^2 & \cdots & C_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ C_{m,1} & C_{m,2} & \cdots & \sigma^2 \end{bmatrix} \tag{10.14} \]

\[ c = \begin{bmatrix} \sigma^2 \\ C_{2,1} \\ \vdots \\ C_{m,1} \end{bmatrix}, \quad c^T = \begin{bmatrix} \sigma^2 & C_{1,2} & \cdots & C_{1,m} \end{bmatrix}. \tag{10.15} \]

The total kriging variance TKV is obtained by integrating Equation (10.13) over D:

\[ TKV = \int_D \sigma_k^2(s)\, ds. \tag{10.16} \]

Computationally, it is easier to discretize D and sum the kriging variance over all grid points $s_g$. The average kriging variance AKV over the study area is defined as:

\[ AKV = \frac{1}{G} \sum_{g \in G} \sigma_k^2(s_g). \tag{10.17} \]

The only requirement to calculate the kriging variance is to have an initial covariogram and the locations of the m initial sample points. It then depends solely on the spatial dependence and configuration of the observations (Cressie, 1991).

Illustration
Since continuous sampling is not feasible, it is necessary to discretize the area into a set of potential points. Seeking the best sampling procedure becomes a combinatorial problem. Figure 10.4 illustrates the kriging variance associated with random sampling and systematic random sampling from an exponential model. Darker areas denote a higher interpolation uncertainty, which increases away from existing points. The estimation error is low at visited points.

Distance-based criteria
It is possible to design sampling configurations considering explicitly the spatial correlation of the variable (Arbia, 1994). What would you do if you were in a dark room with candles? You would probably light the first candle at a random location or in the middle of the room. Then you would find it convenient to light the second candle somewhere further away from the first. How far away will depend on the luminosity of the first candle. The stronger the light, the further the second candle can be located from the first. You would then light the third candle far away from the first two. Such an approach, known as the Dependent Areal Units Sequential Technique (DUST), is an infill sampling algorithm, and is very suitable for locating points so as to minimize the kriging variance over D. Another method, known as the Minimization of the Mean of the Shortest Distances (MMSD), requires all sampling points to be spread evenly over the study area, ensuring that unvisited locations are never far from a sampling point. Both MMSD and DUST methods assume:

1 prior knowledge of the spatial structure of the variable; and
2 a stationary variable, an assumption violated in practice.

Both criteria are purely deterministic, resulting in spreading pairs of points evenly across the study area, similar to the systematic configuration. Van Groenigen (1997) notes that the area D is a continuous, infinite plane.
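The MMSD idea described above can be sketched as a simple score: the mean, over a set of evaluation points, of the distance to the nearest sample. The sketch assumes a unit-square study area discretized into an `n_grid` x `n_grid` evaluation grid; the function name `mmsd`, the grid resolution, and the two example designs are hypothetical choices made for illustration, not the formulation used in the cited work.

```python
import math

def mmsd(samples, n_grid=20):
    """Mean, over an n_grid x n_grid evaluation grid on the unit square,
    of the distance from each grid point to its nearest sample."""
    grid = [((i + 0.5) / n_grid, (j + 0.5) / n_grid)
            for i in range(n_grid) for j in range(n_grid)]
    return sum(min(math.dist(g, s) for s in samples) for g in grid) / len(grid)

# Nine points spread evenly versus nine points clustered in one corner.
spread = [((i + 0.5) / 3, (j + 0.5) / 3) for i in range(3) for j in range(3)]
clustered = [(0.1 + 0.02 * i, 0.1 + 0.02 * j) for i in range(3) for j in range(3)]
print(mmsd(spread), mmsd(clustered))
```

A lower score means that unvisited locations are, on average, closer to a sampling point, which is why an optimizer driven by this criterion tends to produce evenly spread designs.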
[Figure 10.4: two panels, (a) and (b), each mapping the kriging variance on x and y axes running from 0 to 1.]
Figure 10.4 The kriging variance of a systematic random pattern (right figure) reduces the value of Equation (10.17) by 20% from a random pattern. Sample patterns are similar to those in Figure 10.1.
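A comparison of the kind shown in Figure 10.4 can be sketched as follows. The sketch assumes a simple-kriging form of the variance, sigma^2 - c C^{-1} c^T, an exponential covariogram C(h) = sill * exp(-h / a) with hypothetical parameters (sill = 1, a = 0.2), and a unit-square study area discretized into a regular evaluation grid. The helper names `exp_cov`, `solve`, `kriging_variance` and `average_kriging_variance` are illustrative, and the numbers produced depend on these assumptions and on the random seed.

```python
import math
import random

def exp_cov(h, sill=1.0, a=0.2):
    """Exponential covariogram C(h) = sill * exp(-h / a) (assumed parameters)."""
    return sill * math.exp(-h / a)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kriging_variance(s, samples, sill=1.0, a=0.2):
    """Simple-kriging variance at s: sill - c C^{-1} c^T."""
    C = [[exp_cov(math.dist(p, q), sill, a) for q in samples] for p in samples]
    c = [exp_cov(math.dist(s, p), sill, a) for p in samples]
    w = solve(C, c)
    return sill - sum(wi * ci for wi, ci in zip(w, c))

def average_kriging_variance(samples, n_grid=12, sill=1.0, a=0.2):
    """Average of the kriging variance over an n_grid x n_grid grid."""
    grid = [((i + 0.5) / n_grid, (j + 0.5) / n_grid)
            for i in range(n_grid) for j in range(n_grid)]
    return sum(kriging_variance(g, samples, sill, a) for g in grid) / len(grid)

rng = random.Random(7)
k = 5
random_pts = [(rng.random(), rng.random()) for _ in range(k * k)]
sys_random_pts = [((i + rng.random()) / k, (j + rng.random()) / k)
                  for i in range(k) for j in range(k)]
print("random:", average_kriging_variance(random_pts))
print("systematic random:", average_kriging_variance(sys_random_pts))
```

Note that `kriging_variance` never looks at the measured values, only at the sample locations and the covariogram, which is the property the text attributes to the kriging variance.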
level of confidence is reached with a limited number of samples.

10.4. SECOND-PHASE AND ADAPTIVE SAMPLING

When there is a need or desire to gather more information (i.e., additional samples) about the variable of interest, we talk about adaptive and second-phase sampling, depending on the study objective. In the following subsections, both techniques are discussed.

10.4.1. Adaptive sampling
Adaptive sampling finds its roots in the concept of progressive sampling (Makarovic, 1973). It provides an objective and automatic method for sampling, for example, terrain of varying complexity when sampling altitude variation. As illustrated in Figure 10.5, progressive sampling involves a series of successive runs, beginning with a coarse sampling grid and then proceeding to grids of higher densities. The grid density is doubled on each successive sampling run and the points to be sampled are determined by a computer analysis of the data obtained on the preceding run. The analysis proceeds as follows: a square patch of nine points on the coarsest grid is selected and the height differences between each adjacent pair of points along the rows and columns are computed. The second differences are then calculated. The latter carry information on the terrain curvature. If the estimated curvature exceeds a certain threshold, it becomes necessary on the next run to increase the sampling density and sample points at the next level of grid density.
A similar study was carried out by Ayeni (1982) to determine the optimum number and spacing of terrain elevation data points to produce a Digital Elevation Model (DEM). That study stressed the importance of evaluating an adequate number of data points as well as an appropriate sampling distribution of such points, which together should constitute a good match to characterize a given terrain. Determining a sufficient number of points is not straightforward, since it depends on terrain roughness in relation to the size of the area occupied by the terrain.
The ideas suggested in progressive sampling were later carried over to the field of adaptive sampling (see Thompson and Seber, 1996). A major difference with conventional designs lies in the selection of additional samples in adaptive designs, because the location of a new sample will depend upon the value of the points observed in the field. In other words, the procedure for selecting additional samples depends on the outcome of the variable of interest, as observed during the survey of an initial sampling phase. The addition of a new sample improves confidence in the sampling distribution. Adaptive sampling is very efficient in the context of soil contamination (Cox, 1999). How should a risk manager decide where to re-sample in order to maximize information on contamination? In this particular context it is generally recommended to sample in locations above a particular threshold and draw a fixed number of additional samples around them until subsequent measurement values are below a pre-specified contamination threshold. Figure 10.6 illustrates the procedure for adaptive cluster sampling, where sample points represent measurement locations of hypothetical contamination rates. On the left, contamination rates have been measured at seven locations. A geographic location is said to be at risk (and needs remediation) when its value is above 0.7, or 70% of the contamination threshold. Call a property fathomed if samples have been taken from its immediate neighbors. A common choice is to define new neighbors
[Figure 10.5: two panels, (a) and (b), showing sample points over elevation contours labelled from 10 to 30.]
Figure 10.5 Initial systematic sampling of altitude is performed over the study area in the top figure. When strong variation in elevation is encountered, the sampling density is increased until a desirable threshold is met.
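The densification test used in progressive sampling (first differences along rows and columns of a nine-point patch, then second differences as a curvature proxy) can be sketched as below. The function names, the example heights, and the threshold value are hypothetical; the text does not specify how the threshold is chosen.

```python
def second_differences(patch):
    """Second differences of heights along each row and each column of a
    3 x 3 patch; each is (h3 - h2) - (h2 - h1) = h3 - 2*h2 + h1."""
    out = []
    for r in range(3):
        h1, h2, h3 = patch[r]
        out.append(h3 - 2 * h2 + h1)
    for c in range(3):
        h1, h2, h3 = patch[0][c], patch[1][c], patch[2][c]
        out.append(h3 - 2 * h2 + h1)
    return out

def needs_densification(patch, threshold):
    """True when the largest absolute second difference exceeds the threshold."""
    return max(abs(d) for d in second_differences(patch)) > threshold

# A planar slope has zero curvature; an isolated peak does not.
plane = [[10, 20, 30], [15, 25, 35], [20, 30, 40]]
peak = [[10, 10, 10], [10, 30, 10], [10, 10, 10]]
print(needs_densification(plane, threshold=5))
print(needs_densification(peak, threshold=5))
```

In a full progressive-sampling run this test would be applied patch by patch, and only flagged patches would be resampled at the next, doubled grid density.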
of a contaminated zone to the North, South, East, and West: fathom each property on the list by sampling and remove it from the risk list when it has been fathomed. In other words, the procedure re-samples four neighboring locations of a contaminated site. Once a site shows a contamination rate under the threshold value, it is fathomed. Otherwise, the procedure continues until a trigger condition is satisfied (e.g., a maximum number of additional samples is reached). This approach has some limitations however, because there is little rationale in taking additional samples in areas where we know that the probability of exceeding a particular threshold is maximal.

10.4.2. Second-phase sampling
In second-phase spatial sampling, a set M of m initial measurements has been collected, and a covariogram C(h) has been calculated. In the second-phase, the scientist
[Figure 10.6: two panels, (a) initial sample locations with their contamination rates and (b) the additional samples drawn around locations at risk.]
Figure 10.6 The cluster adaptive sampling procedure, illustrated in the context of toxic waste remediation. A site is fathomed (+) when its toxicity rate does not exceed the contamination value.
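The adaptive cluster procedure of Figure 10.6 can be sketched on a raster of hypothetical contamination rates. The sketch assumes a rectangular grid of cells, a threshold of 0.7 as in the text, North/South/East/West neighbors, and a cap on the number of additional samples as the stopping trigger. The function name, the `field` values, and `max_extra` are illustrative assumptions.

```python
from collections import deque

def adaptive_cluster_sample(field, initial, threshold=0.7, max_extra=50):
    """Sample the initial cells, then keep sampling the N/S/E/W neighbors of
    every sampled cell whose rate exceeds the threshold."""
    rows, cols = len(field), len(field[0])
    sampled = {}
    at_risk = deque()  # cells above the threshold whose neighbors are pending
    for cell in initial:
        sampled[cell] = field[cell[0]][cell[1]]
        if sampled[cell] > threshold:
            at_risk.append(cell)
    extra = 0
    while at_risk and extra < max_extra:
        r, c = at_risk.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nb = (r + dr, c + dc)
            if not (0 <= nb[0] < rows and 0 <= nb[1] < cols) or nb in sampled:
                continue
            if extra >= max_extra:
                break
            sampled[nb] = field[nb[0]][nb[1]]
            extra += 1
            if sampled[nb] > threshold:
                at_risk.append(nb)
    return sampled

# Hypothetical contamination rates on a 4 x 4 grid.
field = [
    [0.40, 0.40, 0.50, 0.40],
    [0.40, 0.80, 0.90, 0.40],
    [0.50, 0.75, 0.60, 0.40],
    [0.40, 0.40, 0.40, 0.40],
]
result = adaptive_cluster_sample(field, initial=[(1, 1), (3, 3)])
print(sorted(cell for cell, rate in result.items() if rate > 0.7))
```

Sampling stops spreading wherever a neighbor falls below the threshold, so effort concentrates around the contaminated cluster rather than over the whole grid.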
augments the set of observations, guided by the covariogram. The objective function aims to collect new samples to reduce the kriging variance or uncertainty by as much as possible. Equation (10.18) formulates the change in kriging variance $\Delta\sigma_k^2$ over all grid points $s_g$, when a set N of size n containing new sample points is added to our initial sample set M. The change $\Delta\sigma_k^2$ is the difference between the kriging variance of the initial set M and the kriging variance of the augmented set $M \cup N$ containing [m + n] samples:

\[ \Delta\sigma_k^2 = TKV_{old} - TKV_{new} = \frac{1}{G} \left[ \sum_{g \in G} \sigma_{k,old}^2(s_g) - \sum_{g \in G} \sigma_{k,new}^2(s_g) \right] \tag{10.18} \]

\[ \sigma_{k,old}^2(s_g) = \sigma^2 - \underbrace{c(s_g)}_{[1,m]}\, \underbrace{C^{-1}}_{[m]}\, \underbrace{c^T(s_g)}_{[m,1]} \tag{10.19} \]

\[ \sigma_{k,new}^2(s_g) = \sigma^2 - \underbrace{c(s_g)}_{[1,m+n]}\, \underbrace{C^{-1}}_{[m+n]}\, \underbrace{c^T(s_g)}_{[m+n,1]}. \tag{10.20} \]

The objective function (Equation (10.21)) is to find the optimal set S containing m + n points that will maximize this change in kriging variance (Christakos and Olea, 1992; Van Groenigen et al., 1999), where S is a specific sampling scheme:

\[ \underset{\{s_{m+1}, \ldots, s_{m+n}\}}{\mathrm{MAX}}\; J(S) = \frac{1}{G} \sum_{g \in G} \Delta\sigma_k^2(s_g; S). \tag{10.21} \]

For simplicity, the continuous region D is usually approximated by a finite set P of p points (Cressie, 1991). The set of new points is selected from the set of potential points P. Hence, there is a total of $\binom{p}{n}$ possible sampling combinations and it is too time-consuming to find the optimal set by exhaustive enumeration. Figure 10.7 illustrates the case where 50 sample points have been collected in the first stage, leading to an exponential covariogram, with the sequential addition of n = 10 new points and an improvement in the objective function of nearly 20%.

Weighting the kriging variance?
The use of a weighting function $w(\cdot)$ for the kriging variance was originally suggested by Cressie (1991) and has been applied by Van Groenigen et al. (2000), Rogerson et al. (2004), and Delmelle (2005). The importance of a location to be sampled is represented by a weight w(s). The objective is to find the optimal sampling scheme S containing m + n points that will maximize the change in weighted kriging variance. From Equation (10.21):

\[ \underset{\{s_{m+1}, \ldots, s_{m+n}\}}{\mathrm{MAX}}\; J(S) = \frac{1}{G} \sum_{g \in G} w(s_g)\, \Delta\sigma_k^2(s_g; S). \tag{10.22} \]

In an effort to detect contaminated zones in the Rotterdam harbor, Van Groenigen et al. (2000) introduced the Weighted Means of Shortest Distance (WMSD) criterion, offering a flexible way of using prior knowledge on the variable under study. However, the weights do not reflect the spatial structure of the variable, but rather the scientist's perception of the risks of contamination. In the first sampling phase, sampling weights are assigned to sub-areas based on their risks for contamination. In the second phase however, a greater weight is assigned to locations expected to exhibit a higher priority for remediation. Four weighting factors are considered with weights w = 1, 1.5, 2, and 3, leading to more intensive sampling where the weight is higher. In a more recent study, Rogerson et al. (2004) have developed a second-phase sampling technique, allowing re-sampling in areas where there is some uncertainty associated with a variable of interest, and hence not in areas where the probability of an event occurring is near 0 or 1. A greedy algorithm was proposed to locate the points that would maximize the change in weighted kriging variance.

Shortcomings of the use of the kriging variance
Many authors have advocated the use of the kriging variance as a measure of uncertainty. It is unfortunately misused as a measure of reliability of the kriging estimate, as noted by several authors (Deutsch and
[Figure 10.7: two panels. (a) Map of sample locations on x and y coordinate axes. (b) "Change in Kriging variance": improvement (0 to 0.18) plotted against the number of additional points (0 to 10).]
Figure 10.7 An initial sampling network of m = 50 points (in white) has been augmented with the addition of n = 10 new samples (in blue). The figure to the right displays the improvement.
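The sequential addition of points shown in Figure 10.7 can be imitated with a greedy heuristic: at each step, add the candidate that lowers the average kriging variance the most. This is a sketch under stated assumptions, not the procedure used to produce the figure: it assumes the simple-kriging variance of Equations (10.19) and (10.20), an exponential covariogram with hypothetical parameters, equal weights w(s) = 1, and a small candidate set P. All function names are illustrative.

```python
import math
import random

def exp_cov(h, sill=1.0, a=0.2):
    """Exponential covariogram C(h) = sill * exp(-h / a) (assumed parameters)."""
    return sill * math.exp(-h / a)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def akv(samples, grid, sill=1.0, a=0.2):
    """Average over the grid of the kriging variance sill - c C^{-1} c^T."""
    C = [[exp_cov(math.dist(p, q), sill, a) for q in samples] for p in samples]
    total = 0.0
    for g in grid:
        c = [exp_cov(math.dist(g, p), sill, a) for p in samples]
        w = solve(C, c)
        total += sill - sum(wi * ci for wi, ci in zip(w, c))
    return total / len(grid)

def greedy_second_phase(initial, candidates, n_new, grid):
    """Add n_new candidates one at a time, each time choosing the one that
    gives the lowest average kriging variance."""
    chosen = list(initial)
    pool = list(candidates)
    history = [akv(chosen, grid)]
    for _ in range(n_new):
        best = min(pool, key=lambda p: akv(chosen + [p], grid))
        pool.remove(best)
        chosen.append(best)
        history.append(akv(chosen, grid))
    return chosen[len(initial):], history

rng = random.Random(1)
initial = [(rng.random(), rng.random()) for _ in range(8)]
candidates = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
grid = [((i + 0.5) / 8, (j + 0.5) / 8) for i in range(8) for j in range(8)]
new_points, history = greedy_second_phase(initial, candidates, 3, grid)
print(new_points)
print([round(h, 4) for h in history])
```

A greedy search evaluates far fewer sets than the full enumeration of all combinations of n points from P, at the price of not guaranteeing the optimal set. A weighted variant in the spirit of Equation (10.22) would multiply each grid point's contribution by w(s_g).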
[Figure 10.8: two panels, each with an unknown center point (?) surrounded by four data values: (a) 11, 8, 9 and 12; (b) 2, 1, 0 and 37.]
Figure 10.8 Example of two-dimensional non-stationarity. Dark points are used as data values to interpolate the center point (light gray). After Armstrong (1994).
Journel, 1997; Armstrong, 1994). It is solely a function of the sample pattern, sample density, the number of samples and their covariance structure. The kriging variance assumes that the errors are independent of each other. This means that the process is stationary, an assumption usually violated in practice. Stationarity entails that the variation of the primary variable between two points remains similar at different locations in space, as long as their separation distance remains unchanged. Figure 10.8 illustrates non-stationarity in two dimensions (Armstrong, 1994). The objective in this particular example is to interpolate the value of the inner grid point, highlighted with a question mark. The interpolation depends on the values of the four surrounding points. Two scenarios are presented. The scenario in (b) shows three very similar values and an extreme one. The scenario in (a), however, shows four values in a very narrow range. Assuming the spatial structure is similar in both cases, and since the configuration of the data points is the same, the kriging variances are identical. However, we have more confidence in the scenario on the left since there is less variation among the neighbors. This illustrates that the prediction error is not suitable for setting up confidence intervals and should not be used as an optimization criterion for additional sampling strategies.

10.5. CURRENT RESEARCH DIRECTIONS

10.5.1. Incorporating multivariate information
Sample data can be very difficult to collect, and very expensive, especially in monitoring air or soil pollution for instance (Haining, 2003). Secondary data can be a valuable asset if they are available over an entire study area and combined with the primary variable (Hengl et al., 2003). Secondary spatial data sources include maps, socioeconomic and demographic census data, but also data generated by public sources (local and regional). This is very valuable and there has been a dramatic
1 Note that a systematic sampling scheme is a special case of a stratified design in that the strata are all squares of equal size.

REFERENCES
Arbia, G. (1994). Selection techniques in sampling spatial units. Quaderni di Statistica e Matematica Applicata Alle Scienze Economico-Sociali, 16: 81-91.
Armstrong, M. (1994). Is research in mining geostats as dead as a dodo? In: Dimitrakopoulos R. (ed.). Geostatistics for the Next Century, pp. 303-312. Dordrecht: Kluwer Academic Publisher.
Aubry, P. (2000). Le Traitement des Variables Régionalisées en Ecologie: Apports de la Géomatique et de la Géostatistique. Thèse de doctorat. Université Claude Bernard Lyon 1.
Aspie, D. and Barnes, R.J. (1990). Infill-sampling design and the cost of classification errors. Mathematical Geology, 22: 915-932.
Ayeni, O. (1982). Optimum sampling for digital terrain models: A trend towards automation. Photogrammetric Engineering and Remote Sensing, 48: 1687-1694.
Bailey, T.C. and Gatrell, A.C. (1995). Interactive Spatial Data Analysis. Longman. 413p.
Cochran, W.G. (1963). Sampling Techniques. Second Edition. New York: Wiley. 413p.
Corsten, L.C.A. and Stein, A. (1994). Nested sampling for estimating spatial semivariograms compared to other designs. Applied Stochastic Models and Data Analysis, 10: 103-122.
Cox, L.A. (1999). Adaptive spatial sampling of contaminated soil. Risk Analysis, 19: 1059-1069.
Cressie, N. (1991). Statistics for Spatial Data. New York: Wiley. 900p.
Dalenius, T., Hajek, J. and Zubrzycki, S. (1960). On plane sampling and related geometrical problems. Proceedings of the Fourth Berkeley Symposium, 1: 125-150.
Dalton, R., Garlick, J., Minshull, R. and Robinson, A. (1975). Sampling Techniques in Geography. London: Georges Philip and Son Limited. 95p.
Delmelle, E.M. (2005). Optimization of Second-Phase Spatial Sampling Using Auxiliary Information. Ph.D. Dissertation, Department of Geography, SUNY at Buffalo.
Deutsch, C.V. and Journel, A.G. (1997). Gslib: Geostatistical Software Library and User's Guide. 2nd edition, 369p. Oxford University Press.
Ferreyra, R.A., Apezteguía, H.P., Sereno, R. and Jones, J.W. (2002). Reduction of soil water sampling density using scaled semivariograms and simulated annealing. Geoderma, 110: 265-289.
Ferri, M. and Piccioni, M. (1992). Optimal selection of statistical units. Computational Statistics and Data Analysis, 13: 47-61.
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. 483p.
Gatrell, A.C. (1979). Autocorrelation in spaces. Environment and Planning A, 11: 507-516.
Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley.
Griffith, D. (1987). Spatial Autocorrelation: A Primer. Washington, DC: AAG.
Griffith, D. and Amrhein, C. (1997). Multivariate Statistical Analysis for Geographers. New Jersey: Prentice Hall. 345p.
Grötschel, M. and Lovász, L. (1995). Combinatorial optimization. In: Graham R.L., Grötschel and Lovász (eds), Handbook of Combinatorics, Vol. 2; pp. 1541-1579. Amsterdam, The Netherlands: Elsevier.
Haining, R.P. (2003). Spatial Data Analysis: Theory and Practice. Cambridge University Press. 452p.
Hedayat, A.S. and Sinha, B.K. (1991). Design and Inference in Finite Population Sampling. New York: Wiley. 377p.
Hengl, T., Rossiter, D.G. and Stein, A. (2003). Soil sampling strategies for spatial prediction by correlation with auxiliary maps. Australian Journal of Soil Research, 41: 1403-1422.
Iachan, R. (1985). Plane sampling. Statistics and Probability Letters, 3: 151-159.
Isaaks, E.H. and Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. New York: Oxford University Press. 561p.
King, L.J. (1969). Statistical Analysis in Geography. 288p.
Lajaunie, C., Wackernagel, H., Thiry, L. and Grzebyk, M. (1999). Sampling multiphase noise exposure time series. In: Soares A., Gomez-Hernandez J. and R. Froidevaux (eds), GeoENV II: Geostatistics for Environmental Applications; pp. 101-112. Dordrecht: Kluwer Academic Publishers.
Madow, W.G. (1953). On the theory of systematic sampling. III. Comparison of centered and random start systematic sampling. Annals of Mathematical Statistics, 24: 101-106.
Makarovic, B. (1973). Progressive sampling for digital terrain models. ITC Journal, 15: 397-416.
Matérn, B. (1960). Spatial variation. Berlin: Springer-Verlag. 151p.
McBratney, A.B. and Webster, R. (1981). The design of optimal sampling schemes for local estimation and mapping of regionalized variables: II. Program and examples. Computers and Geosciences, 7: 331-334.
McBratney, A.B., Webster, R. and Burgess, T.M. (1981). The design of optimal sampling schemes for local estimation and mapping of regionalized variables: I. Theory and method. Computers and Geosciences, 7: 335-365.
McBratney, A.B. and Webster, R. (1986). Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates. Journal of Soil Science, 37: 617-639.
Michalewicz, Z. and Fogel, D. (2000). How to Solve It: Modern Heuristics. Berlin: Springer. 467p.
Moran, P.A.P. (1948). The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B, 10: 245-251.
Moran, P.A.P. (1950). Notes on continuous phenomena. Biometrika, 37: 17-23.
Muller, W. (1998). Collecting Spatial Data: Optimal Design of Experiments for Random Fields. Heidelberg: Physica-Verlag.
Olea, R.A. (1984). Sampling design optimization for spatial functions. Mathematical Geology, 16: 369-392.
Oliver, M.A. and Webster, R. (1986). Combining nested and linear sampling for determining the scale and form of spatial variation of regionalized variables. Geographical Analysis, 18: 227-242.
Overton, W.S. and Stehman, S.V. (1993). Properties of designs for sampling continuous spatial resources from a triangular grid. Communications in Statistics: Theory and Methods, 21: 2641-2660.
Pardo-Igúzquiza, E. (1998). Optimal selection of number and location of rainfall gauges for areal rainfall estimation using geostatistics and simulated annealing. Journal of Hydrology, 210: 206-220.
Pettitt, A.N. and McBratney, A.B. (1993). Sampling designs for estimating spatial variance components. Applied Statistics, 42: 185-209.
Quenouille, M.H. (1949). Problems in plane sampling. Annals of Mathematical Statistics, 20: 355-375.
Rogerson, P.A., Delmelle, E.M., Batta, R., Akella, M.R., Blatt, A. and Wilson, G. (2004). Optimal sampling design for variables with varying spatial importance. Geographical Analysis, 36: 177-194.
Ripley, B.D. (1981). Spatial statistics. New York: Wiley. 252p.
Stehman, S.V. and Overton, S.W. (1996). Spatial sampling. In: Arlinghaus, S. (ed.) Practical Handbook of Spatial Statistics; pp. 31-64. Boca Raton, FL: CRC Press.
Thompson, S.K. and Seber, G.A.F. (1996). Adaptive Sampling. New York: Wiley. 288p.
Van Groenigen, J.W. (1997). Spatial simulated annealing for optimizing sampling: different optimization criteria compared. In: Soares, A., Gómez-Hernández, J. and Froidevaux, R. (eds). GeoENV I: Geostatistics for Environmental Applications. Dordrecht: Kluwer Academic Publishers.
Van Groenigen, J.W. (2000). The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma, 97: 223-236.
Van Groenigen, J.W. and Stein, A. (1998). Constrained optimization of spatial sampling using continuous simulated annealing. Journal of Environmental Quality, 27: 1078-1086.
Van Groenigen, J.W., Siderius, W. and Stein, A. (1999). Constrained optimisation of soil sampling for minimization of the kriging variance. Geoderma, 87: 239-259.
Van Groenigen, J.W., Pieters, G. and Stein, A. (2000). Optimizing spatial sampling for multivariate contamination in urban areas. Environmetrics, 11: 227-244.
Warrick, A.W. and Myers, D.E. (1987). Optimization of sampling locations for variogram calculations. Water Resources Research, 23: 496-500.
Webster, R. and Oliver, M.A. (1993). How large a sample is needed to estimate the regional variogram adequately. In: Soares, A. (ed.), Geostatistics Tróia '92; pp. 155-166. Dordrecht: Kluwer Academic Publishers.
Whittle, P. (1963). Stochastic processes in several dimensions. Bulletin of the International Statistical Institute, 40: 974-994.
Zubrzycki, S. (1958). Remarks on random, stratified and systematic sampling in a plane. Colloquium Mathematicum, 6: 251-264.
11
Statistical Inference for
Geographical Processes
Chris Brunsdon
It is often necessary to make informed some form of statistical test appears, and
statements about something that cannot be in the wide range of software packages
observed or verified directly. It is equally use- (spreadsheets, statistical packages and others)
ful to assess how reliable these statements are in which code for carrying out such
likely to be. A great deal of research is based techniques appears.
on the collection of data, both qualitative and However, despite this clear recognition
quantitative in order to make such statements. of the importance of statistical inference,
For this reason, inference in science is a many commercial GIS packages claiming
fundamental topic, and the development of to offer spatial analysis facilities have no
theories of statistical inference should be procedures for this. The reasons for this are
seen as a cornerstone of any field of study complex, but one thing to note is that it
claiming to be based on scientific method. was the chi-squared test, and not statistical
Indeed, the American Association for the inference in general that was cited by the
Advancement of Science (AAAS) listed the AAAS as a key development. Chi-squared
development of the chi-squared test as one of tests are relatively simple computationally,
the twenty key scientific developments of the and make a number of assumptions about
twentieth century.1 the simplicity of the underlying processes
In general, the success of the statistical about which inferences are to be made. In
hypothesis testing methodology is reflected particular, they assume that each observation
in the vast number of publications in which is probabilistically independent, and drawn
208 THE SAGE HANDBOOK OF SPATIAL ANALYSIS
from the same distribution. For spatial data this is unlikely to be the case: recall Tobler's law, stating that nearby things are likely to be more related than distant things. In addition, the distributions of observations may well depend on their geographical location. This violates the 'drawn from the same distribution' assumption. Thus, although tools of inference are just as important for geographical data as for any other kind of data, there are potential problems when borrowing standard statistical methods and applying them to spatial phenomena. The aim of this chapter is to consider some fundamental ideas about inference, and then to discuss some of the difficulties of applying these ideas on to spatial processes, and hopefully offer a few constructive suggestions. It is also important to note that although for some areas a degree of consensus has been reached, the subject of statistical inference is not without its controversies (see Fotheringham and Brunsdon (2004) for example), and in particular there are unresolved issues in inference applied to geographical data.

11.1. BASIC CONCEPTS OF STATISTICAL INFERENCE

The process model. This is a model, with a number of unknown parameters, describing the process that generated the observations. This will take a mathematical form, describing the probability distribution of the observations. The mathematical model can be very specific, so that only a small number of parameters are unknown, or quite broad, so that for example a mathematical function of the general form $f(x, y)$ is not known.

The inferential task. The task that the analyst wishes to perform having obtained his or her data. Typical tasks will be testing whether a hypothesis about a given model is true, estimating the value of a parameter in a given model, or deciding which model out of a set of candidates is the most appropriate.

The computational approach. Having chosen a process model, the inferential framework should determine what mathematical procedure is necessary to carry out the inferential task. In many cases, the procedure is the relatively simple application of a simple formula (for example a chi-squared test). However, sometimes it is not. In such cases alternative strategies are needed. Sometimes they involve numerical solution of equations or optimizations. In other cases Monte Carlo simulation-based approaches are used, where characteristics of statistical distributions are determined by simulating variables drawn from those distributions. The strategy used to carry out the task is what will be termed the computational approach here.
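The Monte Carlo idea mentioned under the computational approach can be illustrated with a minimal sketch. It assumes a deliberately simple case that is not taken from the chapter: the upper 5% critical value of a chi-squared variable with one degree of freedom, estimated by simulating squared standard normal draws. The function name, the number of draws and the seed are illustrative choices only.

```python
import random

def simulate_critical_value(n_draws=100_000, alpha=0.05, seed=1):
    """Estimate the upper-alpha critical value of a chi-squared(1) variable by simulation."""
    rng = random.Random(seed)
    # The square of a standard normal variable follows a chi-squared
    # distribution with one degree of freedom.
    draws = sorted(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_draws))
    # Empirical (1 - alpha) quantile of the simulated values.
    index = int((1.0 - alpha) * n_draws)
    return draws[min(index, n_draws - 1)]

critical = simulate_critical_value()
print(f"simulated 5% critical value: {critical:.2f}")
```

With enough draws the estimate should settle close to the tabulated value of about 3.84. The same pattern (simulate, sort, read off a quantile) applies when no closed-form distribution is available, which is the situation the text has in mind.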
intervals. When they do so, they are making use of two key ideas from classical inference2 which may be applied to geographical and non-geographical problems alike.

The most geographically specific of the concepts is the process model. As stated earlier, many inferential tests are based on the assumption that observations are independent of one another; in many geographical processes (such as those influencing house prices) this is clearly not the case. In some cases, the geographical model is a generalization of a simpler aspatial model: perhaps the situation where geography plays no role is a special case where some parameter equals zero. In these situations, one highly intuitive inferential task is to determine whether this parameter does equal zero. In other cases, the task is to estimate the parameters (and find confidence intervals) that appear in both spatial and aspatial cases of the models (for example regression coefficients). In these cases, the spatial part of the model is essentially a nuisance, making the inferential task related to another aspect of the model more difficult.

The previous examples are relatively simple from a geographical viewpoint, but more sophisticated geographical inferential tasks can be undertaken. In particular, the tasks above are related to what Openshaw (1984) terms 'whole-map' statistics. That is, they consider single parameters (or sets of parameters) that define the nature of spatial interaction at all locations, but supply no information about any specific locations. To the geographer, or GIS user, it is often more important to identify which locations are in some way different or anomalous. Arguably, this is a uniquely geographical inferential task. Although this inferential task can be approached with standard inferential frameworks, some careful thought is required.

Thus, to address the issue of statistical inference for geographical data one must consider the nature of statistical inference in general, the particular nature of statistical inference when spatial processes are considered and the way in which these two are related. This provides a broad framework for the chapter. First, a (very) brief overview of the key statistical inferential frameworks will be outlined. Next, spatial process models and related inferential tasks will be considered, together with a discussion of how the inferential approaches may be applied in this context. Finally, a set of suggested computational approaches will be considered.

11.2. AN OVERVIEW OF FORMAL INFERENTIAL FRAMEWORKS

The two most commonly encountered inferential frameworks are Classical and Bayesian. Suppose we assume a model $M$ with some unobserved parameters $\theta$, and some data $x$. Two kinds of tasks commonly encountered are:

1 Given $M$ and $x$, to infer whether some statement about $\theta$ is likely to be true.

2 Given $M$ and $x$, to estimate the value of $\theta$ or some function of $\theta$, $f(\theta)$.

Although both methods can address both types of question, they do so in quite different ways.

11.2.1. Classical inference

The classical framework is most commonly used, and will be defined first. The classical framework generally addresses two kinds of inferential tasks. The first task is dealt with using the significance test.
Hypothesis testing

The statement about $\theta$ mentioned above is termed the null hypothesis. Next a test statistic is defined. Of interest here is the distribution of the test statistic if the null hypothesis is true. The significance (or p-value) of the test statistic is the probability of obtaining a value at least as extreme as the observed value of the test statistic if the null hypothesis is true. When the significance is very low, this suggests that the null hypothesis is unlikely to be true. To perform an $\alpha$% significance test one calculates the value of the test statistic with a significance of $\alpha$; this is called the critical value. Typical values of $\alpha$ are 0.05 and 0.01. If the observed value is more extreme than the critical value, then the null hypothesis is rejected. Note that adopting the above procedure has a probability of $\alpha$ of rejecting the null hypothesis when it is actually true.

This may seem rather abstract without an example. One commonly used technique based on these principles is the two-sample t-test. Here $\theta = (\mu_1, \mu_2)$ where $\mu_1$ and $\mu_2$ are means of two normally distributed samples having the same variance $\sigma^2$. The null hypothesis here is that $\mu_1 = \mu_2$. Here the test statistic is the well-known t-statistic:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \tag{11.1} \]

where $s_1^2$ and $s_2^2$ are the respective sample variances for the two samples. A significance test is often performed by looking up the critical value of $t$ from a set of tables, computing the observed $t$ for the two samples and comparing this to the critical value. In this case, the $t$ statistic has $\nu = n_1 + n_2 - 2$ degrees of freedom.

The above outlines the procedure of a significance test, one of the two inferential tasks performed using classical inference. Of course, such inference is probabilistic: one cannot be certain, if we reject the null hypothesis, that it really is untrue. However, we do know what the probability of incorrectly rejecting the null hypothesis is. This kind of error is referred to as the type I error. Another form of error results when we incorrectly accept the null hypothesis; this is called a type II error. It is generally harder to compute the probability of committing a type II error, usually denoted as $1 - \beta$. The relationship between $\alpha$ and $\beta$ is given in Table 11.1.

For the two-sample t-test, the null hypothesis is $\mu_1 = \mu_2$, and the alternatives to this take the form $\mu_1 \neq \mu_2$, or equivalently $\mu_1 - \mu_2 = k$ for $k \neq 0$; $\beta$ will depend on the value of $k$. In general, if $k$ is large then there is a stronger chance of obtaining a significant $t$ value, and so a smaller chance of incorrectly failing to reject the null hypothesis. $\beta$ also depends on the values of $n_1$ and $n_2$, the sizes of the two samples. The larger these quantities are, the smaller the probability of incorrectly failing to reject the null hypothesis. Given any
could write the data as one long list, with all of the observations in $S_1$ followed by those in $S_2$:

\[ x_1, x_2, \ldots, x_{n_1}, x_{n_1+1}, \ldots, x_{n_1+n_2}. \tag{11.3} \]

In this case the ordering of the observations is of some consequence in the sense that an observation with an index greater than $n_1$ must have come from $S_2$. Now suppose we wish to test the hypothesis that both sets of observations come from distributions with the same mean. Consider the quantity:

\[ d = \frac{1}{n_1} \sum_{i=1}^{n_1} x_i - \frac{1}{n_2} \sum_{i=n_1+1}^{n_1+n_2} x_i. \tag{11.4} \]

Suppose a null hypothesis that $S_1$ and $S_2$ come from the same distribution. Then, there is no difference between the processes generating the observations in $\{x_1, \ldots, x_{n_1}\}$ and $\{x_{n_1+1}, \ldots, x_{n_1+n_2}\}$, so that in fact any ordering of $\{x_1, \ldots, x_{n_1+n_2}\}$ is equally likely. Then, regardless of the distributions of $S_1$ and $S_2$, we would expect the sample mean of $d$ to be zero. We could use this quantity as a test statistic, although we do not know its distribution. However, if the null hypothesis were true, we may make use of Monte Carlo methods. We simply randomly permute the ordering of the data set a large number of times, and obtain a corresponding set of values of $d$. We then compare the observed value of $d$ against this set, to obtain a value of $\alpha$ as before. This in essence is the randomization test. Here, it was shown in the context of a test of difference of means, although it may be used to test any kind of statistic dependent on the ordering of the observations. The advantage of this approach is that it allows tests to be made when one has no strong evidence of the distribution generating the data. A price paid for this is that the computational overhead is much higher, and typically nonparametric tests are not as powerful as the simpler parametric equivalents, provided the assumptions underlying the parametric tests hold. A final point is that there is a subtle difference between randomization tests and standard classical tests, in that they are conditional on the exact set of observed $x$ values, i.e., the null hypothesis only considers the same values of $x_i$ in different orders, unlike, for example, a t-test, which considers a sampling frame that could generate any real values of $x_i$.

11.2.3. Simple classical inference in action

To illustrate some of the above ideas a simple example is given. Here, the data consists of a number of sale prices of houses from two adjacent districts in the greater London area in 1991. The location of the districts in the context of greater London as a whole is shown in Figure 11.1, as are the locations of the houses in the sample. There are 220 houses in district 1 and 249 in district 2 (the district to the west).

If we assume that house prices in both districts have independent normal distributions with equal variances, we may test the hypothesis that the mean house price is the same in each district. This null hypothesis, together with the assumptions set out above, leads to the use of the t-test as set out in equation (11.1). The values of the relevant quantities are set out in Table 11.2.

Since we are interested in detecting differences in the mean value of either sign, we use the absolute value of $t$, which is 2.37. However, from tables, the critical value of $t$ for (two-tailed) $\alpha = 0.05$ is 1.96, suggesting we should reject the null hypothesis at the 5% level. Thus, with a
STATISTICAL INFERENCE FOR GEOGRAPHICAL PROCESSES 213
Figure 11.1 The location of study area (LHS) and the houses in the samples (RHS).
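The randomization test built on equations (11.3) and (11.4) can be sketched as follows. This is a minimal illustration, not the chapter's own code: the function names, the number of permutations, the seed, the two-sided comparison on the absolute value of d, and the two small samples are all assumptions made for the example, and the sample values are hypothetical rather than house-price data.

```python
import random

def mean_difference(values, n1):
    """Statistic d of equation (11.4): mean of the first n1 values minus mean of the rest."""
    first, rest = values[:n1], values[n1:]
    return sum(first) / len(first) - sum(rest) / len(rest)

def randomization_test(s1, s2, n_permutations=5000, seed=0):
    """Return the observed d and the share of permuted |d| values at least as large."""
    rng = random.Random(seed)
    pooled = list(s1) + list(s2)   # one long list, as in equation (11.3)
    n1 = len(s1)
    observed = mean_difference(pooled, n1)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)        # a random re-ordering of the same values
        if abs(mean_difference(pooled, n1)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical samples, chosen only to make the mechanics visible.
sample_1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sample_2 = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
d_observed, p_value = randomization_test(sample_1, sample_2)
print(d_observed, p_value)
```

The returned proportion plays the role of the significance of the observed d. Because only re-orderings of the observed values are considered, the result is conditional on those values, which is the point the text makes about randomization tests.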
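data, we can consider the joint probability density of the two items given model $M$, say $f(x, \theta \mid M)$. Standard probability theory tells us that:

\[ f(x, \theta \mid M) = f(x \mid \theta, M)\, f(\theta \mid M) = f(\theta \mid x, M)\, f(x \mid M) \tag{11.5} \]

Table 11.2 Two sample t-test

Quantity                          District 1    District 2
Sample size (n1, n2)              220           249
Sample mean (x1, x2)              77.7          86.4
Sample std. deviation (s1, s2)    37.3          41.5
Pooled std. deviation (s)         39.6
Degrees of freedom                467
t                                 2.37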
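The figures in Table 11.2 can be re-derived from the summary statistics alone. The sketch below assumes the usual equal-variance (pooled) estimate of s, which is not spelled out in the surviving text but reproduces the tabulated 39.6; the function name is illustrative.

```python
import math

def pooled_t_statistic(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic of equation (11.1) with a pooled standard deviation."""
    dof = n1 + n2 - 2
    # Pooled variance: the standard equal-variance estimate (an assumption here).
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / dof
    pooled_sd = math.sqrt(pooled_var)
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, pooled_sd, dof

# Summary values taken from Table 11.2.
t, s, nu = pooled_t_statistic(220, 77.7, 37.3, 249, 86.4, 41.5)
reject_at_5_percent = abs(t) > 1.96   # critical value quoted in the text
print(f"t = {t:.2f}, s = {s:.1f}, d.f. = {nu}, reject: {reject_at_5_percent}")
```

Because the inputs are rounded to one decimal place, the computed |t| may differ from the tabulated 2.37 in the second decimal place; the conclusion at the 5% level is unaffected.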
probability of observing $x$ given theta (that is, $f(x \mid \theta)$), gives an expression proportional to $f(\theta \mid x)$. Note that in this framework, $f(x \mid \theta)$ is our process model, as set out in section 11.1. We can interpret this last expression as the knowledge the analyst has about $\theta$ given the observational data $x$. Thus, we have updated knowledge about $\theta$ in the light of the observations $x$; this is essentially the inferential step.

In standard Bayesian terminology $f(\theta)$ is referred to as the prior or prior distribution for $\theta$, and $f(\theta \mid x)$ is referred to as the posterior or posterior distribution for $\theta$. Thus, starting out with a prior belief in the value of $\theta$, the analyst obtains observational data $x$ and modifies his or her belief in the light of these data to obtain the posterior distribution. The approach has a number of elegant properties. For example, if individual data items are uncorrelated and if data is collected sequentially, one can use the posterior obtained from an earlier subset of the data as a prior to be input to a later set of data. However, the approach does require a major change in world view. The requirement of a prior distribution for $\theta$ from an analyst could be regarded as removing objectivity from the study. Where does the knowledge to derive this prior come from?

One way of overcoming this is the use of non-informative priors, which represent no knowledge of the value of $\theta$ prior to analysis. For example, if $\theta$ were a parameter between 0 and 1, then $f(\theta) = 1$ (a uniform distribution) would be a non-informative prior, since no value of $\theta$ has a greater prior probability density than any other. Sometimes this leads to problems, for example if $\theta$ is a variable taking any real value. In this case, $f(\theta) = \text{const.}$ is not a well-defined probability density function. However, this shortcoming is usually ignored provided the posterior probability thus created is valid (typically the posterior in this case could be regarded as a limiting value of an infinite sequence of posteriors derived from well-defined priors, for example if a sequence of priors with variances increasing without bound were supplied). A prior such as this is termed an improper prior.

Having arrived at a posterior distribution $f(\theta \mid x)$ we may begin to address the two key inferential questions:

(1) Estimate the value of $\theta$ or some function of $\theta$, $f(\theta)$. Since we have a posterior distribution for $\theta$ we can obtain point estimates of $\theta$ using estimates of location for the distribution, such as the mean or median. Alternatively, we can obtain interval estimates, such as the inter-quartile range derived from this distribution. Typically, one would compute an interval $[\theta_1, \theta_2]$ between which $\theta$ has a 0.95 probability of lying. Note that this is subtly different from the confidence interval of classical inference. The 95% in a confidence interval refers to the probability that the randomly sampled data provides a number pair that contains the unobserved, but non-random $\theta$. Here we treat $\theta$ as a random variable distributed according to the posterior distribution obtained from equation (11.7). To emphasize that these Bayesian intervals differ from confidence intervals, they are referred to as credibility intervals.

(2) Infer whether some statement about $\theta$ is likely to be true. If our statement is of the form $a < \theta < b$ where either $a$ or $b$ are infinite, then this may be answered by computing areas underneath the posterior density function. For example, to answer the question 'is $\theta$ positive?' one computes:

\[ \int_0^{\infty} f(\theta \mid x)\, d\theta \]

and obtains the probability that the statement is true. However, questions of the form addressed by classical inference such as 'is $\theta$ zero?' where typically one is concerned
with exact values of $\theta$ present more difficulties. Since the output is a probability density, the probability attached to any point value is zero. There are a number of workarounds to this. One quite sensible approach is to decide how far from zero $\theta$ could be for the difference to be unimportant, and term this $\varepsilon$. If this is done, we may then test the statement $-\varepsilon < \theta < \varepsilon$ using the above approach. Other approaches do attempt to tackle the exact value test directly; see Lee (1997) for further discussion.

essentially stems from the fact that this is a scale parameter, rather than one of location; see Lee (1997) for example. In this case, it can be shown that the posterior distribution for the quantity $\delta = \mu_1 - \mu_2$ is that of the expression:

\[ (\bar{x}_1 - \bar{x}_2) - s \left( \frac{1}{n_1} + \frac{1}{n_2} \right)^{1/2} t \tag{11.8} \]

[Figure: posterior density curve; vertical axis 'Posterior Density' (0.00 to 0.10), horizontal axis in 1000s of pounds (about -20 to 10).]
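Expression (11.8) lends itself to a simulation-based reading of the two Bayesian tasks. The sketch below draws from the posterior of the difference in means using the summary values of Table 11.2, then reads off a posterior probability and a 95% credibility interval. It assumes that t in (11.8) follows a t distribution with n1 + n2 - 2 degrees of freedom; the function names, number of draws and seed are illustrative. Since the t distribution is symmetric about zero, the sign attached to the t term does not affect the result.

```python
import math
import random

def draw_t(rng, dof):
    """Draw from a t distribution as normal / sqrt(chi-squared / dof)."""
    z = rng.gauss(0.0, 1.0)
    # A chi-squared(dof) variable is a gamma variable with shape dof/2 and scale 2.
    chi2 = rng.gammavariate(dof / 2.0, 2.0)
    return z / math.sqrt(chi2 / dof)

def posterior_delta_draws(n1, mean1, n2, mean2, pooled_sd, n_draws=20_000, seed=42):
    """Sorted simulated draws of the difference in means, following expression (11.8)."""
    rng = random.Random(seed)
    dof = n1 + n2 - 2
    scale = pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return sorted((mean1 - mean2) - scale * draw_t(rng, dof) for _ in range(n_draws))

draws = posterior_delta_draws(220, 77.7, 249, 86.4, 39.6)
prob_negative = sum(d < 0 for d in draws) / len(draws)   # P(difference < 0 | data)
lower = draws[int(0.025 * len(draws))]                   # 95% credibility interval
upper = draws[int(0.975 * len(draws))]
print(f"P(delta < 0) = {prob_negative:.3f}; 95% interval = ({lower:.1f}, {upper:.1f})")
```

The posterior probability answers a question of the form 'is the difference negative?' directly, while the interval is a statement about the parameter itself rather than about repeated sampling, which is the distinction the text draws between credibility and confidence intervals.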
can lead to misinterpretation. This example is a cautionary tale about the consequences of ignoring spatial effects in an inferential framework.

So ignoring geography can lead to inferential problems. How can this difficulty be overcome? In particular this raises another key question: does one need to modify the above ideas of inference when working with spatial data? To answer this, we return to the four aspects of statistical inference listed in section 11.1 once again: both Bayesian and classical inferential frameworks can handle the key inferential tasks of hypothesis evaluation and parameter estimation for spatial processes. However, for spatial data the process model must allow for geographical effects. Finally, it is also the case that the computational approach must also be altered on some occasions. These two key issues will be considered in turn.

11.3.1. Process models for spatial data

The process models for spatial data can differ from more commonly used ones in a number of ways. The two most common ones are that they exhibit spatial non-stationarity and spatial autocorrelation. Spatial non-stationarity is essentially the characteristic of the LLTI example above. The unknown parameter $\theta$ is not a constant, but in fact a function of spatial location. In this case, a technique like GWR may be used to estimate $\theta$ at a set of given localities. Using this approach, one can apply the classical inferential framework to obtain estimates of $\theta$, and test hypotheses such as '$\theta$ is a global fixed value'. A classical inferential framework for GWR is detailed in Fotheringham et al. (2002).

The phenomenon of spatial autocorrelation occurs when each of the observed $x$ values are not drawn from statistically independent probability distributions, but are in fact correlated. In the geographical context, the correlation is generally related to proximity: nearby $x$ values are more correlated than values located far apart. Typical examples are the SAR (spatial autoregression) and CAR (conditional autoregression) models. Unlike GWR, these regression models do not assume that the regression parameters vary over space; however they do assume that the dependent variables are correlated. Typically here, each record of variables is associated with a spatial unit, such as a census tract, and the spatial dependence occurs between adjacent spatial units. As well as the regression coefficients and the variance of the error term, CAR and SAR models have an extra parameter controlling the degree to which adjacent dependent variables are related. In the classical inference case, parameter estimation is typically based on maximum likelihood, with the parameter vector containing the extra parameter described above as well as the usual regression parameters. There is much work on the classical inferential treatment of such models: see, for example, Cressie (1991). LeSage (1997) offers a Bayesian perspective.

11.3.2. The computational approach

Computational issues for geographical data are generally complex. The whole field of geocomputation has grown to address this. As well as problems of data storage, data retrieval and data mining, there are many computational overheads attributable to inference in spatial data, for a number of reasons. In some cases, the issue is related to Monte Carlo or randomization methods; this is particularly true of the Monte Carlo Markov Chain approach to Bayesian analysis. In others, it is linked
be applied to making inferences about large populations.

For smaller populations, the discrete nature of the sampling frame may suggest that such continuous approximations are not valid. Here, we are faced with two choices. First, if we assume that this population really is the item of interest, and it is not particularly large, then one approach might be to collect observations for the entire population. In this case, the conventional framework for statistical hypothesis testing becomes meaningless: to test a hypothesis relating to the population simply look at the data and see if it is true or not!

A second alternative is to assume that the population itself is of less interest than the process generating it. In this case, we return to the process hypothesis framework. Note that on some occasions, we may have the entire population represented in our data, but even so it may be of interest to understand the process(es) that brought about that data. For example, we look at daily records of rainfall from a one month period of the previous century. In this case, the list of rainfall measurements is our population, but the process of generating these can be modelled as a random process, and we may wish to test hypotheses about whether average levels are similar to those in the present day. In this case, we wish to test the (process-based) hypothesis that the mean daily rainfall is equal to some given level. It is the author's opinion that in most cases when an entire population may be measured, it is the underlying process and not the values of the population itself that is of most interest.
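The rainfall example can be made concrete with a small sketch of a process-based test that the mean equals a given level. The chapter does not say which test would be used; a one-sample t statistic is assumed here purely for illustration, and it presumes roughly normal, independent daily values, which real rainfall records may well violate. The data values and the hypothesised level are hypothetical.

```python
import math

def one_sample_t(values, hypothesised_mean):
    """One-sample t statistic and degrees of freedom for H0: process mean equals the given level."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)   # sample variance
    standard_error = math.sqrt(variance / n)
    return (mean - hypothesised_mean) / standard_error, n - 1

# Hypothetical daily rainfall totals (mm); not real records.
rainfall_mm = [2.0, 0.0, 5.0, 1.0, 0.0, 3.0, 4.0, 1.0]
t_stat, dof = one_sample_t(rainfall_mm, hypothesised_mean=1.0)
print(f"t = {t_stat:.3f} on {dof} degrees of freedom")
```

The list of values is the whole 'population' of records for the period, yet the test still makes sense because the hypothesis concerns the process that generated them, not the records themselves.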
Figure 11.3 Crime rate distribution: vehicle thefts and residential burglaries per 1000 households (1980). Legend classes: under 19.02; 19.02 to 29.33; 29.33 to 39.03; 39.03 to 53.16; over 53.16.
11.4.2. Other types of inference

Although classical and Bayesian methods are both covered in this chapter, these are not the only possible approaches. For example, Burnham and Anderson (1998) outline ways in which Akaike's An Information Criterion (AIC; Akaike 1973) may be used to compare models. This approach is quite different in terms of its inferential task: rather than testing whether a statement about a particular model is true, or assuming a specific model holds and then attempting to estimate a parameter of that model, this approach takes several models and attempts to identify which one is 'best' in the sense that it best approximates reality. The AIC is an attempt to measure the 'nearness' of the model to reality. Obviously the true model is not known, but the observations have arisen from that model, and this is where the clues about the true model come from. This is very different from the other approaches because it regards all potential models as compromises (none is assumed to be perfect) and attempts to identify the 'best compromise'. This area may prove fruitful in the future; for example, Fotheringham et al. (2002) use a method based on this idea to calibrate GWR models. The idea of finding a 'best approximation' also sits comfortably with the idea of approximating a large finite sample with a continuous distribution put forward in the previous section.

Of course, exploratory data analysis can be thought of as yet another inferential framework, albeit a less formal one. Although this can provide a very powerful framework for discovering patterns in data, it could be argued that this is an entire subject in its own right, and that there will be many examples elsewhere in this book, where the production of maps and associated graphics by various software packages provide excellent examples exhibiting the power and utility of graphical data exploration.

11.4.3. Software

No chapter about inference would be complete without some discussion of software. Having argued that making inferences about data is central to knowledge discovery in spatial analysis, one has every right to expect that software for inferential procedures will be readily available. However, as mentioned in the introduction, most readily available GIS packages do not currently contain code for many of the procedures outlined here. Unfortunately, although several commercial statistics packages do contain code for carrying out general inferential procedures, such as the t-test example discussed earlier in the chapter, they offer less support for more specific inferential tasks developed for spatial data. Until recently, for a number of spatial inferential tasks one was forced to write one's own code. However this situation is now improving. A number of packages that are either dedicated to the analysis of spatial data or sufficiently flexible that they may be extended to provide spatial data analysis now exist. Although by no means the only option, the statistical programming language R provides good spatial analysis options: all of the examples (most notably the spatial one) in this chapter were based on calculations done in R. There are a number of spatial data analysis libraries written in R, enabling this kind of geostatistical computation. For example:

sp: provides basic spatial data handling facilities;

maptools: provides map drawing functionality as well as the ability to import geographical data in a number of common formats, such as ArcGis shapefiles;

spdep: provides a number of hypothesis tests and model calibration facilities relating to models allowing for spatial dependencies; and
spgwr: provides a number of tools for Geographically Weighted Regression (GWR) analysis.

[Figure: histogram; vertical axis 'Frequency' (0 to 200), with the 0.05 significance level and the 'Observed Value' marked.]

3 'Conservative' here means that the test has a significance level of 5% or lower.
Diggle, P.J., Tawn, J.A. and Moyeed, R.A. (1998). Model-based geostatistics. Applied Statistics 47: 299-350.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (1998). Scale issues and geographically weighted regression. In Tate, N. (ed.), Scale Issues and GIS. Chichester: John Wiley and Sons.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester: John Wiley and Sons.
Fotheringham, A.S. and Brunsdon, C. (2004). Some thoughts on inference in the analysis of spatial data. International Journal of Geographical Information Science 18: 447-57.
Lee, P.M. (1997). Bayesian Statistics: An Introduction. London: Arnold.
LeSage, J. (1997). Bayesian estimation of spatial autoregressive models. International Regional Science Review 20: 113-129.
Manly, B. (1991). Randomization and Monte Carlo Methods in Biology. London: Chapman and Hall.
Metropolis, N. and Ulam, S. (1949). The Monte Carlo method. Journal of the American Statistical Association 44: 335-341.
Openshaw, S. (1984). The Modifiable Areal Unit Problem. Norwich: Quantitative Methods Research Group, Royal Geographical Society and Institute of British Geographers, Concepts and Techniques in Modern Geography Publication No. 38.
Openshaw, S. (1987). A Mark I geographical analysis machine for the automated analysis of point data sets. International Journal of Geographical Information Systems 1: 335-358.
Ord, J.K. and Getis, A. (1995). Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis 27: 286-306.
Rees, P. (1995). Putting the census on the researcher's desk. In Openshaw, S. (ed.), Census Users' Handbook, pp. 27-82. Cambridge: GeoInformation International.
Siegel, S. (1957). Nonparametric Methods for the Behavioral Sciences. New York: McGraw-Hill.
12
Fuzzy Sets in Spatial Analysis
Vincent B. Robinson
analysis and planning. In addition, it is often the case that concepts, or parameters, in spatially explicit models are inherently inexact. These, and other problems of uncertainty, have led many to use techniques based on fuzzy set theory. Ironically, fuzzy sets can be used to help make analyses less fuzzy because the inexactness is managed explicitly rather than implicitly. However, like other efforts at formalization, it can help lay bare assumptions and force us to be explicit about their meaning.

The basic idea underlying fuzzy set theory is that an element can be classified as being a member of more than one set and to varying degrees hold membership in each class. In the usual Boolean, or crisp, set theory, membership of an element x in a set A is defined by a characteristic function that indexes the degree to which the object in question is in the set. It should be noted that it is customary, but not strictly necessary, for the index to range from 0, for full non-membership, to 1.0 for full membership. Hence, the membership function is the fundamental element necessary to use fuzzy sets. A membership function measures the fractional truth value of a statement such as 'Object Y is a member of set S'.

A variety of other works contain in-depth explanations of the relevant fundamental concepts of fuzzy set theory. Studies such as that by Klir et al. (1997), Buckley and Eslami (2002), and Zimmerman (2001) cover many aspects of fuzzy sets in considerable depth with applications as examples. More relevant to those interested in spatial analysis is the geographic information systems (GIS) textbook by Burrough and McDonnell (1998). Other informative perspective pieces have appeared in the social sciences (Verkuilen, 2005), soil science (McBratney and Odeh, 1997) and GIS (Robinson, 2003). Hence there are many sources that one can turn to for background on the fundamentals of fuzzy sets and their relevant use in spatial analysis.

This chapter will first briefly review some of the more noteworthy accomplishments using fuzzy sets in spatial analysis. Then it will discuss the issue of assigning fuzzy membership and how it has been approached for use in spatial analysis. Finally, it will briefly discuss some issues and challenges of using fuzzy sets in spatial analysis.

12.2. FUZZY SETS AND SPATIAL ANALYSIS: SOME ACCOMPLISHMENTS

Spatial analysis is a broad field not relegated to just social science, ecology, soil science, geography, or engineering. Fuzzy set theory has been specifically noted as being a more natural way of representing and analyzing phenomena in such diverse areas as social science (Ragin and Pennings, 2005), soil science (McBratney and Odeh, 1997), ecology (Schaefer and Willson, 2002) as well as geographical analysis and engineering. Thus, there are many areas in which it has been shown to be of value in spatial analysis. Some of the more recent and important accomplishments are noted in this section.

Fundamental to some types of spatial analysis is the generation of a surface from data that generally contains some level of uncertainty. These data are generally represented as a set of points. Not only may there be uncertainty regarding the data measurement, but there may be duplicate data records in the spatial database that may confound an analysis. Torres et al. (2004) present an asymptotically optimal algorithm for eliminating duplicates that incorporates the handling of fuzzy uncertainty. The generation of surfaces from point data entails some form of interpolation. In this regard there have
FUZZY SETS IN SPATIAL ANALYSIS 227
been several promising approaches presented for the interpolation of spatial surfaces from point data (Anile et al., 2003; Gedeon et al., 2003; Lodwick and Santos, 2003).

Sampling of spatial data is fundamental to spatial analysis. Fuzzy set theory has been used relatively rarely in this regard. It has been shown how a combination of fuzzy clustering and regionalized variables can be used to estimate the optimal spacing of sample collection sites for soils mapping (Odeh et al., 1990). Developments in mapping systems that integrate mobile computing, GIS, and one or more sensors to take physical measurements (Arvanitis et al., 2000) suggest that it is becoming realistic to think in terms of spatial data collection agents that use fuzzy logic in an adaptive spatial sampling strategy. Simulation results of a prototypical system for adaptive sampling along a transect suggest that the fuzzy adaptive sampler usually produced better results and on average required fewer sampling locations (Graniero and Robinson, 2003).

When using spatial analysis in support of spatial decision making, it is sometimes noted that using a crisp, or nonfuzzy, approach to provide an information space for making decisions virtually guarantees that an analysis will ignore potentially useful information (Morris and Jankowski, 2005; Oberthur et al., 2000; Yanar and Akyurek, 2006). Thus, a nonfuzzy, or crisp, approach may have the effect of hiding important spatially explicit information from decision makers, hence increasing the risk of incurring additional costs by forgoing an opportunity because it was not known to the decision maker. This feature of fuzzy versus crisp approaches has been noted in studies as varied as landfill site selection (Charnpratheep et al., 1997), real estate evaluation (Zeng and Zhou, 2001), and soil erosion potential (Ahamed et al., 2000).

Often spatial decision making is represented using decision tables (DTs). However, strict, crisp boundaries are viewed as a significant problem in the use of DTs for locational decision making and spatial analysis. Witlox and Derudder (2005) have demonstrated how fuzzy decision tables (FDTs) can be formulated and used effectively. They show that it is possible to explicate the imprecision involved in the decision-making process using FDTs. However, as with DTs, when the number of conditions becomes large, knowledge-based techniques may be more effective and manageable.

The use of fuzzy sets in spatial analysis has been shown to improve the accuracy of representing spatial phenomena in a variety of domains. Often this improvement is also coincident with a reduction in cost. Using a fuzzy similarity approach, Hwang and Thill (2005) found that the rate of success of a typically used georeferencing procedure went from 86% up to 94% of all fatal accidents. This may not sound like a great many instances, but in a mission-critical application such as locating fatal accidents, this represents a significant gain in accuracy. In a different domain, the soil-land inference model (SoLIM) based on fuzzy set theory (Zhu et al., 2001; Zhu, 1997) has been estimated to have increased the accuracy of spatially explicit soils data by as much as 20% at a third of the cost of traditional techniques (Zhu, 2004). A similar result of lower cost and higher accuracy when using a fuzzy logic-based methodology has been suggested by work on ecological landscape mapping (MacMillan et al., 2003; MacMillan et al., 2000).

For spatial analysis, map comparison is useful for purposes of studying dynamic processes such as land cover change, comparing simulation model results with empirical data, map creation/revision and translating between maps using different semantics. Translating between map products from
differing sources with differing semantics is a problem when assembling spatial data for analysis. To address this problem, Ahlqvist (2005) used rough fuzzy sets to analyze the semantic similarity of map products having differing classification semantics. Fritz and See (2005) used a fuzzy logic based methodology to compare two land cover datasets and found the fuzzy agreement approach superior to the nonfuzzy approach in identifying areas of severe disagreement. Using a hierarchical fuzzy pattern matching technique, Power et al. (2001) were able to convincingly demonstrate the superiority of the use of fuzzy logic to address the problems of map comparison. They noted the deficiencies of using a map comparison statistic such as the Kappa measure that relies on crisp, nonfuzzy categories. This has subsequently been addressed by Hagen and others in their development of the K-fuzzy (or fuzzy kappa) (Hagen-Zanker et al., 2005) that takes into consideration the fuzziness of both location and attribute quality (Hagen, 2003). This is one of the more promising approaches to comparing spatial fields (Wealands et al., 2005).

A variety of multi-criteria decision making efforts have used fuzzy techniques to address spatially explicit problems. A methodology for assessing land for allocation to restoration projects demonstrated that the additional information afforded by fuzzy classification can be of significance in avoiding misallocations that would result in unnecessary cost (Guneralp et al., 2003). Similar conclusions could be drawn from an earlier study on allocation of land for industrial use. Jiang and Eastman (2000) showed that results can vary significantly as a function of the method of aggregation. In their review of fuzzy-based approaches Kahraman et al. (2003) suggest that nonfuzzy, conventional approaches to the facility location problem tend to be less effective in dealing with the imprecise nature of linguistic assessments that are often part of the qualitative criteria.

It is not uncommon for fuzzy set theory applications to be incorporated in a component of a larger decision support system. For example, in the DISCUSS system of spatially disaggregating cost-benefit analyses fuzzy logic is used in only one component. The fuzzy spatial disaggregation method uses standard membership curves operating on spatial variables. If the initial method of spatial disaggregation is not accepted, then a fuzzy disaggregation method is used that is based on membership functions on distance variables and fuzzy addition. Using fuzzy sets this work has shown how cost-benefit analyses can be spatially disaggregated, something that has rarely been accomplished in the past (Paez et al., 2006).

In some cases, when compared with more traditional methods that are statistics-based, fuzzy techniques have provided superior results. For example, when they replaced their principal components model with a fuzzy set analysis, Taylor and Derudder (2004) noted that the fuzzy-based analysis provided an exceptionally clear picture of regional and hierarchical tendencies among world cities. In a similar vein, Katz et al. (2005) concluded that regression analysis did not provide meaningful results while fuzzy set analysis did. In quite a different domain, Kuo et al. (2003) incorporated a fuzzy analytical hierarchical process (AHP) to support the locational decision for convenience stores. Since the mean standard error (MSE) for the fuzzy AHP was 0.0173 as opposed to 0.091 for the regression model, Kuo et al. (2003) concluded that a fuzzy AHP plus artificial neural network (ANN) decision support system provided more accurate results than did a regression model. In geographical soil science, Oberthur et al. (2000) showed that nonfuzzy approaches severely misclassified land while fuzzy
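
The contrast between crisp and fuzzy classification discussed in this section can be illustrated with a minimal sketch. The slope attribute, the 15% crisp cut-off, and the 10-20% transition zone are hypothetical values chosen purely for illustration; they are not taken from any of the studies cited.

```python
def crisp_suitable(slope_pct, threshold=15.0):
    """Crisp rule: fully suitable at or below the threshold, unsuitable above it."""
    return 1.0 if slope_pct <= threshold else 0.0


def fuzzy_suitable(slope_pct, full=10.0, none=20.0):
    """Linear membership: 1 at or below `full`, 0 at or above `none`."""
    if slope_pct <= full:
        return 1.0
    if slope_pct >= none:
        return 0.0
    return (none - slope_pct) / (none - full)


# Hypothetical raster cells with their slope in percent.
cells = {"A": 14.9, "B": 15.1, "C": 19.5}

for name, slope in cells.items():
    print(name, crisp_suitable(slope), round(fuzzy_suitable(slope), 2))
```

Cells A and B differ by only 0.2 percentage points of slope, yet the crisp rule places them in opposite classes, whereas their fuzzy memberships (about 0.51 and 0.49) remain nearly identical. This is the kind of information that a crisp approach can hide from a decision maker.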
to formally accommodate uncertainty, it has also been shown to enable model self-evaluation thus avoiding semantic errors in complex process models that unknowingly compromise the integrity of an analysis (Mackay and Robinson, 2000).

With the advent of GIS the representation and query of uncertain spatial data to support spatial analysis has developed substantially. Cross and Firat (2000) discuss the issues involved in construction of fuzzy spatial objects with specific reference to GIS. Morris (2003) describes a fuzzy object-oriented framework to model spatial objects with uncertain boundaries. Another object-based effort is that of Bordogna et al. (2006) who developed a fuzzy object-based data model as a tool for supporting spatial analysis. It is based on the management of a linguistic granule.

Verstraete et al. (2005) presented detailed techniques for modeling fuzzy spatial information represented as triangular irregular networks (TINs) and raster (grid) layers. They show how processing, as well as representation, can be carried out using fuzzy set theory to represent the uncertainty in spatial data. One of the significant aspects of this work is its presentation of the use of type-2 fuzzy sets. In other words, it detailed how to formally represent and process uncertainty not just about the spatial data, but also uncertainty about the fuzzy membership functions themselves.

12.3. ASSIGNING FUZZY MEMBERSHIPS

Crucial to any spatial analysis using fuzzy sets is the assignment of membership. With regard to fuzzy membership, it is important to realize that fuzzy memberships have special characteristics. First, although fuzzy membership values are typically normalized to fall between 0.0 and 1.0, they are not probabilities. Probabilities and fuzzy membership values measure very different things. For example, one of the most important properties of probability is additivity. There is no such inherent restriction on fuzzy memberships. In fact, the sum of membership values is interpreted as fuzzy cardinality (i.e., the size of a fuzzy set). Second, membership values are not a simple quantitative variable of the interval level. This is because the end points (i.e., 0 and 1) have more meaning than just being artifacts of the membership function. Verkuilen (2005) suggests it is really a generalization of the case of dichotomous dummy variables that are often used to represent ordinary crisp sets.

In the spatial analytic and GIS literature it is common to refer to either the Semantic Import (SI) or Similarity Relation (SR) model (Burrough and McDonnell, 1998; Robinson, 1988). However, it may be more useful to consider that fuzzy memberships are usually a function of a direct assignment (DA), indirect assignment (ID), or an assignment by transformation (AT) methodology (Verkuilen, 2005).

12.3.1. Direct assignment

Studies where membership functions are provided directly by an expert are characteristic of the direct assignment (DA) method. It is also common in the DA method of assignment to make use of standard membership functions such as the triangular, trapezoidal, bell, and others (Robinson, 2003). For example, in consultation with experts, DeGenst et al. (2001) made use of a standard curve to describe a basic spatial relation in their study of squirrel dispersal. Often, as in Braimoh et al. (2004), and Zeng and Zhou (2001), the choice of membership function is based on the literature, common sense, and/or expert opinion. Sometimes these
membership functions are available as part of a geographic information system so that experts can specify them directly in an automated geospatial environment (Yanar and Akyurek, 2006). Nevertheless, they are still directly assigned by an expert. It should be noted that this approach is sometimes criticized because of these deficiencies:

1 Interpretation is difficult because rarely is there anything tangible underlying the number.

2 It may be too difficult for the expert(s) to do reliably, especially if they are not well-versed in fuzzy set theory.

3 Can be biased. In particular, subjects may systematically be biased towards the end points (Thole et al., 1979).

4 Difficulty in combining assignments from multiple experts. This is especially difficult when the assignments are at extreme variance from one another (Verkuilen, 2005).

Despite these deficiencies, direct assignment remains a commonly used strategy for assigning membership values.

12.3.2. Indirect assignment

Indirect assignment elicits responses of some kind from experts and applies a model to the judgments to generate membership values. There have been a number of approaches used to formalize the process of generating fuzzy set memberships from expert knowledge. One of the simpler approaches showed how an intelligent, interactive question/answer system could be used to generate fuzzy representations of a spatial relation such as "near". In this approach the expert need only provide a yes/no answer to a question posed by the software. From those crisp answers the system generates a fuzzy representation of a spatial concept (Robinson, 2000). This approach may be useful to generalize for obtaining fuzzy representations of individual concepts; it is not suitable for use in studies where more complex expert knowledge representations are required.

One of the reasons indirect assignment is less often used is the difficulty of the knowledge elicitation process. Zhu (1999) used personal construct theory to formulate a rigorous methodology for eliciting expert knowledge about soils. Part of the process included the expert interacting with a graphical user interface (GUI) to assist in formalizing the relations. The result of this intensive knowledge elicitation process was used to populate a fuzzy soil similarity model. This is one of the rare studies in the geographical literature where knowledge consistency and validation were explicitly incorporated into the knowledge elicitation process. Although the process is rigorous and thorough, the interviews with the expert that are essential to the process can be very tense and often frustrating for the expert as well as the knowledge engineer. Hence, it can be difficult to secure an expert's cooperation. This is perhaps why there are so few studies in the spatial analytic literature where a rigorous indirect assignment process is followed.

Paired comparisons have been used in conjunction with fuzzy sets and spatial analysis, but generally not in the construction of membership functions/values themselves. For example, Charnpratheep et al. (1997) used paired-comparison analytic hierarchy process (AHP) methodology to arrive at weights that were subsequently used in a convex combination model of fuzzy aggregation. However, their membership functions were by direct assignment. In another instance, Kuo et al. (2003) used a fuzzy AHP methodology that made use of a questionnaire to acquire data on store location
decisions from 16 business experts. The results of this questionnaire exercise provided enough information to estimate the weight assigned to each factor (e.g., the competition dimension received the highest single weight of 0.1922). They show that weights provided by fuzzy AHP can be applied as criteria for selecting important factors to subsequently be used in an artificial neural network location analysis. These works are suggestive of a linkage to discrete choice modeling (e.g., Fotheringham, 1988; Train, 2003). Some work in the transportation field has explored the use of fuzzy sets in modeling route choice (Vythoulkas and Koutsopoulos, 2003). Since preferences play an important role in discrete choice modeling, Ridwan (2004) introduced a model of route choice based on fuzzy preference relations. The elements of the fuzzy relations were specified as fuzzy pairwise comparisons between alternative routes. Since logit models are commonly used to estimate the probability of alternatives being chosen, it is interesting to note that Henn (2000) presents a fuzzy formulation that suggests the logit model is a special case of his fuzzy based model when the similarity measure has a given shape.

Questionnaires have reportedly been used in some studies as an instrument for constructing fuzzy memberships. Although details are not given, Lin et al. (2006) describe a process using results of a questionnaire survey to construct a fuzzy rule base. They were then able to make some tentative statements about changes in activity centers in relation to a subway line. Similarly, Fritz et al. (2000) used a web-based questionnaire where distances specified by respondents were used to construct fuzzy sets for defining the concepts "near", "medium" and "far" for visible features and "close" and "far away" for nonvisible features. They then detailed a methodology that combined the resulting fuzzy rules to aid in mapping of wild lands. Although respondents detailed what distances represented concepts like "near", the use of a default triangular membership function meant that the actual membership function was not obtained directly from respondent data. Nevertheless, it does represent a more formalized approach for proceeding from questionnaire responses to construction of a fuzzy set or rule base.

12.3.3. Assignment by transformation

In this approach a numerical variable is taken and mapped into membership values by some transformation. There are many different approaches that assign fuzzy membership using some version of assignment by transformation. In this section many of the approaches used to address problems in spatial analysis are briefly discussed.

Among the more typical approaches to assignment is the use of a fuzzy clustering algorithm. Perhaps the most commonly used method across the spatial sciences for assigning membership is based on the fuzzy c-means algorithm originally developed by Dunn (1973) and later generalized by Bezdek (1974, 1981). It is also known as the fuzzy k-means (FKM) or fuzzy ISODATA algorithm. It is derived to minimize an objective function with respect to the membership functions and centroids of c clusters. Hence it is useful for clustering multivariate data into a finite number of fuzzy sets (Brown, 1998; Cheng et al., 2002; Irvin et al., 1997; McBratney and Odeh, 1997; Stefanakis et al., 1999). In spatial analytic studies each spatial object would be classified as a member of all classes but to varying degrees. Although used in numerous studies since the algorithm was published (Bezdek et al., 1984), it continues to figure prominently in
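
As an illustration of the fuzzy c-means idea described above, the following is a minimal pure-Python sketch rather than the implementation used in any of the cited studies. It alternates between updating cluster centroids and updating memberships so as to reduce the usual objective, the sum over objects and clusters of u_ij^m times the squared distance between object i and centroid j. Euclidean distance, the fuzzifier m = 2, random initial memberships, the function name fuzzy_c_means, and the small two-dimensional data set are all assumptions made for this example.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Return (centroids, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centroids = []
    for _ in range(max_iter):
        # Centroid update: mean of the points weighted by u**m.
        centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append(
                [sum(w[i] * points[i][k] for i in range(n)) / total
                 for k in range(dim)]
            )

        # Membership update from distances to the new centroids.
        new_u = []
        for p in points:
            d = [math.dist(p, v) for v in centroids]
            if min(d) == 0.0:
                # Point coincides with a centroid: share full membership
                # among the coinciding centroids.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                total = sum(inv)
                new_u.append([v / total for v in inv])

        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break
    return centroids, u


# Hypothetical data: two compact groups plus one point lying between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
centroids, memberships = fuzzy_c_means(points, c=2)
for p, row in zip(points, memberships):
    print(p, [round(v, 3) for v in row])
```

Every object receives a membership in every cluster. Objects inside a compact group have memberships close to 1 in that group, while the intermediate object is split between the two clusters, which is the behavior the text describes as membership of all classes to varying degrees.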
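
A standard membership function of the kind used in direct assignment, such as the default triangular function mentioned in connection with Fritz et al. (2000), can be sketched in a few lines. The break points (200, 500 and 1000 m) for the concept "medium distance" and the sample distances are hypothetical. The sketch also computes the fuzzy cardinality of the resulting set as the sum of its membership values, as described at the start of Section 12.3.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at or outside the feet a and c, 1 at the peak b."""
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical distances (in metres) from four features to an observer.
distances_m = [150.0, 350.0, 500.0, 800.0]

# Membership of each feature in the fuzzy set "medium distance".
medium = [triangular(d, 200.0, 500.0, 1000.0) for d in distances_m]

# Fuzzy cardinality: the sum of the membership values.
cardinality = sum(medium)

print(medium)
print(cardinality)
```

The memberships here sum to about 1.9 rather than 1, which illustrates why membership values should not be read as probabilities: there is no additivity constraint, and the sum is interpreted as the size of the fuzzy set.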
These neural network approaches have advantages and disadvantages. The advantages include an ability to learn from training data and to handle noisy, incomplete data. Once trained, an ANN can respond to a new set of data instantly. However, they can take a long time to train, especially since training is still largely by trial and error, complicated by the fact that incomplete training data can cause the network to provide incorrect results. Perhaps the most important disadvantage is that it is difficult to explain the specific reasoning leading to the output product. Hence it can be a kind of black-box approach.

The presence, or absence, of an association, interaction or interconnectedness between elements of two or more sets is represented by a crisp relation. Rather than presence/absence of association, degrees of association can be represented by membership grades in a fuzzy relation in much the same way as degrees of set membership are represented in a fuzzy set. Thus, the classical notion of relation can be generalized into a matter of degree as a fuzzy relation. Fuzzy relations have been used to formally represent fuzzy regions and their relationships (Zhan and Lin, 2003). In addition, Kahraman et al. (2003) present an example of using fuzzy relations in a model of group decision making for the facility location selection problem.

Statistical data analysis has been suggested as another way to choose fuzzy membership functions and form fuzzy rules (Hanna et al., 2002). However, it has not been used widely in spatial analysis. An example of its application to a spatially explicit problem is illustrated by the problem of estimating parameters to use in a regional ecohydrological simulation model. Mackay et al. (2003) use a two-stage methodology where in the first stage many simulations are run in which parameters affecting stomatal conductance are assigned values using Monte Carlo sampling. Then each simulation result is evaluated by regressing simulated evaporative fraction from RHESSys and surface temperature from thermal remote sensing data. For each regression, the coefficient of determination (R²) is calculated and used as a fuzzy measure of the goodness-of-fit for its respective simulation result. Hence the fuzzy set is composed of the set of R² measures for all simulations, to which an information-theoretic tool based on ordered possibility distributions is applied to form a restricted set in which only good simulations are retained. A restricted set is used as an ensemble solution in the second stage of parameter estimation. Note that a separate ensemble solution is produced for each hillslope (Mackay et al., 2003).

12.4. COMBINING MEMBERSHIPS

A common requirement of fuzzy spatial analysis is the combination of several fuzzy sets in a desirable manner to produce a single fuzzy set (Klir et al., 1997). This combination is often accomplished using aggregation operators. In fuzzy set theory there are many aggregation operators from which to choose, with the most common being the min (intersection) and max (union) operators (Robinson, 2003). The choice of operator depends on the nature of the underlying decision model. For example, in their fuzzy-based cellular automata model of insect infestation, Bone et al. (2006) used a compensatory operator rather than the noncompensatory operators (max or min) because the compensatory aggregation operator allows for the influence of each set to contribute to the final result.

The basic aggregation operators have been further developed using various
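
The difference between noncompensatory and compensatory aggregation can be sketched as follows. The min and max operators are those named in the text; the weighted mean (convex combination) is used here only as one simple example of a compensatory operator and is not necessarily the operator used by Bone et al. (2006). The criterion names, membership values, and weights are hypothetical.

```python
def fuzzy_intersection(memberships):
    """Noncompensatory AND (min): the weakest criterion alone decides."""
    return min(memberships)


def fuzzy_union(memberships):
    """Noncompensatory OR (max): the strongest criterion alone decides."""
    return max(memberships)


def convex_combination(memberships, weights):
    """Compensatory aggregation: weighted mean, weights must sum to 1."""
    if len(memberships) != len(weights):
        raise ValueError("memberships and weights must have equal length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * m for w, m in zip(weights, memberships))


# Hypothetical memberships of one location in three fuzzy criteria.
site = {"near_road": 0.9, "gentle_slope": 0.2, "suitable_soil": 0.8}
values = list(site.values())
weights = [0.5, 0.25, 0.25]

print("min:", fuzzy_intersection(values))
print("max:", fuzzy_union(values))
print("weighted mean:", convex_combination(values, weights))
```

With min, the single low slope membership determines the result regardless of the other criteria; with max, only the highest membership matters. The weighted mean lies between the two and changes whenever any one of the memberships changes, which is what is meant by each set contributing to the final result.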
fuzzy sets and mainstream spatial analysis. There are examples of the fuzzification of mainstream methods such as in the case of kriging and the kappa statistic. However, other aspects of fuzzy statistics have yet to be explored in depth for the analysis of spatial data. For example, although regression analysis is often used in spatial analyses, the use of fuzzy regression techniques is virtually unheard of in spatial analysis even though fuzzy regression techniques address the case where the relations of the variables are subject to fuzziness or where the variables themselves are fuzzy (Taheri, 2003).

Many efforts in spatial analysis are concerned with the testing of hypotheses. Mainstream methods rely upon classical statistics to determine whether a hypothesis should be rejected. Little investigation of fuzzy hypothesis testing has been done in the context of spatial analysis. However, as Smithson (2005) points out, fuzzy sets and statistics work better together. There are a few cases where this is demonstrated, mostly in relation to the process of assigning membership values (Ahn et al., 1999; Brown, 1998; Mackay et al., 2003), not in the explicit testing of hypotheses.

There are several broad issues that will face researchers attempting to use fuzzy sets in spatial analysis. Perhaps the most fundamental issue is when, or when not, to use fuzzy-based analysis. This is not easily answered and demands considerable knowledge of the problem at hand as well as of both mainstream and fuzzy-based methods. Fuzzy-based approaches are showing great promise, yet are still not as widely known, or understood, as many of the mainstream approaches detailed in other chapters of this book. Another issue is whether a fuzzy-based spatial analysis should be evaluated against nonfuzzy-based techniques or whether fuzzy-based techniques are now developed enough to stand on their own. This implies that they are competing paradigms when they may be more properly viewed as complementary paradigms of analysis.

REFERENCES

Ahamed, T.R.N., Rao, G.K. and Murthy, J.S.R. (2000). Fuzzy class membership approach to soil erosion modelling. Agricultural Systems, 63: 97-110.
Ahlqvist, O. (2005). Using uncertain conceptual spaces to translate between land cover categories. International Journal of Geographical Information Science, 19: 831-857.
Ahn, C.-W., Baumgardner, M.F. and Biehl, L.L. (1999). Delineation of soil variability using geostatistics and fuzzy clustering analyses of hyperspectral data. Soil Sci. Soc. Am. J., 63: 142-150.
Anile, M.A., Furno, P., Gallo, G. and Massolo, A. (2003). A fuzzy approach to visibility maps creation over digital terrains. Fuzzy Sets and Systems, 135: 63-80.
Arvanitis, L.G., Ramachandran, B., Brackett, D.P., Abd-El Rasol, H. and Xu, X.S. (2000). Multiresource inventories incorporating GIS, GPS and database management systems: a conceptual model. Computers and Electronics in Agriculture, 28: 89-100.
Bardossy, A., Bogardi, I. and Kelly, W.E. (1989). Geostatistics utilizing imprecise (fuzzy) information. Fuzzy Sets and Systems, 31: 311-328.
Bezdek, J.C. (1974). Cluster validity with fuzzy sets. Journal of Cybernetics, 3: 58-73.
Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press.
Bezdek, J.C., Ehrlich, R. and Full, W. (1984). FCM: the fuzzy c-means clustering algorithm. Computers and Geosciences, 10: 191-203.
Bone, C., Dragicevic, S. and Roberts, A. (2006). A fuzzy-constrained cellular automata model of forest insect infestations. Ecological Modelling, 192: 107-125.
Bordogna, G., Chiesa, S. and Geneletti, D. (2006). Linguistic modelling of imperfect spatial information as a basis for simplifying spatial analysis. Information Sciences, 176: 366-389.
Bossomaier, T., Amri, S. and Thompson, J. (2005). Agent-based modelling of house price evolution. In: Proceedings of CABM-HEMA-SMAGET 2005 Joint Conference on Multi-Agent Modelling for Environmental Management, Bourg St-Maurice-Les Arc, France.
Botia, J.A., Gomez-Skarmeta, A.F., Valdes, M., and Padilla, A. (2001). Fuzzy and hybrid methods applied to GIS interpolation. In: The 10th IEEE International Conference on Fuzzy Systems, pp. 453-456, Melbourne, Australia.
Bragato, G. (2004). Fuzzy continuous classification and spatial interpolation in conventional soil survey for soil mapping of the lower Piave plain. Geoderma, 118: 1-16.
Braimoh, A.K., Vlek, P.L. and Stein, A. (2004). Land evaluation for maize based on fuzzy set and interpolation. Environmental Management, 33: 226-238.
Brown, D.G. (1998). Classification and boundary vagueness in mapping presettlement forest types. International Journal of Geographical Information Science, 12: 105-129.
Brown, D.G. (1998). Mapping historical forest types in Baraga County Michigan, USA as fuzzy sets. Plant Ecology, 134: 97-118.
Buckley, J.J. and Eslami, E. (2002). An Introduction to Fuzzy Logic and Fuzzy Sets. New York: Physica-Verlag.
Burrough, P.A. and McDonnell, R.A. (1998). Principles of Geographical Information Systems. New York: Oxford University Press.
Burrough, P.A., Wilson, J.P., van Gaans Pauline, F.M. and Hansen, A.J. (2001). Fuzzy k-means classification of topo-climatic data as an aid to forest mapping in the Greater Yellowstone area, USA. Landscape Ecology, 16: 523-546.
Charnpratheep, K., Zhou, Q. and Garner, B. (1997). Preliminary landfill site screening using fuzzy geographical information systems. Waste Management & Research, 15: 197-215.
Cheng, T., Molenaar, M. and Lin, H. (2002). Formalizing fuzzy objects from uncertain classification results. International Journal of Geographical Information Science, 15: 27-42.
Chiou, A. and Yu, X. (2001). Prediction of Parthenium weed infestation using fuzzy logic applied to geographic information system (GIS) spatial image. In: The 10th IEEE International Conference on Fuzzy Systems, pp. 1363-1366, Melbourne, Australia.
Cobb, M.A., Chung, M.J., Foley III, H., Petry, F.E. and Shaw, K.B. (1998). A rule-based approach for the conflation of attributed vector data. Geoinformatica, 2: 7-35.
Cross, V. and Firat, A. (2000). Fuzzy objects for geographical information systems. Fuzzy Sets and Systems, 113: 19-36.
DeGenst, A., Canters, F. and Gulink, H. (2001). Uncertainty modeling in buffer operations applied to connectivity analysis. Transactions in GIS, 5: 305-326.
Dunn, J.C. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3: 32-57.
Fisher, P., Wood, J. and Cheng, T. (2005). Fuzziness and ambiguity in multi-scale analysis of landscape morphometry. In: Petry, F.E., Robinson, V.B. and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial Information for Geographic Problems, pp. 207-232. Heidelberg: Springer.
Foody, G.M. (1999). The continuum of classification fuzziness in thematic mapping. Photogrammetric Engineering and Remote Sensing, 65: 443-451.
Foody, G.M. and Boyd, D.S. (1999). Fuzzy mapping of tropical land cover along an environmental gradient from remotely sensed data with an artificial neural network. Journal of Geographical Systems, 1: 23-35.
Fotheringham, A.S. (1988). Consumer store choice and choice set definition. Marketing Science, 7: 299-310.
Fritz, S. and See, L. (2005). Comparison of land cover maps using fuzzy agreement. International Journal of Geographical Information Science, 19: 787-807.
Fritz, S., Carver, S. and See, L. (2000). New GIS approaches to wild land mapping in Europe. In: Wilderness Science in a Time of Change Conference Volume 2: Wilderness Within the Context of Larger Systems, Missoula, MT, pp. 120-127.
Gale, S. (1972). Inexactness, fuzzy sets, and the foundations of behavioral geography. Geographical Analysis, 4: 337-349.
Gedeon, T.D., Wong, K.W., Wong, P. and Huang, Y. (2003). Spatial interpolation using fuzzy reasoning. Transactions in GIS, 7: 55-66.
Graniero, P.A. and Robinson, V.B. (2003). A real-time adaptive sampling method for field mapping in
patchy, heterogeneous environments. Transactions in GIS, 7: 31-54.
Guneralp, B., Mendoza, G., Gertner, G. and Anderson, A. (2003). Spatial simulation and fuzzy threshold analyses for allocating restoration areas. Transactions in GIS, 7: 325-343.
Hagen, A. (2003). Fuzzy set approach to assessing similarity of categorical maps. International Journal of Geographical Information Science, 17: 235-249.
Hagen-Zanker, A., Straatman, B. and Uljee, I. (2005). Further developments of a fuzzy set map comparison approach. International Journal of Geographical Information Science, 19: 769-785.
Hanna, A.S., Lotfallah, W.B. and Lee, M.J. (2002). Statistical-fuzzy approach to quantify cumulative impact of change orders. Journal of Computing in Civil Engineering, 16: 252-258.
Heikkila, E.J., Shen, T.-Y. and Yang, K.-Z. (2003). Fuzzy urban sets: theory and application to desakota regions in China. Environment and Planning B: Planning and Design, 30: 239-254.
Henn, V. (2000). Fuzzy route choice model for traffic assignment. Fuzzy Sets and Systems, 116: 77-101.
Hwang, S. and Thill, J.-C. (2005). Modeling localities with fuzzy sets and GIS. In: Petry, F.E., Robinson, V.B. and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial Information for Geographic Problems, pp. 71-104. Heidelberg: Springer.
Irvin, B.J., Ventura, S.J. and Slater, B.K. (1997). Fuzzy and isodata classification of landform elements from digital terrain data in Pleasant Valley, Wisconsin. Geoderma, 77: 137-154.
Jang, J.-S.R. (1993). ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Trans. Systems, Man & Cybernetics, 23: 665-685.
Jiang, H. and Eastman, J.R. (2000). Application of fuzzy measures in multi-criteria evaluation in GIS. International Journal of Geographical Information Science, 14: 173-184.
Kahraman, C., Ruan, D. and Dogan, I. (2003). Fuzzy group decision-making for facility location selection. Information Sciences, 157: 135-153.
Katz, A., Vom, H.M. and Mahoney, J. (2005). Explaining the great reversal in Spanish America: fuzzy set analysis versus regression analysis. Sociological Methods and Research, 33: 539-573.
Klir, G.J., Ute, S.C. and Yuan, B. (1997). Fuzzy Set Theory: Foundations and Applications. Upper Saddle River, NJ: Prentice Hall.
Kosko, B. (1992). Neural Networks and Fuzzy Systems. Englewood Cliffs, NJ: Prentice-Hall.
Kuo, R.J., Chi, S.C. and Kao, S.S. (2003). A decision support system for selecting convenience store location through integration of fuzzy AHP and artificial neural network. Computers in Industry, 47: 199-214.
Leung, Y. (1983). Fuzzy sets approach to spatial analysis and planning, a nontechnical evaluation. Geografiska Annaler, Series B, Human Geography, 65: 65-75.
Liew, A.W.C., Leung, S.H. and Lau, W.H. (2000). Fuzzy image clustering incorporating spatial continuity. IEE Proc Vision, Image, and Signal Processing, 147: 185-192.
Lin, J.-J., Feng, C.-M. and Hu, Y.-Y. (2006). Shifts in activity centers along the corridor of the Blue subway line in Taipei. Journal of Urban Planning and Development, 132: 22-28.
Liu, Z. and George, R. (2005). Mining weather data using fuzzy cluster analysis. In: Petry, F.E., Robinson, V.B. and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial Information for Geographic Problems, pp. 105-119. Heidelberg: Springer.
Lodwick, W.A. and Santos, J. (2003). Constructing consistent fuzzy surfaces from fuzzy data. Fuzzy Sets and Systems, 135: 259-277.
Lundberg, C.G. (1982). Modeling constraints and anticipation: linguistic variables, foresight-hindsight and relative alternative attractiveness. Geographical Analysis, 14: 347-355.
Mackay, D.S. and Robinson, V.B. (2000). A multiple criteria decision support system for testing integrated environmental models. Fuzzy Sets and Systems, 113: 53-67.
Mackay, D.S., Samanta, S., Ahl, D.E., Ewers, B.E., Gower, S.T. and Burrows, S.N. (2003). Automated parameterization of land surface process models using fuzzy logic. Transactions in GIS, 7: 139-153.
MacMillan, R.A., Martin, T.C., Earle, T.J. and McNabb, D.H. (2003). Automated analysis and classification of landforms using high-resolution digital elevation data: applications and issues. Canadian Journal of Remote Sensing, 29: 592-606.
FUZZY SETS IN SPATIAL ANALYSIS 239
MacMillan, R.A., Pettapiece, W.W., Nolan, S.C. Power, C., Simms, A. and White, R. (2001).
and Goddard, T.W. (2000). A generic proce- Hierarchical fuzzy pattern matching for the regional
dure for automatically segmenting landforms into comparison of land use maps. International Journal
landform elements using DEMs, heuristic rules, of Geographical Information Science, 15: 77100.
and fuzzy logic. Fuzzy Sets and Systems, 113:
Ragin, C.C. and Pennings, P. (2005). Fuzzy sets and
81109.
social research. Sociological Methods and Research,
Matsakis, P. and Nikitenko, D. (2005). Combined 33: 423430.
extraction of directional and topological relationship
Ridwan, M. (2004). Fuzzy preference based trafc
information from 2D concave objects. In: Petry,
assignment problem. Transportation Research Part
F.E., Robinson, V.B. and Cobb, M.A. (eds.), Fuzzy
C, 12: 209233.
Modeling with Spatial Information for Geographic
Problems. pp. 143158. Berlin: Springer. Robinove C.J. (1989). Principles of logic and the
use of digital geographic information systems.
McBratney, A.B. and Odeh, I.O.A. (1997). Application In: Ripple, W.J. (ed.), Fundamentals of GIS:
of fuzzy sets in soil science: fuzzy logic, fuzzy A Compendium, Washington, D.C.: American
measurements and fuzzy decisions. Geoderma, 77: Society for Photogrammetry and Remote Sensing.
85113. pp. 6179.
Morris, A. (2003). A framework for modeling Robinson, V.B. and Strahler, A.H. (1984). Issues in
uncertainty in spatial databases. Transactions in GIS, designing geographic information systems under
7: 83103. conditions of inexactness. In: Proceedings of
Morris, A. and Jankowski, P. (2005). Spatial decision 10th International Symposium on Machine Pro-
making using fuzzy GIS. In: Cobb, M.A., Petry, F. and cessing of Remotely Sensed Data, pp. 179188,
Robinson, V.B. (eds.), Fuzzy Modeling with Spatial Terre Haute, IN.
Information for Geographic Problems. pp. 275298. Robinson, V.B. (1988). Some implications of fuzzy set
Heidelberg: Springer. theory applied to geographic databases. Computers,
Oberthur, T., Dobermann, A. and Aylward, M. Environment, and Urban Systems, 12: 8997.
(2000). Using auxiliary information to adjust fuzzy Robinson, V.B. (2000). Individual and multipersonal
membership functions for improved mapping of fuzzy spatial relations acquired using human-
soil qualities. International Journal of Geographical machine interaction. Fuzzy Sets and Systems, 113:
Information Science, 14: 431454. 133145.
Odeh, I.O.A., McBratney, A.B. and Chittleborough, Robinson, V.B. (2003). A perspective on the funda-
D.J. (1990). Design of optimal sample spacings for mentals of fuzzy sets and their use in geographic
mapping soil using fuzzy k -means and regionalized information systems. Transactions in GIS, 7: 330.
variable theory. Geoderma, 47: 93112.
Robinson, V.B. and Graniero, P.A. (2005). Spatially
Paez, D., Bishop, I.D. and Williamson, I.P. (2006). explicit individual-based ecological modeling with
DISCUSS: a soft computing approach to mobile fuzzy agents. In: Petry, F.E., Robinson, V.B.
spatial disaggregation in economic evaluation and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial
of public policies. Transactions in GIS, 10: Information for Geographic Problems. pp. 299334.
265278. Heidelberg: Springer.
Peschel, J.M. (2002). Creating land cover input Schaefer, J.A., Veitch, A.M., Harrington, F.H., Brown,
datasets from the SWAT 2000 model using remotely W.K., Theberge, J.B. and Luttich, S.N. (2001).
sensed data. Texas A&M University, http://ceprofs. Fuzzy structure and spatial dynamics of a declining
tamu.edu/folivera/TxAgGIS/Spring2002/Peschel/ woodland caribou population. Oecologia, 126:
peschel.htm, visited on April 14, 2006. 507514.
Pham, D.L. (2001). Spatial models for fuzzy clustering. Schaefer, J.A. and Willson, C.C. (2002). A fuzzy
Computer Vision and Image Understanding, 84: structure of populations. Canadian Journal of
285297. Zoology, 80: 22352241.
Pipkin, J.S. (1978). Fuzzy sets and spatial choice. Scull, P., Franklin, J., Chadwick, O.A. and McArthur, D.
Annals of Association of American Geographers, 68: (2003). Predictive soil mapping: a review. Progress
196204. in Physical Geography, 27: 171197.
240 THE SAGE HANDBOOK OF SPATIAL ANALYSIS
Skubic, M., Blisard, S., Bailey, C., Adams, J.A., from fuzzy set theory, approximate reasoning and
Matsakis, P. (2004). Qualitative analysis of sketched neural networks. Transportation Research Part C, 11:
route maps: translating a sketch into linguistic 5173.
descriptions. IEEE Trans. on Systems, Man, and
Wanek, D. (2003). Fuzzy spatial analysis techniques in a
Cybernetics, 34: 12751282.
business GIS environment. In: European Regional Sci-
Smithson, M. (2005). Fuzzy set inclusiong: linking ence Association 2003 Congress, Jyvaskla, Finland
fuzzy set methods with mainstream techniques. [CD-ROM (paper no. 177)].
Sociological Methods and Research, 33: 431461.
Wealands, S.R., Grayson, R.B. and Walker, J.P.
Stefanakis, E., Vazirgiannis, M. and Sellis, T. (1999). (2005). Quantitative comparison of spatial elds for
Incorporating fuzzy set methodologies in a DBMS hydrological model assessment some promising
repository for the application domain of GIS. approaches. Advances in Water Resources, 28:
International Journal of Geographical Information 1532.
Science, 13: 657675.
Wilson, J.P., Burrough, P.A. (1999). Dynamic modeling,
Taheri, S.M. (2003). Trends in fuzzy statistics. Austrian geostatistics, and fuzzy classication: new sneakers
Journal of Statistics, 32: 239257. for a new geography? Annals of the Association of
Taylor, P.J. and Derudder, B. (2004). Porous Europe: American Geographers, 89: 736746.
european cities in global urban arenas. Tijdschrift Witlox, F. and Derudder, B. (2005). Spatial decision-
voor Economishe en Sociale Geogra phie, 95: making using fuzzy decision tables: theory, applica-
527538. tion and limitations. In: Petry, F.E., Robinson, V.B.
Teng, C.H. and Fairbairn, D. (2002). Comparing and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial
expert systems and neural fuzzy systems for object Information for Geographic Problems. pp. 253275.
recognition in map dataset revision. International Berlin: Springer.
Journal of Remote Sensing, 23: 555567. Wu, F. (1998). Simulating urban encroachment on rural
The Math Works Inc. (2002). Fuzzy Logic Toolbox Users land with fuzzy-logic-controlled cellular automata
Guide. Natick, MA. USA: The Math Works Inc. in a geographical information system. Journal of
Environmental Management, 53: 293308.
Thole, U., Zimmermann, H.-J. and Zysno, P. (1979). On
the suitability of minimum and product operators for Yanar Tahsin, A. and Akyurek, Z. (2006). The
the intersection of fuzzy sets. Fuzzy Sets and Systems, enhancement of the cell-based GIS analyses with
2: 167180. fuzzy processing capabilities. Information Sciences,
176: 10671085.
Torres, R., Keller, G.R., Kreinovich, V., Longpre, L. and
Starks, S.A. (2004). Eliminating duplicates under Zadeh, L.A. (1965). Fuzzy sets. Information and Control,
interval and fuzzy uncertainty: an asymptotically 8: 338353.
optimal algorithm and its geospatial applications. Zeng, T.Q. and Zhou, Q. (2001). Optimal spatial
Reliable Computing, 10: 401422. decision making using GIS: a prototype of a real
Train, K.E. (2003). Discrete Choice Methods with estate geographical information system (REGIS).
Simulation. Cambridge, UK: Cambridge University International Journal of Geographical Information
Press. Science, 15: 307321.
Verkuilen, J. (2005). Assigning membership in a fuzzy Zhan, F.B., Lin, H. (2003). Overlay of two simple poly-
set analysis. Sociological Methods and Research, 33: gons with indeterminate boundaries. Transactions in
462496. GIS, 7: 6781.
Verstraete, J., De Tre, G., De Caluwe, R. and Hallez, A. Zheng, D. and Kainz, W. (1999). Fuzzy rule extraction
(2005). Field based methods for the modeling of from GIS data with a neural fuzzy system for
fuzzy spatial data. In: Petry, F.E., Robinson, V.B. decision making. In: Proceedings of the Seventh
and Cobb, M.A. (eds.), Fuzzy Modeling with Spatial ACM International Symposium on Advances in
Information for Geographic Problems. pp. 4170. Geographic Information Systems, Kansas City, MO,
Heidelberg: Springer. USA, pp. 7984.
Vythoulkas, P.C. and Koutsopoulos, H.N. (2003). Zhu, A.X., Hudson, B., Burt, J., Lubich, K. and
Modeling discrete choice behavior using concepts Simonson, D. (2001). Soil mapping using GIS, expert
FUZZY SETS IN SPATIAL ANALYSIS 241
knowledge, and fuzzy logic. Soil Science Society of resource mapping. International Journal of
America Journal, 65: 14631472. Geographical Information Science, 13: 119141.
Zhu, A.-X. (1997). A similarity model for representing Zhu, A.-X. (2004). Personal Communication. Depart-
soil spatial information. Geoderma, 77: 217242. ment of Geography, University of Wisconsin.
Zhu, A.-X. (1999). A personal construct-based Zimmermann, H.-J. (2001). Fuzzy Set Theory and Its
knowledge acquisition process for natural Applications. Boston, MA: Kluwer Academic.
13
Geographically Weighted
Regression
A. Stewart Fotheringham
entire study area we are investigating and that the same stimulus provokes the same response in all parts of the study region. In a linear framework, we can represent these relationships with the following general model:¹

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_n x_{ni} + \varepsilon_i \tag{13.1}$$

the model to vary over space rather than to calibrate a stationary model and then trying to examine a possible error in the model through the spatial patterning of the residuals. The specification of a model that allows the parameter estimates to vary over space is the essence of geographically weighted regression (GWR).
where $w_{in}$ is the weight given to data point n for the estimate of the local parameters at location i.

There are many possible weighting functions that could be specified which relate the weighting of an observed value at location j to the distance location j is from the regression point i but they tend to be Gaussian or Gaussian-like, reflecting the nature of many spatial processes. The operation of a typical weighting function is shown in Figure 13.1.

Data points that are located close to the regression point are weighted highly whereas data points that are far from the regression point get a very low weight. Hence, the weighting matrix will change every time the regression point changes. GWR thus produces a model that effectively answers the question 'what do the relationships in my model look like around this location?' The question can be answered for many different locations as we will see below.

Although the exact specification of the weighting function can take many forms, there are two broad categories of weighting functions: fixed or adaptive. An example of a fixed spatial weighting function is shown in Figure 13.2. In this case, the specified weighting function or kernel is constant across the study area and therefore has the undesirable property that in areas where data points are relatively sparse, the resulting local parameter estimates will have high standard errors attached to them reflecting the added uncertainty in the estimates caused by the relative lack of data.

There are many functions that could be used to represent a fixed spatial weighting function. One is a Gaussian expression:

$$w_{ij} = \exp\left[-(d_{ij}/h)^2\right] \tag{13.6}$$
[Figures: spatial kernel diagrams. Axes: weight w_ij (from 0 to 1) against distance d_ij, with the bandwidth marked. Legend: x = regression point, • = data point.]
where $d_{ij}$ is the distance between locations i and j, and h is a parameter often referred to as the bandwidth: as h increases, the gradient of the kernel becomes less steep and more data points are included in the local calibration.

A generally preferred alternative is an adaptive kernel where the spatial extent of the kernel is dictated by the underlying density of data points. In areas where data are plentiful, the kernel is relatively tightly defined around the regression point; in areas where the data are relatively sparse, the kernel has to extend outwards in order to capture more data. The operation of an adaptive kernel is shown in Figure 13.3.

Again, there are several functions that one could use to produce a spatially adaptive weighting function. One, for example, is the following:

$$w_{ij} = \begin{cases} \left[1 - (d_{ij}^2/h^2)\right]^2 & \text{if } j \text{ is one of the } N \text{ nearest neighbours of } i \\ 0 & \text{otherwise} \end{cases} \tag{13.7}$$
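The two kernels in equations (13.6) and (13.7) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in any GWR package: the function names are hypothetical, and for the adaptive kernel it is assumed that h is the distance from the regression point to its N-th nearest data point, which the text above does not state explicitly.

```python
import numpy as np

def gaussian_kernel(d, h):
    """Fixed Gaussian kernel of equation (13.6): w = exp[-(d/h)^2]."""
    d = np.asarray(d, dtype=float)
    return np.exp(-(d / h) ** 2)

def adaptive_bisquare_kernel(d, n_neighbours):
    """Adaptive kernel of equation (13.7).

    Assumption: h is the distance to the n_neighbours-th nearest data point,
    so the kernel widens automatically where data are sparse.
    """
    d = np.asarray(d, dtype=float)
    h = np.sort(d)[n_neighbours - 1]
    w = (1.0 - d**2 / h**2) ** 2
    # Points farther away than h get zero weight.
    return np.where(d <= h, w, 0.0)

# Distances from one regression point to five data points (made-up values).
d = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
w_fixed = gaussian_kernel(d, h=2.0)
w_adapt = adaptive_bisquare_kernel(d, n_neighbours=4)
```

With the fixed kernel every point keeps some weight, however small; with the adaptive kernel only the nearest points contribute, and the point lying exactly at distance h receives a weight of zero.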
and data on $x_1$ and $x_2$ drawn randomly for 2500 locations on a 50 × 50 matrix subject to the correlation between $x_1$ and $x_2$, $r(x_1, x_2)$, being controlled. In fact, the results of this experiment can be shown to be independent of $r(x_1, x_2)$ so we will ignore this feature of the experiment here.

13.4.1. Experiment 1 (parameters spatially invariant)

In this experiment, we set the three parameters in the model to known, constant values:

$\alpha_i = 10$ for all i
$\beta_{1i} = 3$ for all i
$\beta_{2i} = 5$ for all i.

With everything on the right-hand side of equation (13.10) now known, we can derive a value of $y_i$ at each location and then use these data to calibrate the model both by ordinary least squares regression and by GWR. The results are as follows:

In this case, where there is no spatial nonstationarity (the parameters are the same everywhere), the global model is clearly appropriate and replicates the y variable perfectly and the estimated parameters are equal to their known values. K represents the number of parameters estimated in the model. The results are not surprising: the processes being modelled are stationary so the global model works well. The question is, how well does the GWR model perform in this situation? The results of the GWR calibration are given below.

Local model calibrated by GWR
Adj. R² = 1.0
AIC = 59,386
K = 6.5
N = 2,434
$\alpha_i$ (est.) = 10 for all i
$\beta_{1i}$ (est.) = 3 for all i
$\beta_{2i}$ (est.) = 5 for all i.
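The logic of Experiment 1 can be imitated with a short simulation. The sketch below is not the author's code and does not use the GWR 3.1 software. It assumes that $x_1$ and $x_2$ are independent uniform draws, that y is generated without an error term, and that each local calibration is a weighted least squares fit with a fixed Gaussian kernel of bandwidth h = 5 grid units. All of these choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 x 50 grid of locations (2,500 points), as in the experiment.
gi, gj = np.meshgrid(np.arange(50), np.arange(50), indexing="ij")
coords = np.column_stack([gi.ravel(), gj.ravel()]).astype(float)

# Assumption: independent uniform covariates and no error term in y.
x1 = rng.uniform(0, 1, size=2500)
x2 = rng.uniform(0, 1, size=2500)
X = np.column_stack([np.ones(2500), x1, x2])
y = 10 + 3 * x1 + 5 * x2

# Global model by ordinary least squares.
beta_global, *_ = np.linalg.lstsq(X, y, rcond=None)

def local_fit(point, h=5.0):
    """Weighted least squares at one regression point (Gaussian kernel)."""
    d = np.linalg.norm(coords - point, axis=1)
    w = np.exp(-(d / h) ** 2)
    XtW = X.T * w  # equivalent to X.T @ diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Local calibrations at every 250th location (10 regression points).
beta_local = np.array([local_fit(p) for p in coords[::250]])
```

Because the generating parameters are the same everywhere, both `beta_global` and every row of `beta_local` should reproduce the values 10, 3 and 5 up to rounding error, which mirrors the finding that the local calibration does not invent spatial variation where none exists.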
so that $\beta_{1i}$ ranges between $-5$ and 5; and:

$$\beta_{2i} = 5 + 0.2i + 0.2j \tag{13.13}$$

AIC = 2,218
K = 167
N = 129
The local model clearly captures the spatial nonstationarity in relationships extremely well. The y variable is replicated accurately with the adjusted r-squared statistic being close to 1.0 and the AIC value being much lower than the comparable value from the global model (2,218 versus 17,046). In this case, the local model is trying to make itself as local as possible and the number of nearest neighbours in each local regression is only 129. Recall also that the data on these 129 observations are not weighted as 1 but will have a weight somewhere between 0 and 1 depending on their distance from the regression point. The effective number of parameter estimates is 167, reflecting the spatially varying nature of the processes underlying the model, and the ranges of local parameter estimates are close to their known values. The local parameter estimates are geocoded and can easily be mapped to display the nature of their spatial variation.

The conclusion from these two experiments is that calibration of local models by GWR allows the identification of spatial nonstationarity where it exists. Further, the GWR calibration procedure does not appear to introduce any spurious nonstationarity in situations where a global model is appropriate.

13.5. SOFTWARE FOR GWR

Software for running GWR (GWR 3.1) is available from the author and runs on any Windows platform. It has a very simple point-and-click interface which makes it very easy to calibrate models by GWR. The user can select from Gaussian, Poisson, or binary logit GWR models. The current restrictions on data size are a maximum of 80,000 observations and 50 variables. The software also calibrates a global model for comparison and the output consists of various model diagnostics plus geocoded local parameter estimates, their local standard errors, local t-values and local goodness-of-fit measures.

An example of the interface is shown in Figure 13.4. The user is asked to input a data file from which the variable names are stripped off and loaded into the GWR model editor for placement in the appropriate model form. The user defines the dependent variable and a set of independent variables for the model from the variable list. The x and y coordinates of the data locations must also be designated. A kernel type (either fixed or adaptive), a calibration criterion (either CV or AICc), and an output format for the geocoded information must then be selected before the model is saved and run. The output is presented in both a listing file on the screen and an output file which is saved for subsequent processing, generally mapping of the output to see the spatial variation in local parameter estimates and goodness-of-fit statistics.

The model editor also allows extra computations. The user can select a Monte Carlo simulation exercise to examine the significance of any spatial variability in local parameter estimates and various other diagnostics can be chosen. The user also has the facility to by-pass the optimization routine for the bandwidth and input his/her own bandwidth. This can be useful to examine the effects of scale on the output: large bandwidths essentially perform regional calibrations on the data; small bandwidths perform very local calibrations.

The software is distributed on a self-loading CD which also contains sample data.

13.6. RESEARCH TOPICS

Although the initial development of GWR took place over a decade ago and it is
the residuals from global models applied to spatial data might be autocorrelated but we now have the means of examining the relative contributions of different processes to such autocorrelation. That said, there is still a potentially useful merger of spatial regression models and GWR: one could have, for example, a GWR version of a spatial regression model. If the spatial regression model were an autoregressive model, for example, this would provide an easy way of calibrating local spatial autocorrelation statistics which are free from covariate effects.

A second research area is that of the development of what are termed 'mixed' or 'semi-parametric' GWR models where some of the parameters are allowed to vary spatially whilst others are fixed globally. In some instances, for example, there is no reason to suspect that a particular relationship would be spatially varying and it makes sense to set such a parameter in the model as fixed. The calibration of such models, however, is somewhat more complex than that of the full GWR model.

This topic leads into a related one which concerns variable selection in GWR. It should be realized that a variable that is insignificant at the global level may still be important locally. Consequently, variable selection should ideally be at the level of the GWR model and not at that of the global model. Following from the above, however, variables could either be: unimportant at the local level, important but with a stationary effect, or important with a spatially varying effect. Consequently, variable selection, along the lines of stepwise regression, is considerably more complex in GWR.

Another topic that needs further research is that of statistical inference in GWR. It is necessary to distinguish the degree of spatial variation in local parameter estimates that could reasonably be attributed to sampling variation from that which is likely to be attributable to something more interesting. Currently, this is done via Monte Carlo simulation but more formal methods might be developed. One aspect of inference that is well known in these situations is that of the multiple hypothesis testing problem, which suggests that the traditional cut-off points on a statistical distribution for rejecting a null hypothesis are too liberal. Bonferroni-type adjustments should be made, although recognizing that the hypothesis tests in GWR are not independent. Probably the ratio of the effective number of parameters in the GWR model to the number of parameters in the global model should be used as the adjustment factor rather than the number of tests.

Although the primary rationale for calibrating a GWR model is to uncover facets of possible nonstationarity in the processes being examined, a common question is to what extent can the methodology be used for prediction? To answer this, research is currently being undertaken to compare GWR as a prediction method with various forms of kriging. The results so far suggest GWR provides much better estimates of unknown values than do many types of kriging and about the same level of predictive ability as universal kriging with external covariates. Of course, the advantage of GWR is that much more information is yielded on the processes at work.

Finally, the most powerful aspect of GWR is the concept of geographically weighting models. Anything that can be weighted can be geographically weighted. The models need not be linear nor even in a regression format. One can generate, for example, GW versions of any descriptive spatial statistic or GW versions of any multivariate statistical method such as GW PCA or GW discriminant analysis. The task in these latter cases is probably to handle the large volumes of output that will be generated.
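As a small illustration of the remark that anything that can be weighted can be geographically weighted, the following sketch computes a geographically weighted mean at every data location. It is an assumption-laden example rather than a published algorithm: the function name `gw_mean`, the Gaussian kernel and the toy data are all chosen here for illustration.

```python
import numpy as np

def gw_mean(coords, values, h):
    """Geographically weighted mean at every data location (Gaussian kernel)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Pairwise distances between all locations.
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    # Kernel weights: row i holds the weights used at location i.
    w = np.exp(-(d / h) ** 2)
    return (w @ values) / w.sum(axis=1)

# Two clusters of points with different levels of the variable.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
values = np.array([1.0, 1.0, 5.0, 5.0])

local_means = gw_mean(coords, values, h=2.0)   # small bandwidth: local means
global_like = gw_mean(coords, values, h=1e6)   # huge bandwidth: global mean
```

With a small bandwidth the local means stay close to the level of each cluster, whereas a very large bandwidth gives every point nearly equal weight and the result approaches the ordinary mean at every location.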
respectively $x_i$ and $\beta$, such that the regression becomes:

$$y_i = x_i'\beta + \varepsilon_i. \tag{14.2}$$

In the classic regression specification, the error terms have mean zero ($E[\varepsilon_i] = 0, \forall i$), and they are identically and independently distributed (i.i.d.). Consequently, their variance is constant, $\mathrm{Var}[\varepsilon_i] = \sigma^2$, and they are uncorrelated, $E[\varepsilon_i \varepsilon_j] = 0$, for all $i \neq j$.

In matrix notation, the N observations on the dependent variable are stacked in an N × 1 vector y, the observations on the explanatory variables in an N × K matrix X, and the random error terms in an N × 1 vector $\varepsilon$, such that:

$$y = X\beta + \varepsilon \tag{14.3}$$

with $E[\varepsilon] = 0$ (an N × 1 vector of zeros), and $E[\varepsilon\varepsilon'] = \sigma^2 I$ (with I as the identity matrix).

Spatial dependence is introduced into this specification in two major ways, one referred to as spatial lag dependence, the other as spatial error dependence (Anselin, 1988b). While the former pertains to spatial correlation in the dependent variable, the latter refers to the error term. Spatial autocorrelation can also be introduced in the explanatory variables, in so-called spatial cross-regressive models (Florax and Folmer, 1992). However, in contrast to the lag and error models, cross-regressive models do not require the application of special estimation methods. They will therefore not be further considered here.

observations are for a single point in time, the actual dynamics of the interaction among agents (peer effects, neighborhood effects, spatial externalities) cannot be observed, but the correlation structure that results once the process has reached equilibrium is what can be modeled (Brock and Durlauf, 2001, 2004). This is also referred to as a spatial reaction function (Brueckner, 2003). In the spatial regression equation, this is accomplished by including a function of the dependent variable observed at other locations on the right-hand side:

$$y_i = g(y_{J_i}, \theta) + x_i'\beta + \varepsilon_i \tag{14.4}$$

where $J_i$ includes all the neighboring locations j of i, with $j \neq i$. The function g can be very general (and non-linear), but typically is simplified by using a spatial weights matrix (see also Chapter 8 in this volume). The N × N spatial weights matrix W has non-zero elements $w_{ij}$ in each row i for those columns j that are neighbors of location i. The notion of neighbors is very general, and not limited to geographical concepts, but can readily be extended to neighbors in social network space (Leenders, 2002).

A so-called mixed regressive, spatial autoregressive model (Anselin, 1998b) then takes on the form:

$$y_i = \rho \sum_j w_{ij} y_j + x_i'\beta + \varepsilon_i \tag{14.5}$$

where $\rho$ is the spatial autoregressive coefficient, and the error term $\varepsilon_i$ is i.i.d. Alternatively, in matrix notation:
such that $\sum_j w_{ij} = 1, \forall i$), this amounts to including the average of the neighbors as an additional variable into the regression specification. This added variable is referred to as a spatially lagged dependent variable, or a spatial lag. For example, in a model for tax rates of local communities, this would add the average of the tax rates in the neighboring locations as an explanatory variable.

The inclusion of the spatial lag is similar to an autoregressive term in a time series context, hence it is called a spatial autoregressive model, although there is a fundamental difference. Unlike time dependence, dependence in space is multidirectional, implying feedback effects and simultaneity. More precisely, if i and j are neighboring locations, then $y_j$ enters on the right-hand side in the equation for $y_i$, but $y_i$ also enters on the right-hand side in the equation for $y_j$ (the neighbor relation is symmetric). This endogeneity must be accounted for in the estimation process.

The proper solution to the equations for all observations is the so-called reduced form, which no longer contains any spatially lagged dependent variables on the right-hand side. After some matrix algebra, this follows as:

$$y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} \varepsilon \tag{14.7}$$

a model that is nonlinear in $\rho$ and $\beta$ and has a spatially correlated error structure (more precisely, a spatial autoregressive structure, see below). More importantly, this reveals the spatial multiplier, i.e., the notion that the value of y at any location i is not only determined by the values of x at i, but also of x at all other locations in the system. This can be seen after a simple expansion of the inverse matrix term (for $|\rho| < 1$ and with a row-standardized W), and using the expected value (since the errors all have mean zero):

$$E[y|X] = X\beta + \rho W X\beta + \rho^2 W^2 X\beta + \cdots \tag{14.8}$$

The powers of $\rho$ matching the powers of the weights matrix (higher orders of neighbors) ensure that a distance decay effect is present.

Even when the spatial lag specification is not necessarily the result of a process of interaction among agents, it remains a useful model to deal with spatial autocorrelation, and can be interpreted as a filtering model. More precisely, moving the spatial lag term to the left-hand side reveals:

$$\tilde{y}_i = y_i - \rho \sum_j w_{ij} y_j = x_i'\beta + \varepsilon_i \tag{14.9}$$

i.e., a standard regression model in a dependent variable $\tilde{y}_i$ from which the spatial correlation has been removed (filtered). Unlike detrending time series data, however, the parameter $\rho$ cannot take on the value of 1 and must be estimated jointly with the other parameters of the model. The spatial filtering interpretation is often useful when there is a mismatch between the spatial scale of observations and the spatial scale at which the phenomenon of interest manifests itself. For example, this would be the case when a regional phenomenon (e.g., a labor market or housing market) is measured at a subregional scale, resulting in a high degree of positive spatial autocorrelation (very little change across the sub-regional scale). In that situation, the estimation of the spatial lag model will yield estimates for the $\beta$ parameters that properly control for the spatial autocorrelation.
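The reduced form (14.7), the spatial multiplier expansion (14.8) and the filtering interpretation (14.9) can be checked numerically on a toy example. The sketch below assumes six locations on a line with adjacent cells as neighbors, a row-standardized W, $\rho = 0.4$ and arbitrary values for X and $\beta$. None of these values come from the chapter, and the error term is set to zero so that only the expected value is computed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 6, 0.4

# Assumption: six locations on a line; neighbors are the adjacent cells.
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)  # row-standardize: each row sums to 1

X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
Xb = X @ beta

# Reduced form (14.7) without the error term: E[y|X] = (I - rho W)^{-1} X beta.
exact = np.linalg.solve(np.eye(n) - rho * W, Xb)

# Series expansion (14.8), truncated after `terms` powers of rho * W.
terms = 60
approx = np.zeros(n)
term = Xb.copy()
for _ in range(terms):
    approx += term
    term = rho * (W @ term)

# Spatial filter (14.9): subtracting rho * W y from y leaves X beta.
filtered = exact - rho * (W @ exact)
```

Each additional term in the loop carries the influence of X one order of neighbors further and shrinks it by another factor of $\rho$, which is the distance decay effect described above. The truncated sum agrees with the exact solution because $|\rho| < 1$ and W is row-standardized.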
SPATIAL REGRESSION 259
with $E[uu'] = \sigma^2 I$, so that the complete error variance-covariance matrix follows as:

$$E[\varepsilon\varepsilon'] = \sigma^2 (I - \lambda W)^{-1} (I - \lambda W')^{-1}. \tag{14.14}$$

Even though the spatial weights matrix W may contain only a few neighbors for each observation, the variance-covariance structure that results from the SAR process is a non-sparse matrix, representing a global pattern of spatial autocorrelation. Moreover, unless the number of neighbors is constant for each observation, the diagonal elements in the variance-covariance matrix will not be constant, resulting in heteroskedasticity. This induced heteroskedasticity is a distinguishing characteristic for spatial processes, and it complicates specification testing and estimation. More precisely, since many of the theoretical asymptotic results in time series analysis are based on assumptions of constant variance, they do not translate directly to spatial processes; for technical details, see, e.g., Anselin (2006).

Other spatial processes used to provide structure to the error variance-covariance matrix include a conditional autoregressive process (CAR) and a spatial moving average process (SMA). The CAR model is often used as a prior in hierarchical Bayesian specifications, whereas the SMA specification is appropriate for local patterns of spatial autocorrelation (for details, see Anselin, 2006).

Error component models have been suggested as well, and some recent theoretical results provide the basis for a wide range of structures for error spatial autocorrelation. In Kelejian and Robinson (1992), an error decomposition was proposed that combined a local or location-specific component with a spillover component, yielding an error variance-covariance structure similar to that of an SMA (see also Anselin and Moreno, 2003). The common shocks framework outlined in Andrews (2005) can encompass general factor structures yielding different specifications for the range and strength of spatial autocorrelation. This approach has seen increased application in recent work on spatial autocorrelation in panel data models (Pesaran, 2005).

A final approach to provide structure to spatial error variance-covariance matrices is based on a non-parametric rationale, which is particularly appropriate for local patterns of spatial autocorrelation. Using the formal properties for a kernel estimator of spatial autocovariance established by Hall and Patil (1994), a general non-parametric covariance matrix estimator has been suggested by Conley (1999), and, more recently, by Kelejian and Prucha (2007).

14.3. HIGHER ORDER MODELS

In addition to the basic spatial lag and spatial error models just reviewed, higher order models can be specified as well, by including multiple weights matrices, by combining lag and error structures, and by including a specification for spatial heterogeneity jointly with spatial dependence. An extensive review of these specifications can be found in Anselin (2006).

14.4. SPECIFICATION TESTS

In empirical practice, there are often no strong a priori reasons to consider a spatial lag or spatial error model in a cross-sectional situation. Instead, the need for such a specification follows from the result of model diagnostics. Specifically, diagnostic tests derived from the residuals of
unrestricted (spatial) model. This requires both the point estimate of the parameter as well as an estimate of the asymptotic variance matrix (for technical details, see Anselin, 1988b, Ch. 6).

The likelihood ratio test statistic is obtained in the standard manner as well, as twice the difference between the log-likelihood of the unrestricted (i.e., the spatial) model, and that of the restricted model (i.e., the standard regression without spatial autocorrelation). This requires the estimation of two models, and an assumption of normality for the OLS regression. The statistic is asymptotically distributed as $\chi^2(1)$ (see Anselin, 1988b, Ch. 6).

The Lagrange multiplier (LM) test only requires estimation of the model under the null hypothesis of no spatial dependence. It therefore lends itself well to specification searches in practice, since the extra step of estimating a spatial lag or spatial error model can often be avoided. In the spatial case, the LM statistic does not follow the standard result from econometrics, where in many instances it can be obtained as a measure of fit in an auxiliary regression. Instead, it needs to be derived explicitly, as in Burridge (1980) and Anselin (1988a) (for extensive technical details, see also Anselin and Bera, 1998; Anselin, 2001a).

Even though the LM statistic is constructed from the OLS residuals, a complete alternative model must be specified. In some instances, two different alternatives yield the same LM statistic. These are called locally equivalent alternatives (Godfrey, 1981). SAR and SMA error processes fall into this category. As a result, an LM test statistic against spatial error autocorrelation cannot distinguish between these two different processes. In practice, this affects the interpretation of the results, since SAR is a global spatial process, while SMA is local.

The LM error statistic is very similar to Moran's I. As shown in Burridge (1980) and Anselin (1988a), the statistic is:

$$\mathrm{LM}_{\lambda} = \frac{[e'We/(e'e/N)]^{2}}{\mathrm{tr}[W'W + WW]} \tag{14.18}$$

where $e$ is an $N \times 1$ vector of OLS residuals, and tr stands for the trace operator (the sum of the diagonal elements of a matrix). Except for the scaling factor in the denominator, this statistic is essentially the square of Moran's I. It is asymptotically distributed as $\chi^2(1)$.

Using similar principles, the LM lag statistic follows as:

$$\mathrm{LM}_{\rho} = [e'Wy/(e'e/N)]^{2}/D \tag{14.19}$$

with $e$ as the OLS residuals, and the denominator term:

$$D = \left[(WX\hat{\beta})'\,[I - X(X'X)^{-1}X']\,(WX\hat{\beta})/\hat{\sigma}^{2}\right] + \mathrm{tr}(W'W + WW) \tag{14.20}$$

where the estimates for $\beta$ and $\sigma^2$ are from OLS. The test statistic is asymptotically distributed as $\chi^2(1)$.

A related test statistic, also based on the maximum likelihood principle, applies the idea of double length artificial regressions (DLR, Davidson and MacKinnon, 1984, 1988) to tests for spatial error and spatial lag dependence (Baltagi and Li, 2001a). The DLR approach consists of expressing the regression model as a function of standard normal error terms. In the spatial models, this follows as a simple standardization (for technical details, see Baltagi and Li, 2001a).

The LM principle can be applied to alternatives other than the SAR/SMA error processes or the spatial lag model. Test statistics can be derived for higher-order
processes (multiple orders of contiguity), and for different error models, such as spatial error components or direct representation (Anselin, 2001a; Anselin and Moreno, 2003).

So far, only a single alternative has been taken into account. However, in practice, it is often more reasonable to consider an alternative hypothesis that contains both a spatial lag and spatial error autocorrelation:

$$y = \rho Wy + X\beta + \varepsilon \tag{14.21}$$

with:

$$\varepsilon = \lambda W\varepsilon + u \tag{14.22}$$

a SARSAR model, or, with:

$$\varepsilon = \lambda Wu + u \tag{14.23}$$

a SARMA model.

In this more general case, there are three ways to proceed. One is as before, considering a one-directional alternative only and ignoring the other form of spatial autocorrelation. For example, the LM error test above has the null hypothesis $H_0: \lambda = 0$, irrespective of the value of $\rho$, which is considered to be a nuisance parameter. This is referred to as a marginal test.

A problem with the marginal approach is that the $\mathrm{LM}_{\lambda}$ and $\mathrm{LM}_{\rho}$ test statistics are no longer $\chi^2(1)$ in the presence of local mis-specification in the form of the other type of spatial dependence, but they become non-central $\chi^2$. In other words, in the presence of spatial lag dependence, the LM test against error correlation becomes biased, and, in the presence of spatial error dependence, the LM test against lag dependence becomes biased. Using a result of Bera and Yoon (1993), robust versions of these test statistics have been developed in Anselin et al. (1996) (see also Anselin and Bera, 1998, pp. 273–278).

A second strategy is that of a joint test, where the null hypothesis is to set all spatial parameters equal to zero. For example, for the spatial lag model with a SAR or SMA error term, $H_0: \rho = \lambda = 0$. In contrast to standard results in the econometric literature, the joint test statistic is not simply the sum of the marginal test statistics, i.e., $\mathrm{LM}_{\rho\lambda} \neq \mathrm{LM}_{\rho} + \mathrm{LM}_{\lambda}$, but it takes on a far more complex form (Anselin, 1988a).

A third strategy is a so-called conditional approach, where a test on the null hypothesis $\lambda = 0$ is carried out in a model with $\rho \neq 0$, and vice versa. This can no longer be based on OLS estimates, but requires estimation of the proper spatial model by means of ML. Using the same principles as before, but now with the residuals of the ML estimation, a test statistic for $H_0: \lambda = 0$ in the spatial lag model (i.e., with $\rho \neq 0$) can be derived. Similarly, a test statistic can be constructed for $H_0: \rho = 0$ in the spatial error model (i.e., with $\lambda \neq 0$). While straightforward, the derivations are quite tedious and the resulting test statistics complex (for technical details, see Anselin, 1988a; Anselin et al., 1996; Anselin and Bera, 1998).

The LM principle can also be extended to multiple sources of mis-specification, such as spatial dependence and heteroskedasticity (Anselin, 1988b), or spatial dependence and functional mis-specification (Baltagi and Li, 2001b).

14.4.3. Specification search

In practice, the sheer number of available test statistics can seem overwhelming and a strategy needs to be developed to move from the null model to a superior alternative (when appropriate). Given that tests may be based
on marginal, joint, or conditional approaches, the results of a specification search may be subject to the order in which tests are carried out, and whether or not adjustments are made for pre-testing (see, e.g., Florax and Folmer, 1992; Anselin and Florax, 1995b; Florax and de Graaff, 2004).

Based on a large number of simulation results, an ad hoc decision rule was suggested in Anselin and Rey (1991) for the simple case of choosing between a spatial lag or spatial (SAR) error alternative. There is considerable evidence that the proper alternative is most likely the one with the largest significant LM test statistic value. This was later refined in light of the robust forms of the statistics in Anselin et al. (1996). In a recent paper by Florax et al. (2003), this classic forward stepwise specification search is compared to a general-to-simple model selection rule (for further discussion, see also Florax et al., 2006; Hendry, 2006).

14.5. ESTIMATION

The estimation problems associated with spatial regression models are distinct for the spatial lag and spatial error case. Spatial error models are special instances of specifications with a non-spherical error. As a result, OLS may still be applied, as long as the estimated standard errors are adjusted to take into account the error correlation. In contrast, the inclusion of a spatially lagged dependent variable in a regression specification yields a form of endogeneity. As a result, for most spatial weights used in practice, OLS in the spatial lag model is not an appropriate method, and the simultaneity must be accounted for explicitly. An exception to this general rule is when the weights represent subgroups in the data (i.e., all the observations in the same group are neighbors of each other), in which case OLS turns out to yield consistent estimates (Lee, 2002; Kelejian and Prucha, 2002).

Two general sets of methods have been developed to address the estimation of spatial regression models, one based on the maximum likelihood (ML) principle, the other on the (general) method of moments (GMM). Each will be considered in turn, followed by a brief overview of semi-parametric methods.

14.5.1. Maximum likelihood estimation

The point of departure for maximum likelihood estimation in spatial regression models is an assumption of normality for the error term. In general, allowing for heteroskedasticity and/or error correlation, the $N \times 1$ error vector has a multivariate normal distribution, $\varepsilon \sim N(0, \Sigma_{\theta})$, with the subscript denoting that $\Sigma$ may be a function of a $p \times 1$ vector of parameters $\theta$. In the commonly considered i.i.d. case, this simplifies to $\varepsilon \sim N(0, \sigma^2 I)$, with $\theta = \sigma^2$.

To move from the likelihood for the error vector to a likelihood for the observed dependent variable, a Jacobian of the transformation needs to be inserted, which corresponds to the determinant $|I - \rho W|$ in the spatial lag model, and $|I - \lambda W|$ in the spatial error model. The presence of the Jacobian term constitutes a major computational complication.

Using the standard result for a multivariate normal distribution, and taking into account the Jacobian term, the log-likelihood for the spatial lag model follows as:

$$L = -(N/2)\ln(2\pi) - (1/2)\ln|\Sigma_{\theta}| + \ln|I - \rho W| - (1/2)(y - \rho Wy - X\beta)'\,\Sigma_{\theta}^{-1}\,(y - \rho Wy - X\beta). \tag{14.24}$$
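To make the role of the Jacobian term concrete, the log-likelihood in equation (14.24) can be evaluated numerically for the i.i.d. case, where the error covariance reduces to $\sigma^2 I$ and hence $-(1/2)\ln|\Sigma_{\theta}| = -(N/2)\ln\sigma^2$. The following is a minimal sketch in Python with NumPy, assuming a dense weights matrix; the function names `lag_loglik` and `log_jacobian_eig` are hypothetical. The second function uses the identity $\ln|I - \rho W| = \sum_i \ln(1 - \rho\,\omega_i)$, with $\omega_i$ the eigenvalues of $W$, which is the eigenvalue route this chapter attributes to Ord (1975).

```python
import numpy as np


def lag_loglik(rho, beta, sigma2, y, X, W):
    """Evaluate the spatial lag log-likelihood (14.24) with Sigma = sigma2 * I."""
    n = y.shape[0]
    resid = y - rho * (W @ y) - X @ beta            # y - rho*W*y - X*beta
    sign, logdet = np.linalg.slogdet(np.eye(n) - rho * W)
    if sign != 1.0:
        raise ValueError("|I - rho*W| is not positive for this rho")
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * n * np.log(sigma2)              # -(1/2) ln|sigma2 * I|
            + logdet                                # Jacobian term ln|I - rho*W|
            - 0.5 * (resid @ resid) / sigma2)


def log_jacobian_eig(rho, eigvals):
    """ln|I - rho*W| from precomputed eigenvalues of W."""
    # Complex eigenvalues come in conjugate pairs, so the imaginary parts cancel.
    return float(np.sum(np.log(1.0 - rho * eigvals)).real)
```

The eigenvalues would be computed once, for example with `np.linalg.eigvals(W)`, and then reused for every trial value of $\rho$ during optimization, whereas the determinant route refactorizes an N × N matrix at each step.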
sets, the computation of these terms by brute force is impractical.

An early solution was suggested by Ord (1975), who exploited the decomposition of the Jacobian in terms of the eigenvalues of the spatial weights matrix. This facilitates computation greatly, since the eigenvalues only need to be calculated once. The trace terms used in the information matrix can be expressed in terms of the eigenvalues as well (Anselin, 1980).

The computation of eigenvalues becomes impractical and computationally unstable for medium and large-sized data sets (n > 1000). This precludes the application of the Ord approach. Several alternatives have been suggested that either approximate or bound the Jacobian or log-Jacobian term (e.g., Martin, 1993; Griffith and Sone, 1995; Barry and Pace, 1999; Pace and LeSage, 2002, 2004a), or exploit the sparse nature of spatial weights (Pace and Barry, 1997a, b; Smirnov and Anselin, 2001).

A second important computational problem pertains to the presence of terms like $\mathrm{tr}[W(I - \rho W)^{-1}]^{2}$ in the information matrix. The calculation of these inverse matrices is impractical in large data settings. As a result, most large data ML methods developed so far have not based inference on the asymptotic variance matrix, but instead use a sequence of likelihood ratio tests. Recently, Smirnov (2005) developed a solution to this problem, based on the use of a conjugate gradient approach.

14.6. INSTRUMENTAL VARIABLES/METHOD OF MOMENTS ESTIMATION

An alternative to maximum likelihood estimation is the use of the method of moments (including instrumental variables, generalized method of moments, and generalized moments). This approach does not require an assumption of normality and it avoids some of the computational problems associated with ML for very large data sets.

The spatial lag model can be formulated as a linear model that contains an endogenous variable ($Wy$) and exogenous variables ($X$):

$$y = Z\gamma + \varepsilon \tag{14.28}$$

with $Z = [Wy, X]$ and $\gamma = [\rho, \beta']'$. A classic solution to the endogeneity problem is to use instrumental variables. A matrix of additional variables $Q$ ($N \times q$) is used to obtain an instrument for the spatially lagged dependent variable:

$$\widehat{Wy} = Q(Q'Q)^{-1}Q'Wy \tag{14.29}$$

such that $\hat{Z} = [\widehat{Wy}, X]$, resulting in the spatial two-stage least squares estimator (S2SLS):

$$\hat{\gamma}_{S2SLS} = [\hat{Z}'\hat{Z}]^{-1}\hat{Z}'y. \tag{14.30}$$

Inference on the S2SLS is based on the asymptotic variance matrix:

$$\mathrm{AsyVar}[\hat{\gamma}_{S2SLS}] = \hat{\sigma}^{2}\,[Z'Q(Q'Q)^{-1}Q'Z]^{-1} \tag{14.31}$$

with $\hat{\sigma}^{2} = (y - Z\hat{\gamma}_{S2SLS})'(y - Z\hat{\gamma}_{S2SLS})/N$.

The application of instrumental variables to the spatial lag model was initially outlined in Anselin (1980, 1988b, pp. 82–86), where some ad hoc suggestions were made for the selection of the instruments (see also Land and Deane (1992) for an early discussion).
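The two stages of the S2SLS estimator in equations (14.29)–(14.31) can be sketched as follows in Python with NumPy. This is a minimal sketch under stated assumptions rather than a reference implementation: the function name `s2sls` is hypothetical, `W` is a dense weights matrix, and the instrument matrix `Q` is supplied by the caller and is assumed to contain the columns of `X` and to have full column rank (so a constant column should not be spatially lagged when `W` is row-standardized).

```python
import numpy as np


def s2sls(y, X, W, Q):
    """Spatial two-stage least squares for y = rho*W*y + X*beta + error.

    Returns (gamma_hat, avar) with gamma_hat = [rho, beta...] and avar the
    estimated asymptotic variance matrix of equation (14.31).
    """
    n = y.shape[0]
    wy = W @ y
    Z = np.column_stack([wy, X])                    # Z = [Wy, X]

    # First stage, equation (14.29): project Wy on the instruments Q.
    coef, *_ = np.linalg.lstsq(Q, wy, rcond=None)
    wy_hat = Q @ coef
    Z_hat = np.column_stack([wy_hat, X])

    # Second stage, equation (14.30).
    gamma_hat = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y)

    # Equation (14.31): residuals use the original Z, not the projected one.
    resid = y - Z @ gamma_hat
    sigma2 = (resid @ resid) / n
    PZ = Q @ np.linalg.lstsq(Q, Z, rcond=None)[0]   # Q(Q'Q)^{-1}Q'Z
    avar = sigma2 * np.linalg.inv(PZ.T @ PZ)        # PZ'PZ = Z'Q(Q'Q)^{-1}Q'Z
    return gamma_hat, avar
```

One possible instrument set, following the reduced-form argument discussed in this section, stacks `X` with spatial lags of its non-constant columns, for example `np.column_stack([X, W @ X[:, 1:], W @ W @ X[:, 1:]])` when the first column of `X` is the constant.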
Specifically, the choice of a spatial lag of the predicted values of the y (using only the exogenous variables) or of spatially lagged exogenous variables was considered. In Kelejian and Robinson (1993), proof is provided of the consistency of S2SLS and the selection of instruments is couched in terms of the reduced form. This suggests the use of a subset of columns from $\{X, WX, W^{2}X, W^{3}X, \ldots\}$ as the instruments (see also Kelejian and Prucha, 1998).

Recent work has focused on the selection of optimal instruments (Lee, 2003; Das et al., 2003; Kelejian et al., 2004), and on establishing formal proofs of consistency and asymptotic normality. In Lee (2007), the S2SLS estimator is compared to a GMM method with superior asymptotic properties. Extensions of the instrumental variables approach to systems of simultaneous equations are considered in Rey and Boarnet (2004) and Kelejian and Prucha (2004).

Moment methods have been developed to address spatial error autocorrelation as well, both in isolation as well as in combination with a spatial lag model (the SARSAR model). The basic results were obtained by Kelejian and Prucha (1998, 1999), who initially treated the spatial autoregressive coefficient in the error SAR process as a nuisance parameter. Specifically, attention focused on obtaining a consistent estimate for the nuisance parameter as the solution of a set of moment conditions. This consistent estimate could then be used in a second step of a FGLS estimation. One drawback of the nuisance parameter approach is that no inference can be carried out on the spatial autoregressive parameter, since no result existed on its asymptotic variance. In recent work by Lin and Lee (2005) and Kelejian and Prucha (2006), this problem has been alleviated, in the context of an extended set of moment conditions that account for both spatial autoregressive errors as well as heteroskedasticity of unspecified form. Their results also yield an asymptotic variance matrix, so that tests of significance can be carried out on the spatial parameters as well.

14.6.1. Semi-parametric methods

Semi-parametric methods provide a compromise between a full parametric specification and a non-parametric approach where the parameters are completely determined by the data, with very little prior structure. The combination of a full specification of the parts where theory or previous results provide a strong support for the model and relaxing the functional and distributional assumptions for the rest has become very attractive, especially when large data sets provide ample information (for a recent review, see Horowitz and Lee, 2002).

While by far the predominant paradigm in spatial regression analysis is the parametric approach, the use of semi-parametric techniques has seen a recent increase and is an area of very active research, both theoretical as well as applied. A semi-parametric approach has seen application in four main areas in spatial regression analysis.

One is as an alternative to specifying a specific spatial process for the error term. Instead, the error covariance may be estimated in a non-parametric fashion. This follows along the lines of the work in econometrics by White (1980) on a heteroskedastic-consistent approach, and its extension to both heteroskedasticity and serial correlation by Newey and West (1987), and others. The incorporation of spatial dependence in this framework was first considered by Conley (1999) in the context of GMM estimation, and recently elaborated upon in Kelejian and Prucha (2007) (see also Chen and Conley (2001), for a related approach). The basic idea is to avoid specifying a particular spatial process or spatial weights matrix and to extract
the spatial covariance terms from weighted averages of cross-products of residuals, using a kernel function. This yields a so-called heteroskedastic and spatial autocorrelation consistent (HAC) estimator. The HAC approach is asymptotic and in finite samples a major practical problem is to ensure that the estimated variance–covariance matrix is positive semidefinite. A number of suggestions have been formulated, but considerable research remains to be done to obtain insight into finite sample properties (see Kelejian and Prucha (2007), for some technical details).

In a second approach, the focus is on relaxing the requirements to specify a spatial weights matrix $W$ in the construction of the spatially lagged dependent variable in a spatial lag model. In Pinkse et al. (2002), a model is considered of the form:

$$y_i = \sum_{j \neq i} g(d_{ij})\,y_j + x_i'\beta + \varepsilon_i \tag{14.32}$$

in which the unspecified function $g$ relates the values of $y$ at other locations $j$ to that at $i$ through a distance measure $d_{ij}$. The function $g$ is approximated by a polynomial series expansion in distance measures, the coefficients of which are estimated jointly with the other parameters in the model.

In a third approach, suggested in the work of Gress (2004a) (see also Gress (2004b), and Basile and Gress (2005), for applications), the spatial weights specification is kept in the spatial lag part, but the other variables enter into the model in a non-parametric way. For example, a semi-parametric spatial lag model takes the form:

$$y = \rho Wy + g(X) + \varepsilon \tag{14.33}$$

where $g$ is an unspecified function, to be estimated in a non-parametric way. A semi-parametric spatial error model is considered as well, using residuals from a non-parametric regression of $y$ on $g(X)$, as a special application of local linear weighted least squares (Henderson and Ullah, 2005).

A fourth approach is akin to spatial filtering, and purports to model unspecified spatial spillover effects non-parametrically, in a so-called smooth spatial effects (SSE) estimator. In Gibbons and Machin (2003), the model considered is:

$$y_i = x_i'\beta + g(c_i) + \varepsilon_i \tag{14.34}$$

where $g$ is an unknown function, intended to capture all spatial correlation, and $c_i$ represents the location of $i$. The model is estimated by means of the classic two-step procedure suggested by Robinson (1988). In the SSE estimator, both the dependent variable and the explanatory variables are replaced by deviations from the conditional expectation, which is obtained as a spatial kernel smoother. OLS can be applied to the transformed regression to obtain consistent estimates for $\beta$ (for a recent application, see Day et al., 2004).

14.7. CONCLUSION

The methodological toolbox for spatial regression has reached a certain maturity when it comes to the classical linear regression model. However, much less has been accomplished beyond this context and the development of new models, estimation techniques and specification tests is a very active area of research, both in statistics as well as in econometrics. Given space constraints, it was impossible to review all these efforts in a comprehensive way, but it is hoped that through the references provided an entry into this field has been facilitated.
Considerable theoretical research is ongoing to develop the formal conditions and proofs needed to obtain the asymptotic properties of estimators and tests in various settings. New techniques are being developed to deal with spatial effects in panel data, count models, probit and tobit, and other specifications that are the mainstay of applied empirical regression analysis. The growth in applications is encouraging as well, providing a greater empirical basis to document the importance of location and distance in explaining socioeconomic phenomena. Lastly, while in the past the lack of software may have been an impediment to the dissemination of spatial regression methods, this is no longer the case, as attested by several active open source developments (for a recent review, see Anselin, 2005, pp. 101–106).

REFERENCES

Andrews, D.W. (2005). Cross-section regression with common shocks. Econometrica, 73: 1551–1585.
Anselin, L. (1980). Estimation Methods for Spatial Autoregressive Structures. Regional Science Dissertation and Monograph Series, Cornell University, Ithaca, New York.
Anselin, L. (1988a). Lagrange multiplier test diagnostics for spatial dependence and spatial heterogeneity. Geographical Analysis, 20: 1–17.
Anselin, L. (1988b). Spatial Econometrics: Methods and Models. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Anselin, L. (1992). Space and applied econometrics. Introduction. Regional Science and Urban Economics, 22: 307–316.
Anselin, L. (2001a). Rao's score test in spatial econometrics. Journal of Statistical Planning and Inference, 97: 113–139.
Anselin, L. (2001b). Spatial econometrics. In: Baltagi, B. (ed.), A Companion to Theoretical Econometrics, pp. 310–330. Oxford: Blackwell.
Anselin, L. (2002). Under the hood. Issues in the specification and interpretation of spatial regression models. Agricultural Economics, 27(3): 247–267.
Anselin, L. (2003). Spatial externalities. International Regional Science Review, 26(2): 147–152.
Anselin, L. (2005). Spatial statistical modeling in a GIS environment. In: Maguire, D.J., Batty, M. and Goodchild, M.F. (eds), GIS, Spatial Analysis and Modeling, pp. 93–111. Redlands, CA: ESRI Press.
Anselin, L. (2006). Spatial econometrics. In: Mills, T. and Patterson, K. (eds), Palgrave Handbook of Econometrics: Volume 1, Econometric Theory, pp. 901–969. Basingstoke: Palgrave Macmillan.
Anselin, L. and Bera, A. (1998). Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah, A. and Giles, D.E. (eds), Handbook of Applied Economic Statistics, pp. 237–289. New York: Marcel Dekker.
Anselin, L., Bera, A., Florax, R.J. and Yoon, M. (1996). Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26: 77–104.
Anselin, L. and Florax, R.J. (1995a). New Directions in Spatial Econometrics. Berlin: Springer-Verlag.
Anselin, L. and Florax, R.J. (1995b). Small sample properties of tests for spatial dependence in regression models: Some further results. In: Anselin, L. and Florax, R.J. (eds), New Directions in Spatial Econometrics, pp. 21–74. Berlin: Springer-Verlag.
Anselin, L., Florax, R.J. and Rey, S.J. (2004). Advances in Spatial Econometrics. Methodology, Tools and Applications. Berlin: Springer-Verlag.
Anselin, L. and Griffith, D.A. (1988). Do spatial effects really matter in regression analysis? Papers, Regional Science Association, 65: 11–34.
Anselin, L. and Kelejian, H.H. (1997). Testing for spatial error autocorrelation in the presence of endogenous regressors. International Regional Science Review, 20: 153–182.
Anselin, L., Le Gallo, J. and Jayet, H. (2008). Spatial panel econometrics. In: Matyas, L. and Sevestre, P. (eds), The Econometrics of Panel Data, Fundamentals and Recent Developments in Theory and Practice (3rd Edition), pp. 627–662. Berlin: Springer-Verlag.
Anselin, L. and Moreno, R. (2003). Properties of tests for spatial error components. Regional Science and Urban Economics, 33(5): 595–618.
Anselin, L. and Rey, S.J. (1991). Properties of tests for spatial dependence in linear regression models. Geographical Analysis, 23: 112–131.
Anselin, L. and Rey, S.J. (1997). Introduction to the special issue on spatial econometrics. International Regional Science Review, 20: 1–7.
Arbia, G. (2006). Spatial Econometrics: Statistical Foundations and Applications to Regional Convergence. Berlin: Springer-Verlag.
Baltagi, B.H. and Li, D. (2001a). Double length artificial regressions for testing spatial dependence. Econometric Reviews, 20(1): 31–40.
Baltagi, B.H. and Li, D. (2001b). LM tests for functional form and spatial error correlation. International Regional Science Review, 24(2): 194–225.
Banerjee, S., Carlin, B.P. and Gelfand, A.E. (2004). Hierarchical Modeling and Analysis for Spatial Data. Boca Raton: Chapman & Hall/CRC.
Barry, R.P. and Pace, R.K. (1999). Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra and its Applications, 289: 41–54.
Bartels, C. and Ketellapper, R. (1979). Exploratory and Explanatory Analysis of Spatial Data. Boston: Martinus Nijhoff.
Basile, R. and Gress, B. (2005). Semi-parametric spatial auto-covariance models of regional growth in Europe. Région et Développement, 21: 93–118.
Bera, A. and Yoon, M.J. (1993). Specification testing with misspecified alternatives. Econometric Theory, 9: 649–658.
Beron, K.J., Murdoch, J.C., and Vijverberg, W.P. (2003). Why cooperate? Public goods, economic power, and the Montreal Protocol. The Review of Economics and Statistics, 85(2): 286–297.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society B, 36: 192–225.
Brock, W. and Durlauf, S. (2001). Discrete choice with social interactions. Review of Economic Studies, 59: 235–260.
Brueckner, J.K. (2003). Strategic interaction among governments: An overview of empirical studies. International Regional Science Review, 26(2): 175–188.
Burridge, P. (1980). On the Cliff–Ord test for spatial autocorrelation. Journal of the Royal Statistical Society B, 42: 107–108.
Chen, X. and Conley, T.G. (2001). A new semiparametric spatial model for panel time series. Journal of Econometrics, 105: 59–83.
Cliff, A. and Ord, J.K. (1972). Testing for spatial autocorrelation among regression residuals. Geographical Analysis, 4: 267–284.
Cliff, A. and Ord, J.K. (1973). Spatial Autocorrelation. London: Pion.
Cliff, A. and Ord, J.K. (1981). Spatial Processes: Models and Applications. London: Pion.
Conley, T.G. (1999). GMM estimation with cross-sectional dependence. Journal of Econometrics, 92: 1–45.
Cook, D. and Pocock, S. (1983). Multiple regression in geographic mortality studies, with allowance for spatially correlated errors. Biometrics, 39: 361–371.
Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
Das, D., Kelejian, H.H. and Prucha, I.R. (2003). Finite sample properties of estimators of spatial autoregressive models with autoregressive disturbances. Papers in Regional Science, 82: 1–27.
Davidson, R. and MacKinnon, J.G. (1984). Model specification tests based on artificial regressions. International Economic Review, 25: 485–502.
Davidson, R. and MacKinnon, J.G. (1988). Double-length artificial regression. Oxford Bulletin of Economics and Statistics, 50: 203–217.
Day, B., Bateman, I. and Lake, I. (2004). Omitted locational variates in hedonic analysis: A semiparametric approach using spatial statistics. Working Paper 0404, Center for Social and Economic Research on the Global Environment (CSERGE), University of East Anglia, UK.
Dubin, R. (1988). Estimation of regression coefficients in the presence of spatially autocorrelated errors. Review of Economics and Statistics, 70: 466–474.
Dubin, R., Pace, R.K. and Thibodeau, T.G. (1999). Spatial autoregression techniques for real estate data. Journal of Real Estate Literature, 7: 79–95.
Durlauf, S.N. (2004). Neighborhood effects. In: Henderson, J. and Thisse, J.-F. (eds), Handbook of Regional and Urban Economics, Volume 4, pp. 2173–2242. Amsterdam: North Holland.
Elhorst, J.P. (2001). Dynamic models in space and time. Geographical Analysis, 33: 119–140.
Elhorst, J.P. (2003). Specification and estimation of spatial panel data models. International Regional Science Review, 26(3): 244–268.
Ellner, S.P. and Seifu, Y. (2002). Using spatial statistics to select model complexity. Journal of Computational and Graphical Statistics, 11: 348–369.
Fischer, M.M., Reismann, M. and Scherngell, T. (2006). From conventional to spatial econometric models of spatial interaction. Paper presented at the Fifth International Workshop on Spatial Econometrics and Statistics, Rome, Italy, May 2006.
Fleming, M. (2004). Techniques for estimating spatially dependent discrete choice models. In: Anselin, L., Florax, R.J. and Rey, S.J. (eds), Advances in Spatial Econometrics, pp. 145–168. Heidelberg: Springer-Verlag.
Florax, R. and Folmer, H. (1992). Specification and estimation of spatial linear regression models: Monte Carlo evaluation of pre-test estimators. Regional Science and Urban Economics, 22: 405–432.
Florax, R.J. and de Graaff, T. (2004). The performance of diagnostic tests for spatial dependence in linear regression models: A meta-analysis of simulation studies. In: Anselin, L., Florax, R.J. and Rey, S.J. (eds), Advances in Spatial Econometrics. Methodology, Tools and Applications, pp. 29–65. Berlin: Springer-Verlag.
Florax, R.J., Folmer, H. and Rey, S.J. (2003). Specification searches in spatial econometrics: The relevance of Hendry's methodology. Regional Science and Urban Economics, 33(5): 557–579.
Florax, R.J., Folmer, H. and Rey, S.J. (2006). A comment on specification searches in spatial econometrics: The relevance of Hendry's methodology: A reply. Regional Science and Urban Economics, 36: 300–308.
Florax, R.J.G.M. and van der Vlist, A. (2003). Spatial econometric data analysis: moving beyond traditional models. International Regional Science Review, 26(3): 223–243.
Fortin, M.-J. and Dale, M. (2005). Spatial Analysis: A Guide for Ecologists. Cambridge: Cambridge University Press.
Getis, A., Mur, J. and Zoller, H.G. (2004). Spatial Econometrics and Spatial Statistics. London: Palgrave Macmillan.
Gibbons, S. and Machin, S. (2003). Valuing English primary schools. Journal of Urban Economics, 53: 197–219.
Godfrey, L. (1981). On the invariance of the Lagrange Multiplier test with respect to certain changes in the alternative hypothesis. Econometrica, 49: 1443–1455.
Goodchild, M.F., Anselin, L., Appelbaum, R. and Harthorn, B. (2000). Toward spatially integrated social science. International Regional Science Review, 23(2): 139–159.
Gotway, C.A. and Stroup, W.W. (1997). A generalized linear model approach to spatial data analysis and prediction. Journal of Agricultural, Biological and Environmental Statistics, 2(2): 157–178.
Gotway, C.A. and Wolfinger, R.D. (2003). Spatial prediction of counts and rates. Statistics in Medicine, 22: 1415–1432.
Gress, B. (2004a). Semi-Parametric Spatial Autocovariance Models. PhD thesis, University of California, Riverside, CA.
Gress, B. (2004b). Using semi-parametric spatial autocorrelation models to improve hedonic housing price prediction. Working paper, Department of Economics, University of California, Riverside, CA.
Griffith, D.A. (1988). Advanced Spatial Statistics. Dordrecht: Kluwer Academic.
Griffith, D.A. and Sone, A. (1995). Trade-offs associated with normalizing constant computational simplifications for estimating spatial statistical models. Journal of Statistical Computation and Simulation, 51: 165–183.
Haining, R. (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge: Cambridge University Press.
Haining, R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge: Cambridge University Press.
Hall, P. and Patil, P. (1994). Properties of nonparametric estimators of autocovariance for stationary random fields. Probability Theory and Related Fields, 99: 399–424.
Henderson, D.J. and Ullah, A. (2005). A nonparametric random effects estimator. Economics Letters, 88: 403–407.
Hendry, D.F. (2006). A comment on specification searches in spatial econometrics: The relevance of Hendry's methodology. Regional Science and Urban Economics, 36: 309–312.
Horowitz, J.L. and Lee, S. (2002). Semiparametric methods in applied econometrics: Do the models fit the data? Statistical Modelling, 2: 3–22.
Kelejian, H.H. and Prucha, I. (1998). A generalized spatial two stage least squares procedures for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics, 17: 99–121.
Kelejian, H.H. and Prucha, I. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review, 40: 509–533.
Kelejian, H.H. and Prucha, I. (2001). On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics, 104(2): 219–257.
Kelejian, H.H. and Prucha, I.R. (2002). 2SLS and OLS in a spatial autoregressive model with equal spatial weights. Regional Science and Urban Economics, 32(6): 691–707.
Kelejian, H.H. and Prucha, I.R. (2004). Estimation of simultaneous systems of spatially interrelated cross sectional equations. Journal of Econometrics, 118: 27–50.
Kelejian, H.H. and Prucha, I.R. (2005). HAC estimation in a spatial framework. Working paper, Department of Economics, University of Maryland, College Park, MD.
Kelejian, H.H. and Robinson, D.P. (1993). A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditure model. Papers in Regional Science, 72: 297–312.
Kelejian, H.H. and Robinson, D.P. (1995). Spatial correlation: A suggested alternative to the autoregressive model. In: Anselin, L. and Florax, R.J. (eds), New Directions in Spatial Econometrics, pp. 75–95. Berlin: Springer-Verlag.
Kelejian, H.H. and Robinson, D.P. (1998). A suggested test for spatial autocorrelation and/or heteroskedasticity and corresponding Monte Carlo results. Regional Science and Urban Economics, 28: 389–417.
Kelejian, H.H. and Robinson, D.P. (2004). The influence of spatially correlated heteroskedasticity on tests for spatial correlation. In: Anselin, L. and Florax, R.J. (eds), Advances in Spatial Econometrics, pp. 79–97. Heidelberg: Springer-Verlag.
King, M. (1981). A small sample property of the Cliff–Ord test for spatial correlation. Journal of the Royal Statistical Society B, 43: 264.
Land, K. and Deane, G. (1992). On the large-sample estimation of regression models with spatial or network-effect terms: A two stage least squares approach. In: Marsden, P. (ed.), Sociological Methodology, pp. 221–248. San Francisco: Jossey-Bass.
Kelejian, H.H. and Prucha, I.R. (2007). HAC estimation Lee, L.-F. (2002). Consistency and efciency of least
in a spatial framework. Journal of Econometrics, squares estimation for mixed regressive, spatial
140: 131154. autoregressive models. Econometric Theory, 18(2):
Kelejian, H.H. and Prucha, I.R. (2006). Specication 252277.
and estimation of spatial autoregressive models with Lee, L.-F. (2003). Best spatial two-stage least squares
autoregressive and heteroskedastic disturbances. estimators for a spatial autoregressive model with
Working paper, Department of Economics, University autoregressive disturbances. Econometric Reviews,
of Maryland, College Park, MD. 22: 307335.
Kelejian, H.H., Prucha, I.R. and Yuzefovich, Y. Lee, L.-F. (2004). Asymptotic distributions of quasi-
(2004). Instrumental variable estimation of a maximum likelihood estimators for spatial autore-
spatial autoregressive model with autoregressive gressive models. Econometrica, 72: 18991925.
disturbances: Large and small sample results.
Lee, L.-F. (2006). GMM and 2SLS estimation of mixed
In: LeSage, J.P. and Pace, R.K. (eds), Advances
regressive, spatial autoregressive models. Journal of
in Econometrics: Spatial and Spatiotemporal
Econometrics. Forthcoming.
Econometrics, pp. 163198. Oxford, UK: Elsevier
Science Ltd. Lee, L.-F. (2007). GMM and 2SLS estimation of mixed
regressive, spatial autoregressive models. Journal of
Kelejian, H.H. and Robinson, D.P. (1992). Spatial
Econometrics, 137: 489514.
autocorrelation: A new computationally simple
test with an application to per capita county Leenders, R.T.A.J. (2002). Modeling social inuence
police expenditures. Regional Science and Urban through network autocorrelation: Constructing the
Economics, 22: 317333. weights matrix. Social Networks, 24: 2147.
274 THE SAGE HANDBOOK OF SPATIAL ANALYSIS
LeSage, J.P. (2000). Bayesian estimation of limited Pace, R.K. and Barry, R. (1997b). Sparse spatial
dependent variable spatial autoregressive models. autoregressions. Statistics and Probability Letters,
Geographical Analysis, 32: 1935. 33: 291297.
LeSage, J.P. and Pace, R.K. (2004). Advances in Econo- Pace, R.K., Barry, R. and Sirmans, C. (1998). Spatial
metrics: Spatial and Spatiotemporal Econometrics. statistics and real estate. Journal of Real Estate
Oxford, UK: Elsevier Science Ltd. Finance and Economics, 17: 513.
LeSage, J.P. and Pace, R.K. (2005). Spatial econo- Pace, R.K. and LeSage, J.P. (2002). Semiparametric
metric modeling of origin-destination ows. Paper maximum likelihood estimates of spatial depen-
Presented at the 52nd North American Meeting dence. Geographical Analysis, 34: 7690.
for the Regional Science Association International,
Las Vegas, NV, Nov. 2005. Pace, R.K. and LeSage, J.P. (2004a). Chebyshev
approximation of log-determinants of spatial weights
LeSage, J.P., Pace, R.K. and Tiefelsdorf, M. (2004). matrices. Computational Statistics and Data Analysis,
Methodological developments in spatial economet- 45: 179196.
rics and statistics. Geographical Analysis, 36: 8789.
Pace, R.K. and LeSage, J.P. (2004b). Spatial statistics
Lin, X. and Lee, L.-F. (2005). GMM estimation
and real estate. Journal of Real Estate Finance and
of spatial autoregressive models with unknown
Economics, 29: 147148.
heteroskedasticity. Working paper, The Ohio State
University, Columbus, OH. Paelinck, J. and Klaassen, L. (1979). Spatial
Econometrics. Farnborough: Saxon House.
Magnus, J. (1978). Maximum likelihood estimation
of the GLS model with unknown parameters Pesaran, M.H. (2005). Estimation and inference in
in the disturbance covariance matrix. Journal of large heterogenous panels with cross section
Econometrics, 7: 281312. Corrigenda, Journal of dependence. Working paper, Faculty of Economics
Econometrics 10: 261. and Politics, University of Cambridge, Cambridge,
Mardia, K. and Marshall, R. (1984). Maximum likeli- United Kingdom.
hood estimation of models for residual covariance in Pinkse, J. and Slade, M.E. (1998). Contracting in space:
spatial regression. Biometrika, 71: 135146. An application of spatial statistics to discrete-choice
Martin, R. (1993). Approximations to the determinant models. Journal of Econometrics, 85: 125154.
term in Gaussian maximum likelihood estimation of
Pinkse, J., Slade, M.E. and Brett, C. (2002). Spatial
some spatial models. Communications in Statistics:
price competition: A semiparametric approach.
Theory and Methods, 22: 189205.
Econometrica, 70(3): 11111153.
Moran, P.A. (1948). The interpretation of statistical
Rey, S.J. and Boarnet, M.G. (2004). A taxonomy
maps. Biometrika, 35: 255260.
of spatial econometric models for simultaneous
Moran, P.A. (1950). A test for the serial dependence of equations systems. In Anselin, L., Florax, R.J. and
residuals. Biometrika, 37: 178181. Rey, S.J. (eds), Advances in Spatial Econometrics.
pp. 99119, Heidelberg: Springer-Verlag.
Nelson, G.C. (2002). Introduction to the special issue
on spatial analysis. Agricultural Economics, 27(3): Ripley, B.D. (1981). Spatial Statistics. New York: Wiley.
197200.
Robinson, P.M. (1988). Root-n-consistent semipara-
Newey, W.K. and West, K.D. (1987). A simple, positive metric regression. Econometrica, 56: 931954.
semi-denite, heteroskedasticity and autocorrelation
consistent covariance matrix. Econometrica, 55: Schabenberger, O. and Gotway, C.A. (2005). Statistical
703708. Methods for Spatial Data Analysis. Boca Raton, FL:
Chapman & Hall/CRC.
Ord, J.K. (1975). Estimation methods for models of
spatial interaction. Journal of the American Statistical Schmoyer, R. (1994). Permutation tests for correlation
Association, 70: 120126. in regression errors. Journal of the American
Statistical Association, 89: 15071516.
Pace, R.K. and Barry, R. (1997a). Quick computation
of spatial autoregressive estimators. Geographical Smirnov, O. (2005). Computation of the information
Analysis, 29: 232246. matrix for models with spatial interaction on a lattice.
SPATIAL REGRESSION 275
Journal of Computational and Graphical Statistics, Upton, G.J. and Fingleton, B. (1985). Spatial Data
14: 910927. Analysis by Example. Volume 1: Point Pattern and
Quantitative Data. New York: Wiley.
Smirnov, O. and Anselin, L. (2001). Fast maximum
likelihood estimation of very large spatial autoregres- Waller, L.A. and Gotway, C.A. (2004). Applied Spatial
sive models: A characteristic polynomial approach. Statistics for Public Health Data. Hoboken, NJ:
Computational Statistics and Data Analysis, 35: John Wiley.
301319. White, H. (1980). A heteroskedastic-consistent covari-
Tiefelsdorf, M. (2002). The saddlepoint approximation ance matrix estimator and a direct test for
of Morans I and local Morans Ii s reference distri- heteroskedasticity. Econometrica, 48: 817838.
bution and their numerical evaluation. Geographical Whittle, P. (1954). On stationary processes in the plane.
Analysis, 34: 187206. Biometrika, 41: 434449.
Tiefelsdorf, M. and Boots, B. (1995). The exact Zhang, H. (2002). On estimation and prediction for
distribution of Morans I. Environment and Planning spatial generalized linear mixed models. Biometrics,
A, 27: 985999. 56: 129136.
15
Spatial Microsimulation
D. Ballas and G.P. Clarke
more aggregate spatial scale). For area-based policy evaluation such models allow differential impacts between and within areas to be analysed more effectively. The necessity of predicting the impacts of social and area-based policies at the local or micro-level has also been emphasized by Openshaw (1995, p. 60). Governments need to predict the outcomes of their actions and produce forecasts at the local level.

For these reasons Wilson (2000, p. 98) identified microsimulation as one of the most important methods in regional science modelling: 'Simulation is a critical concept in the future development of modelling because it provides a way of handling complexity that cannot be handled analytically. Microsimulation is a valuable example of a technique that may have increasing prominence in future research.'

This chapter reviews the history of spatial microsimulation and spells out a research agenda for the further exploitation of the technique. First, the semantics of microsimulation are revisited and we describe the different types of microsimulation models and how they can be formulated (section 15.2). We then provide a brief overview of applications of microsimulation models, which includes use in economics, social policy, geography and regional science (section 15.3). Then, we spell out a research agenda for spatial microsimulation (section 15.4) and offer some concluding comments in section 15.5.

15.2. WHAT IS SPATIAL MICROSIMULATION?

Microsimulation can generally be defined as a methodology that is concerned with the creation of large-scale population microdata sets to enable the analysis of policy impacts at the micro-level. The approach dates back to the work of Orcutt (1957) and Orcutt et al. (1961) who argued for a new type of socio-economic system and described a simple model of demographic transitions based on micro-analytical simulation. In particular, microsimulation methods aim to examine changes in the characteristics or lifestyles of individuals within households and to analyse the impact of government policy changes on these individuals or households. Two main types of microsimulation model can be distinguished. First, there are static models that are based on simple snapshots of the current circumstances of a sample of the population at any one time. Second, there are dynamic models that vary or age the attributes of each micro-unit in a sample to build up a synthetic longitudinal database forecasting the sample members' lifestyles into the future.

The first geographical application of microsimulation was developed by Hägerstrand (1967), who employed micro-analytical techniques for the study of spatial diffusion of innovation. Nevertheless, it can be argued that the basis for spatial microsimulation of households and individuals was founded in the 1970s. In particular, Wilson and Pownall (1976) were among the first to address the aggregation difficulties that were associated with traditional comprehensive spatial models of urban systems. They suggested a new spatial modelling framework for representing the urban system based on the micro-level interdependence of household and individual characteristics. Further, they concentrated on the spatial distribution of population and its activities and suggested that persons and their associated attributes should be defined separately in the form of lists, rather than represented in the form of matrices. In this manner, there is no loss of information and the storage is computationally efficient. In their representational framework, they were interested in estimating all the characteristics of the individuals that
comprise the urban population. Formally, they defined variables for each person in the system separately by adding a person label $r$ to each person attribute $x_1, x_2, x_3, \ldots, x_n$. The person attributes would therefore become $x_1^r, x_2^r, x_3^r, \ldots, x_n^r$ for the $r$th person of the population. This means that if there are $M$ people in the population, each with $n$ attributes, there will be $n \times M$ variables in total. After suggesting the above framework for representing individuals, Wilson and Pownall (1976) proposed a modelling procedure to estimate each characteristic for each person in turn. They formally expressed this procedure as follows:

$$x_j^r = x_j^r\bigl(P_j(x \mid \ldots),\ R_j^r,\ \cdot\,\bigr)$$

where $P_j(x \mid \ldots)$ is the probability of $x_j$ taking the value $x$ conditional on variables yet to be specified, $R_j^r$ is a random number selected for person $r$ and characteristic $j$, and the final argument represents a relevant constraint set (Wilson and Pownall, 1976). One of the most significant properties of the above model is its causal structure, which is largely reflected in the order in which the characteristics are estimated for each person.

Almost a decade later, Birkin and Clarke (1988) built a synthetic spatial information system for urban and regional analysis. It can be argued that this model is the first comprehensive spatial microsimulation model in the UK. Birkin and Clarke (1988) discussed the difficulties of performing micro-level spatial analysis using the existing published data sources and they proposed a methodology for generating synthetic microdata from a number of different aggregate sources. This microsimulation methodology was underpinned by a technique known as iterative proportional fitting (IPF) (see Birkin and Clarke (1988) and Ballas (2001) for a more detailed discussion of this technique). Birkin and Clarke (1988) briefly discuss the theoretical properties of IPF and they demonstrate how they applied the method to estimate joint probability distributions of household attributes. The IPF procedure adopts a synthetic reconstruction method which calculates conditional probabilities of having particular attributes and then assigns these attributes on the basis of random sampling procedures (Monte Carlo simulation). Table 15.1 depicts, for example, the steps that need to be followed in the procedure for allocating economic activity status.

More recently researchers have argued that reweighting existing survey data can produce more robust results than these synthetic probabilistic reconstruction models, which involve the use of random sampling (Williamson et al., 1998; Huang and Williamson, 2001; Ballas et al., 2005). Two well-used reweighting procedures are:

• Reweighting probabilistic approaches, which typically reweight an existing national microdata set to fit a geographical area description on the basis of random sampling and optimization techniques
• Reweighting deterministic approaches, which reweight a non-geographical population microdata set to fit small area descriptions, but without the use of random sampling procedures

These new methods involve the reweighting of an existing microdata sample (which is usually only available at coarse levels of geography), so that it would fit small area population statistics tables. For instance, an existing microdata set such as the British Household Panel Survey (BHPS) described in Table 15.2 can be reweighted to populate small areas.

The BHPS provides a detailed record for a sample of households and all of their occupants (Taylor et al., 2001). Reweighting methods aim to sample from all the microdata
records to find the set of household records that best matches the population described in the Small Area Statistics or Census Area Statistics tables for the small area under study. First, a series of small area tables (e.g., from the Census or other sources) that describe the small area of interest must be selected. For example, a reweighting method would sample from the BHPS to find a suitable combination of households that would fit the data described in Table 15.3. This first stage of population estimation at the household level is primarily a data-fitting exercise. However, once built, the model can be used for static what-if simulations, in which the impacts of alternative policy scenarios on the population are estimated: for instance, if there had been no poll tax in 1991, which communities would have benefited most and which would have had to have paid more tax in other forms? Second, it can be used for dynamic modelling, to update a basic micro-dataset and future-oriented what-if simulations: for instance if the current government had raised income taxes in

Table 15.1 Microsimulation procedure for the allocation of economic activity status (after the similar example of tenure allocation procedure given by Clarke, 1996: 3)
Head of household (hh)
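The kind of Monte Carlo allocation that Table 15.1 refers to can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of Birkin and Clarke (1988) or Clarke (1996): the age groups, the probability values and the names `COND_PROB`, `allocate_status` and `simulate` are all hypothetical. For each head of household one uniform random number is drawn and compared with the cumulative conditional probabilities of each economic activity status, which mirrors the roles of the random number and the conditional probability in the Wilson and Pownall (1976) formulation.

```python
import random

# Hypothetical conditional probabilities P(status | age group, sex).
# A real application would derive these from published census tables.
COND_PROB = {
    ("16-29", "male"): {"employed": 0.70, "unemployed": 0.15, "inactive": 0.15},
    ("16-29", "female"): {"employed": 0.65, "unemployed": 0.10, "inactive": 0.25},
    ("30-64", "male"): {"employed": 0.80, "unemployed": 0.08, "inactive": 0.12},
    ("30-64", "female"): {"employed": 0.68, "unemployed": 0.06, "inactive": 0.26},
}


def allocate_status(age_group, sex, rng):
    """Pick a status by comparing one random draw with cumulative probabilities."""
    draw = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for status, prob in COND_PROB[(age_group, sex)].items():
        cumulative += prob
        if draw < cumulative:
            return status
    return status  # fallback if rounding leaves the cumulative sum just below 1


def simulate(heads, seed=0):
    """Return copies of the household heads with a simulated status attached."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        dict(head, status=allocate_status(head["age_group"], head["sex"], rng))
        for head in heads
    ]


# Hypothetical list of household heads that already have age group and sex.
heads = [{"id": i, "age_group": "30-64", "sex": "male"} for i in range(10000)]
synthetic = simulate(heads)
share_employed = sum(h["status"] == "employed" for h in synthetic) / len(synthetic)
```

With a large synthetic population the simulated share in each status should approach the conditional probability that generated it, which is a quick check that the allocation step behaves as intended.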
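The deterministic reweighting idea described above can be illustrated with a short sketch. The chapter does not spell out an algorithm at this point, so the example assumes one common choice, iterative proportional fitting applied to record weights; the survey records, the small-area totals and the function names `reweight` and `weighted_total` are hypothetical stand-ins rather than BHPS or census data.

```python
def reweight(records, constraints, iterations=50):
    """Deterministically reweight survey records to match small-area totals.

    records: list of dicts holding categorical attributes.
    constraints: {attribute: {category: target count in the small area}}.
    Returns one weight per record; no random sampling is involved.
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        for attr, targets in constraints.items():
            # Current weighted total for each category of this attribute.
            totals = {category: 0.0 for category in targets}
            for rec, w in zip(records, weights):
                totals[rec[attr]] += w
            # Rescale every weight so this attribute's totals hit the targets.
            weights = [
                w * targets[rec[attr]] / totals[rec[attr]]
                for rec, w in zip(records, weights)
            ]
    return weights


# Hypothetical survey sample (a stand-in for a survey such as the BHPS).
survey = [
    {"tenure": "owner", "cars": "0"},
    {"tenure": "owner", "cars": "1+"},
    {"tenure": "renter", "cars": "0"},
    {"tenure": "renter", "cars": "1+"},
    {"tenure": "owner", "cars": "1+"},
]

# Hypothetical small-area tables; each sums to 100 households.
area_tables = {
    "tenure": {"owner": 60, "renter": 40},
    "cars": {"0": 30, "1+": 70},
}

weights = reweight(survey, area_tables)


def weighted_total(attr, category):
    """Weighted number of households in the given category."""
    return sum(w for rec, w in zip(survey, weights) if rec[attr] == category)
```

After reweighting, `weighted_total("tenure", "owner")` and `weighted_total("cars", "0")` should be close to the 60 and 30 households in the hypothetical area tables. A probabilistic approach would instead select whole households by random search, so repeated runs could yield different populations.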
variety of social sciences. Figure 15.1 shows the results of a basic keyword search in the ScienceDirect academic journal database, searching for the word 'microsimulation' in the titles or abstracts of papers in the last 30 years. As can be seen, the largest share of the papers was in economics (41%), with very few papers in geography (3%), although spatial applications may also lie in fields such as population, transport and health. There is also a relatively high number of microsimulation applications in medicine. However, these are applications of a different nature, as their main focus is the effectiveness of medicines (e.g., simulating the impact of medicines on human well-being).

Figure 15.1 [pie chart; labelled segments include Economics (41%), Medicine (25%), Transportation (16%), Geography (3%) and Other (2%)]

The rest of this section explores some well-known examples of microsimulation for certain types of policy work. This includes static models (simply run for one period of time) and dynamic models (where the attributes of the population are updated constantly or over yearly totals).

15.3.2. Tax and income modelling

A large number of papers in economics on microsimulation relate to work on household finance. Indeed, amongst the first applied microsimulation models was TAX, developed at the US Treasury department in the 1960s (Nelissen, 1993). Since then there have been many models built to examine the impacts on individual households of various tax or welfare changes (Bekkering, 1995; Falkingham and Lessof, 1992; Falkingham and Hills, 1995a, b; Glennerster et al., 1995; Propper, 1995).

The first task in the modelling of household income is to link households with job type. Birkin and Clarke (1989) used the SYNTHESIS model to generate incomes for individuals. They used an IPF-based microsimulation approach to estimate earned income at ward level for the Leeds Metropolitan District by assigning each household a job and an occupation using
information from the New Earnings Survey to allocate an income variable accordingly. In addition, they estimated income from transfer payments such as the Family Income Supplement for each household. This was probably the first successful attempt to generate income at the small area level in the literature. Ballas and Clarke (2001a) extended this work by increasing the number of transfer or welfare payments included in the model (such as detailed work on child benefits) and also including household taxation levels. Williamson and Voas (2000) report ongoing research to provide more robust and reliable estimates of income at the small area level. They argue that income estimation at the small area level may be seen as a multilevel analysis problem where variables at individual and area levels may interact.

Work in the US has tended to extend such work to include not only household income but also household wealth. In particular, Caldwell and Keister (1996) present CORSIM, which is a dynamic microsimulation model that has been under development at Cornell University since 1986. CORSIM has been used to model wealth distribution in the United States over the historical period 1960-1995 and to forecast wealth distribution over the future (Caldwell and Keister, 1996). It is noteworthy that over 17 different national microdata files have been used to build the model, which incorporated 50 economic, demographic and social processes by means of approximately 900 stochastic equations and rule-based algorithms (Ibid.). Furthermore, Caldwell et al. (1998) review the geography of wealth in the USA and show how CORSIM has included many variables relating to assets and debts.

As mentioned in the previous section, microsimulation models can be even more powerful when they become dynamic. In particular, once a microsimulation database is built, dynamic microsimulation procedures can be introduced in order to update these databases. Amongst the first applied dynamic microsimulation models was DYNASIM (DYNAmic Simulation of Income Model; see Orcutt et al., 1961; Wertheimer et al., 1986), which was the base for later, more sophisticated, models such as CORSIM. One of the descendants of DYNASIM was DYNASIM2, which was developed and maintained at the Urban Institute in Washington D.C. (Wertheimer et al., 1986). DYNASIM2 comprised two sub-models: a Family and Earnings History (FEH) model and a Jobs and Benefit History (JBH) model (Wertheimer et al., 1986).

Work on income and taxation can be focused more closely on particular problems. Currently in many Western countries there is a problem relating to pensions, given that an ageing population will need more financial support from a declining workforce population. Notable here is the work of Hancock et al. (1992), who built PENSIM. This is a microsimulation model designed for the simulation of pensioners' incomes up to the year 2030. Hancock et al. (1992) point out that the simulation of pensions is another good example of the application of dynamic microsimulation techniques, given that pension rights accumulate over a long period of time and their estimation requires the processing of data pertaining to individuals' entire working lives. PENSIM aims at predicting aggregate income by source within certain subsets of the pensioner population under different alternative assumptions. These assumptions pertain to the rules controlling the treatment of pensioners by the social security system, pension entitlement regulations, projected demographic movements and movements in aggregate economic variables such as unemployment and inflation. Davies and Joshi (1992) also focused on modelling pensions. In particular, they employed microsimulation
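The dynamic updating of a microsimulation database discussed in this section can be sketched as a yearly loop over micro-units. The example is a toy under stated assumptions and does not reproduce DYNASIM, CORSIM or PENSIM: the transition probabilities, the retirement age and the names `age_one_year` and `simulate` are hypothetical.

```python
import random

# Hypothetical annual transition probabilities, not taken from any named model.
P_LOSE_JOB = 0.05
P_FIND_JOB = 0.30
RETIREMENT_AGE = 65


def age_one_year(person, rng):
    """Return a copy of the micro-unit as it looks one year later."""
    nxt = dict(person, age=person["age"] + 1)
    if nxt["age"] >= RETIREMENT_AGE:
        nxt["status"] = "retired"
    elif person["status"] == "employed" and rng.random() < P_LOSE_JOB:
        nxt["status"] = "unemployed"
    elif person["status"] == "unemployed" and rng.random() < P_FIND_JOB:
        nxt["status"] = "employed"
    return nxt


def simulate(population, years, seed=0):
    """Build a synthetic longitudinal database: one snapshot per simulated year."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    history = [population]
    for _ in range(years):
        population = [age_one_year(p, rng) for p in population]
        history.append(population)
    return history


# Hypothetical base-year population aged 20 to 64, all initially employed.
base = [{"id": i, "age": 20 + i % 45, "status": "employed"} for i in range(1000)]
history = simulate(base, years=10)
```

Each element of `history` is a full snapshot of the population, so an individual's simulated path can be traced across years by `id`. A static model, by contrast, would stop at the base-year snapshot.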
of spatial mobility of households and firms in three different time sets (daily commuting, relocation, and lifetime mobility). The problem associated with this type of modelling is the order in which processes are modelled. It could be argued, for example, that labour force participation is dependent on family status and attributes or that the family formation procedure is dependent on the labour market situation of each individual. As Falkingham and Lessof (1992) put it:

". . . while a woman's labour force status can depend on the number of children she has and on her marital status, it cannot also influence the probability of the woman having a child in any year. The ordering of the modules necessarily involves making assumptions about the direction of causality in relationships between variables." (Falkingham and Lessof, 1992: 9)

Their LIFEMOD model is based on the assumption that demographic variables determine labour-force participation and that labour-force participation influences health, although it is pointed out that evidence suggests causality in either direction (Falkingham and Lessof, 1992).

Transport and land-use models

Wegener and Spiekermann (1996) explore the potential of microsimulation for urban models, focusing on land-use and travel models. They argue that a new generation of travel models has emerged which requires more detailed information on household demographics and employment characteristics at the small area level. They also point out that there are new neighbourhood-scale transport policies aimed at promoting public transport, walking and cycling. These policies require detailed information on the precise location of the population and its activities. Wegener and Spiekermann (1996) also stress the need for urban models to predict not only the economic but also the environmental impacts of land-use transport policies. In order to model the environmental impacts there is a need for small-area forecasts of emissions from stationary and mobile sources as well as of emissions in terms of the affected population. After outlining the main characteristics of a micro-analytic theory of urban change, Wegener and Spiekermann (1996) report on modelling efforts carried out at the University of Dortmund to integrate microsimulation into a comprehensive urban land-use transport model (see also Veldhuisen et al., 2000). The links between households, housing markets and labour markets have been explored more recently in Ballas et al. (2005).

Retail models

Traditional spatial interaction or discrete choice models have been used to estimate expenditure flows from households to each store. It is argued by Nakaya et al. (2005) that it is possible to improve the applicability of the retail interaction model, not by increasing the complexity of the model formulation, but by integrating the interaction modelling framework with spatial microsimulation. To attain a high level of predictive accuracy, models of retail interaction usually require a high degree of disaggregation (Birkin et al., 2002). Even if a survey of consumer behaviour is conducted by randomly distributing a questionnaire to local residents, response rates would vary by consumer type and place of residence based on people's different levels of interest and tolerance of such a survey. Consequently, survey data of this type often contain bias in the type of consumer behaviour measured, swayed towards the behaviour of individuals who least object to completing surveys. This problem of missing data tends to get worse as the spatial units used in the analysis get smaller. A solution to this problem is to generate data through spatial microsimulation which can be
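The combination of a retail interaction model with microsimulated households, as discussed under retail models, can be illustrated with a minimal sketch. The chapter gives no formula here, so the example assumes the standard production-constrained spatial interaction form, S_ij = O_i W_j exp(-beta c_ij) / sum_k W_k exp(-beta c_ik), with straight-line distance as the cost c_ij. This is not the specification of Nakaya et al. (2005); the households, stores, the `beta` value and the function name `expenditure_flows` are hypothetical.

```python
import math


def expenditure_flows(households, stores, beta=0.2):
    """Split each household's spending across stores.

    Assumed form: S_ij = O_i * W_j * exp(-beta * c_ij) / sum_k W_k * exp(-beta * c_ik),
    where O_i is household spending, W_j store attractiveness and c_ij distance.
    """
    flows = {}
    for hh in households:
        # Pull of each store on this household: attractiveness times distance decay.
        pull = {
            store["name"]: store["attractiveness"]
            * math.exp(-beta * math.dist(hh["xy"], store["xy"]))
            for store in stores
        }
        total_pull = sum(pull.values())
        for name, value in pull.items():
            flows[(hh["id"], name)] = hh["spend"] * value / total_pull
    return flows


# Hypothetical microsimulated households; spending would normally depend on
# simulated attributes such as income or household type.
households = [
    {"id": 1, "xy": (0.0, 0.0), "spend": 100.0},
    {"id": 2, "xy": (10.0, 0.0), "spend": 50.0},
]
# Hypothetical stores with equal attractiveness.
stores = [
    {"name": "A", "xy": (1.0, 0.0), "attractiveness": 1.0},
    {"name": "B", "xy": (9.0, 0.0), "attractiveness": 1.0},
]

flows = expenditure_flows(households, stores)
```

Because the demand side is a list of individual households rather than zone totals, flows can be summed afterwards by any household attribute, which is the kind of disaggregation the text says retail interaction models usually require.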
Another example is the research conducted by Ballas et al. (2005) using SIMBRITAIN. This model assumes that the initial simulation of the future population of Britain could be based on population projections (such as those of the ONS) and on the assumption that the trends in the changes to society to 2021 are similar to those of the previous decade. However, alternative projections would also be provided on the basis of hypothetical social policy changes. They also examined child poverty as a major application area. For example, it is possible to use a dynamic spatial microsimulation model to estimate the degree of child poverty eradication within the next 20 years under different policies and assumptions, such as the onset of a major recession or a redistribution of wealth, and the model would provide projections in order to suggest where current strategies are failing to eradicate child poverty within a generation.

Microsimulation still has to gain credibility amongst the social science community in general and social policy researchers in particular. Thus, there is currently a major challenge to build on the work described above in order to project the population into the future to predict what would happen under different macro-economic, micro-economic and social policy scenarios. This will enable an evaluation of the short and long-term impacts that various government policies are likely to have on different segments of society and different geographical areas.

15.4. THE WAY FORWARD: THE RESEARCH AGENDA

15.4.1. Towards a comprehensive spatial microsimulation of urban systems

We have seen in section 15.3 that progress has been made on adding behavioural or trip making models into microsimulation. The obvious next step is to link all these components into a more comprehensive urban model. First, more linkage is required between households and the supply-side of the economy. For example, it should be possible to link all households to a retail destination (by type of good) and a destination for primary and secondary health and education. By adding more information on linkages or flows within the city it can be argued that such modelling would offer major new insights into urban deprivation or quality of life. Many households will be identified as having poor accessibility to major services. However, multiple deprivation may well exist in many areas where poor accessibility exists to all major urban services. For example, a neighbourhood may be a long way from decent retail opportunities, a hospital and a GP. In addition, although close to a secondary school, that school may be suffering from very low examination success and hence access is constrained to only a poor-performing school.

Once all the relevant demand-side and supply-side databases are constructed, the next step would be to perform what-if policy impact analysis. In particular, it will be possible to model what would be the impact on the quality of life of residents in different localities, under different scenarios. For instance, it would be possible to estimate what would be the socio-economic and spatial impact of a new hospital in an area, new retail facilities, new schools, etc. It will also be possible to link these activities to events taking place elsewhere in the city. For example, the impact analysis of the factory closure that has been given by Ballas and Clarke (2001b) can be extended by estimating multiplier effects and the loss of spending power in the local community. Further, it would be possible to estimate the downgrading of service facilities as businesses close or
relocate to more affluent areas. It would then be possible to determine whether this development leads to even poorer service provision for those communities affected. The possibility of individuals made redundant finding new jobs in the area, migrating or being retrained could also then be estimated. Ballas et al. (2006) have made a start in this direction.

The second major effort needed is to link such models more closely to the local and regional labour market through a framework which combines spatial microsimulation models with regional input-output models or regional econometric models. It has long been argued that the household sector has been neglected by most input-output modellers, who at best model variables such as household income and expenditure in aggregate form, making no distinction between the behaviour of different types of household defined in terms of socio-economic status, employment profile, skill level, etc. (Batey, 2003). It can be argued that spatial microsimulation can address this issue. For instance, the predictions of input-output models for different sectors of the local economy can be spatially disaggregated with the use of a spatial microsimulation model. Likewise, predictions of regional econometric models for the whole region can be disaggregated to the individual and household level with the use of spatial microsimulation. Jin and Wilson (1993) made some progress here but data limitations made it difficult to operationalize their models. Microsimulation potentially has the ability to provide much of that missing data.

15.4.2. Linking microsimulation and agent-based models

Microsimulation is closely linked to another type of individual level modelling: agent-based models (ABM). ABMs are normally associated with the behaviour of multiple agents in a social or economic system. These agents usually interact constantly with each other and with the environment they live or move within. Their actions are driven by certain rules. Although this methodology sounds similar to microsimulation (where agents could be the individuals within the households), Davidsson (2000) notes that ABM may offer a better framework for including behavioural rules in the actions of agents (including an element of random behaviour) and for allowing interactions between agents. There are a number of good illustrations in a geographical setting (Batty and Densham, 1996; Heppenstall et al., 2006). Clearly there is a research agenda to link these two complementary approaches more effectively. Microsimulation could be used to give the agents in ABM their initial characteristics and locations whilst ABM could then provide the capacity to model individual adaptive behaviours and the emergence of new behaviours (see also the discussion of Boman and Holm, 2004).

In addition, data from household panel surveys such as the British Household Panel Survey (BHPS) may be utilized to formulate plausible assumptions regarding these behaviours. For instance, it is possible to use panel data from surveys such as the BHPS to model the life paths of particular individuals and households who have moved into and out of work. Such data can also be combined with information from more qualitative analyses to simulate the behaviour of workers made redundant following plant closures, how they fare in adapting to the changing labour market, and how long-term unemployment is increased for those unable to retrain (Ballas et al., 2006). The findings of qualitative studies such as these can provide useful insights when formulating the rules that determine the likely behaviour of households
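
The division of labour suggested in section 15.4.2, with microsimulation supplying the agents' initial characteristics and locations and an agent-based model supplying behavioural rules, randomness and interaction, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the `Agent` class, the retraining and job-finding rules and all probabilities are hypothetical and are not taken from any of the models cited in this chapter.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """One simulated individual, initialised from microsimulation output."""
    person_id: int
    zone: str            # small-area code assigned by the microsimulation
    employed: bool
    retrained: bool = False


def agents_from_microdata(records):
    """Turn synthetic microdata records (hypothetical layout) into agents."""
    return [Agent(r["id"], r["zone"], r["employed"]) for r in records]


def step(agents, rng, p_retrain=0.2, base_p_job=0.05, peer_weight=0.3):
    """Apply one round of illustrative behavioural rules.

    All probabilities are invented for this sketch.
    """
    # Interaction: the employment share in an agent's own zone, measured
    # at the start of the step, raises that agent's chance of finding work.
    totals, employed = {}, {}
    for a in agents:
        totals[a.zone] = totals.get(a.zone, 0) + 1
        employed[a.zone] = employed.get(a.zone, 0) + int(a.employed)

    for a in agents:
        if a.employed:
            continue
        # Rule 1 (random element): an unemployed agent may retrain.
        if not a.retrained and rng.random() < p_retrain:
            a.retrained = True
        # Rule 2: job finding depends on local peers and on retraining.
        p_job = base_p_job + peer_weight * employed[a.zone] / totals[a.zone]
        if a.retrained:
            p_job = min(1.0, 2 * p_job)
        if rng.random() < p_job:
            a.employed = True


def run(records, n_steps=12, seed=0):
    """Initialise agents from microdata and track unemployment over time."""
    rng = random.Random(seed)
    agents = agents_from_microdata(records)
    unemployed_by_step = []
    for _ in range(n_steps):
        step(agents, rng)
        unemployed_by_step.append(sum(not a.employed for a in agents))
    return agents, unemployed_by_step


# Invented records standing in for the output of a spatial microsimulation.
synthetic = [
    {"id": i, "zone": "A" if i % 2 else "B", "employed": i % 3 != 0}
    for i in range(30)
]
agents, unemployed_by_step = run(synthetic)
print(unemployed_by_step)
```

In a real application the rules in `step` would be the place where evidence from panel surveys or qualitative studies enters the model, while `agents_from_microdata` would read the small-area synthetic population rather than an invented list.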
Figure 15.2 Combining spatial microsimulation and Remote sensing (Ballas et al., 2000).
the new framework comes when the potential of microsimulation for business applications is considered. Given the potential to create lists of household attributes, it has long been recognized that microsimulation could be useful as a business tool. However, to date, little progress has been made in exploring this potential. In a sense, the database underpinning the microsimulation model offers the same kind of information currently held in many geodemographic or lifestyle data systems. Nevertheless, microsimulation offers much greater flexibility than many standard geodemographic systems. In most cases, the geodemographic systems provide only one label for each locality. This label is based on the group with the greatest percentage share in the locality. Unless this percentage match is close to 100% there are always ecological fallacy problems: i.e., the label does not capture all consumer types resident in a particular area. This has led a number of authors to suggest fuzzy geodemographics, which might involve giving numerous labels to each locality. For more discussion on this see Feng and Flowerdew (1998) and See and Openshaw (2001). However, microsimulation would potentially offer another route to finding customers or consumer groups of various types. From a main database of say 100 household variables it is possible to search for distributions made up of any of these variables. The possible number of combinations is very large indeed and the user could ask for very specific combinations of variables, adding great flexibility to the task of finding customers. Second, it would be possible to provide unique classifiers for different localities. At the moment the underprivileged group may be made up of key census variables clustered in many different ways to end up with this classification. A major research question is whether the underprivileged groups identified in Liverpool are the same as those identified in the East End of London. A more subtle look at the outputs of the
microsimulation could offer new insights into this issue.

Finally, the framework suggested here would add much to the potential of remotely sensed data. It would be possible to put estimations on the types of buildings in terms of housing types and characteristics of their inhabitants. Clearly, it is not possible to say categorically what types of families are in each building. However, it may be possible to give an estimation of the types of families within blocks, thus giving very detailed portraits of small areas of our cities.

15.4.4. Spatial microsimulation, spatial decision support systems and virtual decision-making environments

Another area where spatial microsimulation models can play an important role is in the ongoing debates on the potential of new technologies to promote local democracy and electronic decision-making. It can be argued that spatial microsimulation models can be used not only to provide information on the possible consequences and the local multiplier effects of major policy changes but also to inform the general public about these and to enhance, in this way, public participation in policy making procedures. An example of work moving in this direction is the Microsimulation Modelling and Predictive Policy Analysis System (Micro-MaPPAS) developed for the Leeds City Council by researchers at the Universities of Sheffield, Leeds and Manchester (Ballas et al., 2004, 2006). MicroMaPPAS is a planning support system based on the SimLeeds geographical microsimulation model mentioned above. The SimLeeds software (Ballas, 2001) was run from a command prompt and required the hard coding of parameters and data tables together with some knowledge of Java programming, not a desirable task for the average policy or decision maker. MicroMaPPAS provides a spatial decision making interface which is much more user-friendly and suitable for decision makers who can utilize the power of the spatial microsimulation methodology. The MicroMaPPAS software also provides some basic mapping functions such as panning and zooming and symbology editing. The mapping capability in the software is provided by the GeoTools (www.geotools.org) open source Java mapping library, which has been written by a group of researchers independent of the MicroMaPPAS project. GeoTools is a versatile Java library which conforms to the Open GIS Consortium standard specifications in relation to GIS open operability. The library can be adapted to work in any Java based GUI or web-based Applet. The mapping controls allow the user to select a microsimulated variable from a query and map the results at a wide range of different geographical scales (see Ballas et al. (2004) and Ballas et al. (2006) for more details).

It can also be argued that systems such as MicroMaPPAS can have an e-government dimension by allowing networking technologies including the Internet to be used by policy makers as well as the general public. In particular, these systems can be converted into web-based GIS to enhance public involvement and participation in environmental planning and decision making processes. Such systems are typically referred to in the literature as Public Participation GIS (PPGIS) and are based on the belief that by providing citizens with access to information and data in the form of maps and visualizations, they can make better informed decisions about the natural and built environment around them. It is possible to build on
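
The query-and-map workflow described above, in which a user selects households by a specific combination of microsimulated variables and then views the result at different geographical scales, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MicroMaPPAS itself is a Java system built on GeoTools, and the household records, field names (`oa`, `ward`, `tenure`, `income`, `children`) and helper functions below are invented for the example.

```python
from collections import Counter

# Hypothetical synthetic households; each record carries nested zone codes
# (a small output area "oa" inside a larger "ward").
households = [
    {"oa": "OA1", "ward": "W1", "tenure": "rent", "income": 9000, "children": 2},
    {"oa": "OA1", "ward": "W1", "tenure": "own", "income": 30000, "children": 0},
    {"oa": "OA2", "ward": "W1", "tenure": "rent", "income": 11000, "children": 1},
    {"oa": "OA2", "ward": "W1", "tenure": "rent", "income": 25000, "children": 1},
    {"oa": "OA3", "ward": "W2", "tenure": "rent", "income": 8000, "children": 0},
    {"oa": "OA3", "ward": "W2", "tenure": "rent", "income": 10000, "children": 3},
    {"oa": "OA4", "ward": "W2", "tenure": "own", "income": 15000, "children": 2},
]


def query(records, **predicates):
    """Keep records for which every named field passes its predicate."""
    return [
        r for r in records
        if all(test(r[field]) for field, test in predicates.items())
    ]


def count_by(records, level):
    """Aggregate selected households to the chosen geographical level."""
    return dict(Counter(r[level] for r in records))


# A specific combination of variables: low-income renting families.
selected = query(
    households,
    tenure=lambda t: t == "rent",
    income=lambda x: x < 12000,
    children=lambda n: n >= 1,
)

by_oa = count_by(selected, "oa")      # finest scale
by_ward = count_by(selected, "ward")  # coarser scale
print(by_oa)    # {'OA1': 1, 'OA2': 1, 'OA3': 1}
print(by_ward)  # {'W1': 2, 'W2': 1}
```

Because the counts are derived from individual records rather than from a single label per locality, the same selection can be re-aggregated to any zoning system for which each household carries a code; the resulting dictionaries are what a mapping layer would then symbolize.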
on the specific microsimulation methodology that is employed. In addition, spatial microsimulation outputs generally depend on subjective judgements associated with the ordering of the conditional probability tables that are used as inputs and/or with the selection of the data sets that are used as small area constraints. As Birkin and Clarke (1995) point out, the modeller's art in microsimulation is to generate population characteristics in an appropriate order so that potential errors are minimized. These aspects should always be taken into account when using spatial microsimulation models for policy impact assessment.

However, there is the related problem of how to validate microsimulation outputs, since there are no available micro-data sets at the desired level of geographical scale (hence the need for microsimulation in the first place!). Model output validation is one of the biggest problems of microsimulation methodologies. As Williamson (1999) points out, in the United States the National Academy was commissioned to evaluate the effectiveness of microsimulation for tax-benefit analysis purposes. The National Academy found that there is a general lack of thorough validation for microsimulation models and proposed a number of validation measures such as external validity studies in which model results are compared with data from program administrative sources (Williamson, 1999). Moreover, sensitivity analysis and computer-intensive sample reuse methods to measure the variance in model estimates were proposed. Thus, further research is required in order to improve the performance of spatial microsimulation models and to highlight the sources of error. For instance, as Williamson et al. (1998) point out, there are many ways in which combinatorial optimization methodologies can be fine-tuned, through the evaluation of the use of more or different SAS tables or by changing the model parameters (also see Voas and Williamson (2000) for a more detailed discussion and an in-depth evaluation of combinatorial optimization techniques). Further, there is a need to build on existing work on the validity and reliability of microsimulation models (such as the work of Pudney and Sutherland (1994), who investigated the role of sampling error in a tax-benefit model, and the work of Voas and Williamson (2001), who present new goodness-of-fit measures for synthetic microdata).

15.5. CONCLUSIONS

We hope that we have demonstrated that spatial microsimulation is a useful technique for estimating the characteristics of individuals or households which can then be used in a variety of what-if situations regarding policy change. The key advantage of this methodology is data fusion or linkage: a variety of data sets can be combined to provide new insights into household characteristics and, ultimately, household behaviour. Thus these models can help to solve the problem of missing data such as, in the UK, household income, wealth, tax payment, water demand, health problems, crime, etc. Once built, these models can also be linked to meso or macro models (such as discrete choice models, spatial interaction models, logit models, input-output models, etc.) to show how households interact with the supply side of the economy (where they go to work, shop, visit the doctor, etc.). The ability to change these circumstances and assess the impacts of such actions is another major advantage of this methodology. Simulations can be run which change either the characteristics of the households (population ageing, new job allowing greater income to be earned, change of residence, etc.) or the characteristics of the supply
side (new retail centre, closure of a major employer, new hospital, etc.). This ability to examine both household dynamics and the impacts of infrastructure change allows the analyst to explore both social policy impacts (tax or welfare changes for example) and/or area-based policy impacts (new job creation, new retail centre, etc.).

The research agenda outlined in the second half of the chapter is clearly our personal one but one that we hope other microsimulation modellers would at least partially agree with. The agenda has not been presented in any particular order of importance but the issue of how such models can support traditional spatial modelling seems a key task to address in the short term. As we noted above, a start has been made in this direction but perhaps the greatest challenge is merging microsimulation with more macro techniques such as input-output models. The latter models are excellent for modelling the interactions between key sectors of the economy but not so good at spatially disaggregating the outputs within cities and regions. A methodology which could feed individual households into the economic system at both stages of the modelling process (inputs and outputs) could be a major advantage in future policy work. We hope we can address this issue in the next few years.

REFERENCES

Ballas, D. (2001). A spatial microsimulation approach to local labour market policy analysis, unpublished PhD thesis, School of Geography, University of Leeds.

Ballas, D. and Clarke, G.P. (2000). GIS and microsimulation for local labour market policy analysis. Computers, Environment and Urban Systems, 24: 305-330.

Ballas, D. and Clarke, G.P. (2001a). Modelling the local impacts of national social policies: a spatial microsimulation approach. Environment and Planning C: Government and Policy, 19: 587-606.

Ballas, D. and Clarke, G.P. (2001b). Towards local implications of major job transformations in the city: a spatial microsimulation approach. Geographical Analysis, 33: 291-311.

Ballas, D., Clarke, G.P. and Dewhurst, J. (2006). Modelling the socio-economic impacts of major job loss or gain at the local level: a spatial microsimulation framework. Spatial Economic Analysis, 1(1): 127-146.

Ballas, D., Clarke, G.P., Dorling, D., Eyre, H., Rossiter, D. and Thomas, B. (2003). SimYork: Simulating Current and Future Trends in the Life of Households in York, report to the Joseph Rowntree Foundation, May 2003.

Ballas, D., Clarke, G.P., Dorling, D., Eyre, H., Rossiter, D. and Thomas, B. (2005). SimBritain: A spatial microsimulation approach to population dynamics, Population, Space and Place, 11: 13-34.

Ballas, D., Clarke, G.P., Feldman, O., Gibson, P., Jianhui, J., Simmonds, D. and Stillwell, J. (2005b). A Spatial Microsimulation Approach to Land-use Modelling, CUPUM 2005 (Computers in Urban Planning and Urban Management) Conference Proceedings, UCL, London 29 June-1 July 2005 (available on-line from: http://128.40.59.163/cupum/searchPapers/papers/paper276.pdf).

Ballas, D., Clarke, G.P., Dorling, D. and Rossiter, D. (2007). Using SimBritain to Model the Geographical Impact of National Government Policies, Geographical Analysis, 39(1): 44-77.

Ballas, D., Kingston, R. and Stillwell, J. (2004). Using a spatial microsimulation decision support system for policy scenario analysis. In: van Leeuwen, J. and Timmermans, H. (eds), Recent Advances in Design and Decision Support Systems in Architecture and Urban Planning, pp. 177-192. Dordrecht: Kluwer.

Ballas, D., Kingston, R., Stillwell, J. and Jin, J. (2007). Building a spatial microsimulation-based planning support system for local policy making. Environment and Planning A, 39(10): 2482-2499.

Ballas, D., Rossiter, D., Thomas, B., Clarke, G.P. and Dorling, D. (2005). Geography Matters: Simulating the Local Impacts of National Social Policies, Joseph Rowntree Foundation contemporary research issues, Joseph Rowntree Foundation, York.

Batey, P.W.J. (2003). Extended input-output modelling of regional impacts: does detail make a difference?
paper presented at the Royal Geographical Society Annual Conference 2003 (Special session on 50 years of Regional Science or the Return of Quantitative Economic Geography), London, 3-5 September 2003.

Batty, M. and Densham, P. (1996). Decision support, GIS and urban planning. Systemma Terra, V(1): 72-76.

Birkin, M. and Clarke, M. (1988). SYNTHESIS - a synthetic spatial information system for urban and regional analysis: methods and examples. Environment and Planning A, 20: 1645-1671.

Birkin, M. and Clarke, G.P. (1995). Using microsimulation methods to synthesize census data. In: Openshaw, S. (ed.), Census Users' Handbook, pp. 363-387. London: GeoInformation International.

Birkin, M. and Clarke, M. (1989). The generation of individual and household incomes at the small area level using Synthesis, Regional Studies, 23: 535-548.

Birkin, M., Clarke, G.P. and Clarke, M. (1996). Urban and regional modelling at the microscale. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 10-27. London: Pion.

Boman, M. and Holm, E. (2004). Multi-agent systems, time geography and microsimulations. In: M.O. Olsson and G. Sjostedt (eds), Systems, Approaches and their Application, pp. 95-118. Dordrecht: Kluwer Academic.

Caldwell, S.B. and Keister, L.A. (1996). Wealth in America: family stock ownership and accumulation, 1960-1995. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 88-116. London: Pion.

Caldwell, S.B., Clarke, G.P. and Keister, L.A. (1998). Modelling regional changes in US household income and wealth: a research agenda. Environment and Planning C: Government and Policy, 16: 707-722.

Clarke, G.P. (1996). Microsimulation: an introduction. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 1-9. London: Pion.

Clarke, M. and Spowage, M. (1984). Integrated models for public policy analysis: an example of the practical use of simulation models in health care planning. Papers of the Regional Science Association, 55: 25-46.

Clarke, G. and Stillwell, J.C.H. (eds) (2004). Applied GIS and Spatial Modelling, London, Wiley.

Davidsson, P. (2000). Multi agent based simulation: beyond social simulation. In: S. Moss and P. Davidsson (eds), Multi Agent Based Simulations, pp. 97-100. Berlin: Springer.

Davies, H. and Joshi, H. (1992). Constructing Pensions for Model Couples, in R. Hancock and H. Sutherland (eds), Microsimulation Models for Public Policy Analysis: New Frontiers, Suntory-Toyota International Centre for Economics and Related Disciplines LSE, London, 67-96.

Evans, A., Kingston, R., Carver, S. and Turton, I. (1999). Web-based GIS to enhance public democratic involvement, paper presented at the 4th International Conference on GeoComputation, Fredericksburg, Virginia, USA, 25-28 July.

Evandrou, M. and Falkingham, J. (1995). Gender, Lone-parenthood and Lifetime Incomes, in J. Falkingham and J. Hills (eds), The dynamic of welfare: the welfare state and the life cycle, Prentice Hall/Harvester Wheatsheaf, New York, pp. 167-183.

Falkingham, J., Harding, A. and Lessof, C. (1995). Simulating lifetime income distribution and redistribution. In: J. Falkingham and J. Hills (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 62-82. New York: Prentice Hall/Harvester Wheatsheaf.

Falkingham, J. and Lessof, C. (1992). Playing God or LIFEMOD - The construction of a dynamic microsimulation model. In: R. Hancock and H. Sutherland (eds), Microsimulation Models for Public Policy Analysis: New Frontiers, pp. 5-32. London: Suntory-Toyota International Centre for Economics and Related Disciplines LSE.

Falkingham, J. and Hills, J. (1995a). The effects of the welfare state over the life cycle. In: J. Falkingham and J. Hills (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 83-107. New York: Prentice Hall/Harvester Wheatsheaf.

Falkingham, J. and Hills, J. (1995b). Redistribution between people or across the life cycle? In: J. Falkingham and J. Hills (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 137-149. New York: Prentice Hall/Harvester Wheatsheaf.

Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. Sage Publications.

Glennerster, H., Falkingham, J. and Barr, N. (1995). Education funding, equity and the life cycle.
In: J. Falkingham and J. Hills (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 150-166. New York: Prentice Hall/Harvester Wheatsheaf.

Hägerstrand, T. (1967). Innovation diffusion as a spatial process, University of Chicago Press, Chicago.

Hancock, R. (2000). Changing for care in later life: an exercise in dynamic microsimulation. In: L. Mitton, H. Sutherland and M. Weeks (eds), Microsimulation Modelling for Policy Analysis: Challenges and Innovations, pp. 226-237. Cambridge: Cambridge University Press.

Hancock, R. and Sutherland, H. (eds) (1992). Microsimulation Models for Public Policy Analysis: New Frontiers. London: Suntory-Toyota International Centre for Economics and Related Disciplines LSE.

Hancock, R., Mallender, J. and Pudney, S. (1992). Constructing a computer model for simulating the future distribution of pensioners' incomes for Great Britain. In: R. Hancock and H. Sutherland (eds), Microsimulation Models for Public Policy Analysis: New Frontiers, pp. 33-66. London: Suntory-Toyota International Centre for Economics and Related Disciplines LSE.

Harding, A. (ed.) (1996). Microsimulation and Public Policy. Amsterdam: North Holland, Contributions to Economic Analysis 232.

Heppenstall, A.J., Evans, A.J. and Birkin, M.H. (2007). Genetic Algorithm Optimisation of a Multi-Agent System for Simulating a Retail Market. Environment and Planning B, 34: 1051-1070.

Holm, E., Lindgren, U., Makila, K. and Malmberg, G. (1996). Simulating an entire nation. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Policy Analysis, pp. 164-186. London: Pion.

Hooimeijer, P. (1996). A life-course approach to urban dynamics: state of the art in and research design for the Netherlands. In: Clarke, G.P. (ed.), Microsimulation for Urban and Regional Analysis, pp. 28-63. London: Pion.

Huang, Z. and Williamson, P. (2001). A comparison of synthetic reconstruction and combinatorial optimisation approaches to the creation of small-area microdata, Working Paper 2001/2, Department of Geography, University of Liverpool.

Kingston, R., Carver, S., Evans, A. and Turton, I. (2000). Web-based public participation geographical information systems: an aid to local environmental decision-making, Computers, Environment and Urban Systems, 24: 109.

Longley, P.A. and Batty, M. (eds) (2003). Advanced spatial analysis: The CASA book of GIS, Redlands, CA: ESRI Press.

Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds) (1999). Geographical Information Systems: Management Issues and Applications. New York: Wiley.

Martin, D. (1996). Geographic Information Systems: Socioeconomic Applications. London: Routledge.

Mertz, J. (1991). Microsimulation - A survey of principles developments and applications, International Journal of Forecasting, 7: 77-104.

Mitton, L., Sutherland, H. and Weeks, M. (eds) (2000). Microsimulation Modelling for Policy Analysis: Challenges and Innovations. Cambridge: Cambridge University Press.

Nakaya, T., Yano, K., Fotheringham, A.S., Ballas, D. and Clarke, G.P. (2003). Retail interaction modelling using meso and micro approaches, Paper presented at the 33rd Regional Science Association, RSAI British and Irish Section Conference, St. Andrews, Scotland, 20-22 August.

Nelissen, J.H.M. (1993). Labour market, income formation and social security in the microsimulation model NEDYMAS, Economic Modelling, 10: 225-272.

Openshaw, S. (1995). Human systems modelling as a new grand challenge area in science. Environment and Planning A, 27: 159-164.

Orcutt, G.H. (1957). A new type of socio-economic system. The Review of Economics and Statistics, 39: 116-123.

Orcutt, G.H., Mertz, J. and Quinke, H. (eds) (1986). Microanalytic Simulation Models to Support Social and Financial Policy. Amsterdam: North-Holland.

Orcutt, G.H., Greenberger, M., Korbel, J. and Rivlin, A. (1961). Microanalysis of Socioeconomic Systems: A Simulation Study, Harper and Row, New York.

Paas, G. (1986). Statistical match: Evaluation of existing procedures and improvements by using additional information, in G.H. Orcutt, J. Mertz and H. Quinke (eds), Microanalytic Simulation Models to Support Social and Financial Policy, North-Holland, Amsterdam, 401-421.

Propper, C. (1995). For richer, for poorer, in sickness and in health: The lifetime distribution of NHS
health care. In: J. Falkingham and J. Hills (eds), The Dynamic of Welfare: the Welfare State and the Life Cycle, pp. 184-203. New York: Prentice Hall/Harvester Wheatsheaf.

Pudney, S. and Sutherland, H. (1994). How reliable are microsimulation results? An analysis of the role of sampling error in a UK tax-benefit model, Journal of Public Economics, 53: 327-365.

Redmond, G., Sutherland, H. and Wilson, M. (1998). The Arithmetic of Tax and Social Security Reform: a User's Guide to Microsimulation Methods and Analysis, Cambridge: Cambridge University Press.

See, L. and Openshaw, S. (2001). Fuzzy geodemographic targeting. In: G.P. Clarke and M. Madden (eds), Regional Science in Business, 269-282. Berlin: Springer-Verlag.

Smith, D., Clarke, G.P., Ransley, J. and Cade, J. (2006). Food access and health: a microsimulation framework for analysis. Studies in Regional Science, 35(4), 909-927.

Sutherland, H. and Piachaud, D. (2001). Reducing child poverty in Britain: an assessment of government policy 1997-2001, The Economic Journal, 111: 85-101.

Sutherland, H., Sefton, T. and Piachaud, D. (2003). Poverty in Britain: the Impact of Government Policy since 1997, Joseph Rowntree Foundation, York (also available on-line from: http://www.jrf.org.uk) (ISBN 1 85935 152 2).

Sutherland, H., Taylor, R. and Gomulka, J. (2002). Combining household income and expenditure data in policy simulations, Review of Income and Wealth, 48(4): 75-94.

Taylor, M.F., Brice J., Buck, N., Prentice-Lane, E. (2001). British Household Panel Survey User Manual Volume A: Introduction, Technical Report and Appendices. Colchester: University of Essex.

Veldhuisen, K.J., Kapoen, L.L. and Timmermans, H.J.P. (2000). RAMBLAS: A regional planning model based on the micro-simulation of daily activity patterns, Environment and Planning A, 31: 427-443.

Vencatasawmy, C.P., Holm, E. and Rephann, T. et al. (1999). Building a spatial microsimulation model, paper presented at the 11th Theoretical and Quantitative Geography European colloquium, Durham Castle, Durham, 3-7 September, 1999.

Voas, D. and Williamson, P. (2000). An evaluation of the combinatorial optimisation approach to the creation of synthetic microdata. International Journal of Population Geography, 6: 349-366.

Voas, D. and Williamson, P. (2001). Evaluating goodness-of-fit measures for synthetic microdata, Geographical and Environmental Modelling, 5: 177-200.

Wegener, M. and Spiekermann, K. (1996). The potential of microsimulation for urban models, in G.P. Clarke (ed.) Microsimulation for Urban and Regional Policy Analysis, Pion, London, 149-163.

Wertheimer II, R., Zedlewski, S.R., Anderson, J., Moore, K. (1986). DYNASIM in comparison with other microsimulation models, in G.H. Orcutt, J. Mertz and H. Quinke (eds), Microanalytic Simulation Models to Support Social and Financial Policy, North-Holland, Amsterdam, 187-206.

Williamson, P. (1992). Community care policies for the elderly: a microsimulation approach. Unpublished PhD Thesis, School of Geography, University of Leeds, Leeds.

Williamson, P., Birkin, M. and Rees, P. (1998). The estimation of population microdata by using data from small area statistics and samples of anonymised records. Environment and Planning A, 30: 785-816.

Williamson, P. and Voas, D. (2000). Income estimates for small areas: lessons from the census rehearsal, BURISA, 146: 2-10.

Williamson, P. (1999). Microsimulation: An idea whose time has come? paper presented at the 39th European Regional Science Association Congress, University College Dublin, Dublin, Ireland, 23-27 August.

Wilson, A. and Pownall, C.E. (1976). A new representation of the urban system for modelling and for the study of micro-level interdependence, Area, 8: 246-254.

Wilson, A.G. (2000). Complex Spatial Systems: the Modelling Foundations of Urban and Regional Analysis. London: Prentice Hall.
16
Detection of Clustering in
Spatial Data
Lance A. Waller
[Figure 16.1 (diagram of Steps 1 to 4); surviving box labels: "1. Question we want to answer." and "4. Question we can answer with data and methods we have."]
interest driving different families of analytic approaches.

To set the stage conceptually, Figure 16.1 provides a starting point for developing and evaluating analytic methods for detecting clusters and clustering. We begin at Step 1 with a question we wish to answer (for example, "Are disease risks elevated for individuals living near a source of pollution?"). The question of interest defines the sorts of data and methods we require to answer the question (for example, individual-level case status and individual exposure histories). However, the data required often are unavailable for reasons varying from cost to privacy and we often settle for related data we can obtain within budget and satisfying availability constraints (for example, present residential location of cases and proximity to known sources of pollution). Similarly, available methods may only address part of the question or may be particularly susceptible to data shortcomings (for example, missing data or location inaccuracy). This situation is particularly relevant in the analysis of spatial data with the increasing sophistication and data holdings of geographic information systems (GISs). One is increasingly faced with the ease of including "found" data collected by others that seems to fit the bill for the data one would really like to have. After obtaining the data we can retrieve, we conduct analysis on these available data, often without explicitly acknowledging that our analyses may be addressing slightly different questions (e.g., in our conceptual example, we have moved from a question involving associations between disease and a particular exposure, to associations between disease and present proximity to a known or suspected exposure source). As a final step, we should carefully examine how closely the questions we do answer mirror those we originally intended to answer. All too often, this last step is ignored.

While we can consider the steps shown in Figure 16.1 as a linear set of steps (1, 2, 3, 4), it is often a loop where the answers obtained on the available observational data in Step 4 inform refinements to the questions
asked in Step 1 and suggest limitations arising due to the data compromises between
Steps 2 and 3.
16.2. WHAT ARE WE LOOKING FOR?
It is appropriate to begin by considering the very basic question: What exactly do we hope to
find? Besag and Newell (1991) provide several important observations relevant to the search for
clusters. The first key distinction is between detection of clusters and the detection of
clustering. A cluster represents an unusual collection of events while clustering represents a
general tendency for events to occur nearer other events than one might expect.
These definitions of 'cluster' and 'clustering' differ from those found in cluster analysis, a set
of analytical classification methods designed to group observations into clusters wherein
observations within the same cluster are more alike than those from different clusters. The
overlap in terminology can be confusing when reviewing the literature, especially since some
spatial methods to detect clusters and/or clustering utilize concepts and methods from cluster
analysis (Knorr-Held and Raßer, 2000; Denison and Holmes, 2001). As illustrated in Figure 16.1,
it remains critical to clearly identify goals and conclusions in the context of both the questions
addressed and the methods used to address them.
In the discussion below, we follow Besag and Newell (1991) and take the term 'cluster' to define
an anomaly, an interesting collection of spatial locations that appears to be inconsistent with
some background conceptual model of how events arise. For instance, a cancer registry might report
six new cases of childhood leukemia in a small neighborhood in a particular year, when only one
new case would be expected if the national annual incidence rate applied directly to all
individuals in the study area. That is, the aggregation of six cases appears to be unlikely under
a simple model of all children experiencing equal risk. Contrast this example with that of
clustering where we observe multiple pockets of higher incidence than expected from national
rates, perhaps separated by areas of lower-than-expected local rates.
Besag and Newell (1991) also note the difference between seeking clusters or clustering anywhere
versus around particular locations of interest. They denote the former as general methods and the
latter as focused methods, also referred to as global and local methods, respectively, in the
geography literature by Anselin (1995) and in the disease clustering literature by Kulldorff
et al. (2003).
As suggested by Figure 16.1, seeking general or focused clusters or clustering defines different
questions of interest and, as a result, methods appropriate for seeking individual clusters might
not be the best approach to measure clustering and vice versa. We will explore this in more depth
in the examples below.
The general ideas of clusters and clustering arise in many different disciplines. However, each
discipline often brings its own particular sets of questions of interest, assumptions regarding
data availability, and familiar statistical methods. For example, the fields of epidemiology and
criminology both exhibit interest in the detection of clustering within geographically referenced
data sets. However, the sets of techniques appearing in their respective literatures are largely
distinct and cross-references between the fields are rare. This situation is unfortunate since
both fields could draw from the experiences and ideas of the other. Figure 16.1 provides a general
context for comparison and we express and compare ideas from recent surveys in both fields in the
sections below.
302 THE SAGE HANDBOOK OF SPATIAL ANALYSIS
operationalizing the null model mentioned above. As a result, most tools aim to define some
measure of the unusualness of a cluster, then determine the distribution of this quantity under
the null (uninteresting) model, and compare the quantity based on the observed data to this null
distribution (Waller and Jaquez, 1995). In a statistical hypothesis setting, the null hypothesis
is defined conceptually as the absence of clusters/clustering, and operationally as the expected
distribution of our measure (statistic) under the null model.
As a result, the analytic tools required for statistical inference are a definition of our
statistic and its null distribution. In the sections below, we will illustrate several types of
statistics and contrast the underlying questions addressed by each.
Before defining particular methods, we offer a brief review of some basic probabilistic elements
for point-referenced event locations driving many of the methods illustrated below. The first is
the definition of complete spatial randomness (CSR). A set of events arising from CSR has the
following properties: first, the total number of events observed in the study area follows a
Poisson distribution; second, given the observed number of events, event locations occur
independently of one another and the expected number of events per unit area is a constant,
denoted $\lambda$, across the entire study area. CSR corresponds to a spatial Poisson point
process yielding the following features: the number of events observed in a region $A$ within the
study area follows a Poisson distribution with mean $\lambda|A|$ where $|A|$ denotes the area of
$A$, the number of observed events in non-overlapping areas are independent of one another, and,
given the observed number of events, events are uniformly distributed across the study area (and
any region within it). For clarity we follow Diggle (2003) and distinguish between an event
location where an observed event did occur, and a point location where an event could occur.
A typical data set consists of a set of event locations and we often compare the value of our
statistic based on events to the distribution of values associated with randomly selected events.
While CSR represents a complete lack of clustering, data generated by CSR nonetheless visually
exhibits some clumping and gapping due to the inherent randomness, and one purpose of a
statistical test is to determine whether the observed patterns in our data are more extreme than
the amount of clumping and gapping expected under CSR. Figure 16.2 illustrates three realizations
of CSR with 100 events uniformly distributed across a square. It is worth noting that the
[Figure 16.2: three panels showing realizations of CSR; axes x and y, each from 0.0 to 1.0.]
uniform distribution of event locations represents a uniform probability of occurrence, not an
evenly spaced set of events.
CSR represents a convenient null model and many tests of CSR exist in the literature (Cressie,
1993, p. 604), but CSR may not be the appropriate reference pattern for applications where the
population at risk is spatially heterogeneous. A common adjustment is the use of a heterogeneous
Poisson process where the number of events expected per unit area is allowed to vary with
location. If we define $\lambda(x)$ as the expected number of events per unit area at location
(point) $x$, we refer to $\lambda(x)$ as the intensity function of the process. We adjust the
Poisson process properties as follows: first, the number of events observed in any region still
follows a Poisson distribution but now with the mean defined as the integral of $\lambda(x)$ over
that region, counts from non-overlapping regions remain statistically independent, and events are
distributed according to a (spatial) probability density function proportional to the intensity
function. That is, more events are expected in locations where the intensity function is high,
and fewer events are expected in locations where the intensity function is low.
The heterogeneous Poisson process offers a flexible model of the spatial distribution of
point-locations of events, and its properties regarding counts for non-overlapping regions define
the distributional basis for several commonly-used models for regional count data. However, the
assumed independence of counts raises some eyebrows, especially among geographers for whom
spatial autocorrelation is often a fundamental assumption in any spatial analysis (Tobler's First
Law of Geography; Tobler, 1970). The distinction between a process defined by independent events
with spatially patterned means versus a process defined by spatially correlated counts with
identical means represents a lurking issue in the analysis of spatial pattern in general, and
specifically in the detection of clusters. Bartlett (1964) showed that, without additional
information, a pattern of independent events arising from a process with spatially varying
intensity is mathematically indistinguishable from a process of spatially dependent events
arising from a process with spatially constant intensity, let alone from patterns based on
spatial variations in both correlation and intensity. The additional information allowing one to
separate the intensity and correlation effects could be based on temporal ordering of events to
see if the location of past events influences future events (for example, with infectious
diseases or diffusion of new technologies), or replicated observations of the same process over
time to see if a suspected cluster remains in the same location (for example, near a putative
source of increased risk) or if one observes similar patterning but in different locations for
each time period.
When contrasting methods based on independent or dependent events, it is important to recognize
that both approaches represent an idealization of reality: neither approach is 'right', both are
useful, but each answers our questions of interest in slightly different ways.
The basic probability models described above also provide a recipe for simulating sets of events
following a given null model, thereby providing a powerful tool for Monte Carlo-based statistical
inference. Recall that in frequency-based statistical hypothesis testing, one often considers the
p-value, the probability under the null hypothesis of observing a more extreme value of the test
statistic than one observes in the data set. Monte Carlo hypothesis testing (Barnard, 1963;
Waller and Gotway, 2004, Chapter 5) uses simulation to estimate this probability by generating
multiple data sets under the
null model, calculating the test statistic for each, constructing a histogram of these values as
an approximation to the null distribution of the test statistic, and calculating the proportion
of test statistic values associated with null simulations exceeding the value of the test
statistic associated with the observed data. Note that the accuracy of the estimated p-value is a
function of the number of simulations, not the sample size of the observed data, thereby putting
the level of accuracy into the analyst's hands. This is not to say that sample size is
unimportant. Sample size impacts the variation of the statistic under the null and alternative
hypotheses, while the number of simulations controls the accuracy of the simulation-based tail
probability (p-value) estimates. In some cases, theoretical derivations of proposed test
statistics exist, but often these are based on particular distributional assumptions (for
example, Gaussian or normally-distributed observations) and it is not always immediately clear
whether the results apply in settings having different structures. In contrast, as long as one
can simulate data under a reasonable null model, the Monte Carlo approach yields accurate
inference.
Two general null models are worth mentioning in our discussion of Monte Carlo techniques for the
detection of clusters/clustering. The first, mentioned above, is that of constant risk, that is,
an assumed constant probability of the event outcome for each individual under study. If one has
either point locations or regional counts reflecting a census, one can estimate the overall
global risk of the event through the overall observed proportion of individuals experiencing the
event. Then, one may randomly assign the observed number of events to the population at risk to
obtain each simulated data set. The constant risk null model can also adjust for local covariate
effects by using the covariates to define the local probability of an event. Random labelling
provides a second null model, similar to the first, but designed when one has a sample of event
locations and a sample of non-event or control locations (individuals sampled from the population
at risk of events) (Diggle, 2003; Waller and Gotway, 2004, Chapter 6) wherein we condition on the
observed locations and randomly assign the event status (label) among the full set of locations.
That is, if we observe 30 events and have locations for 70 individuals not experiencing an event
(controls), we keep the set of 100 locations, and randomly assign 30 of these to be events in
each simulated data set. Note that random labelling assumes a constant probability of event
assignment, based on the observed ratio of events to non-events. At first glance, this seems
identical to the constant risk assumption but two subtle differences remain. First, the random
labelling hypothesis is conditional on the set of locations (both event and non-event) so random
labelling simulations will not place events in any other locations. Second, constant risk
simulations could be based on an event probability estimated from the observed data or could be
based on an externally reported probability (for example, national disease or crime rates). If
the study takes place in an area different from that providing the basis for the external
probability, it is possible that the local probability is sufficiently higher or lower than the
external probability so the observed data will seem inconsistent with simulated values based on
the external value for no other reason than the discrepancy in the background probability and not
due to spatial clusters or clustering within the data set.
Again referring to Figure 16.1, each of these steps represents a decision that may, subtly or
not, impact the question addressed in the analysis. In the development, implementation, and
review of specific
spatial analyses, it is important to design, report, and understand the type of null model
driving the simulations in order to place results within the proper context and to connect
Steps 4 and 1 in Figure 16.1.
Finally, it is worth noting that there are many more advanced computational and mathematical
methods of statistical analysis of point patterns under current development. Such models allow
one to define parametric models of clustering of event locations (Lawson and Denison, 2002),
assign random measurements (often referred to as marks) to event locations, or allow interactions
between multiple point processes observed over the same spatial study area (see Møller and
Waagepetersen (2002) for detailed technical development). Many of these make use of
computationally intensive Markov chain Monte Carlo (MCMC) methods for likelihood or Bayesian
inference for parametric models of point processes. However, the non-parametric Monte Carlo
approaches presented below represent exploratory techniques for detecting the presence of
clusters and/or clustering without explicitly modeling the type of clustering. The approaches
illustrated here offer robust statistical inference and a good starting place for analysis.
field research in the area between 1967 and 1987 leading to a body of research summarized in
texts by Gumerman (1970), Gumerman et al. (1972), Plog (1986), and Powell and Smiley (2002). The
study is relatively unique in its careful survey of a large tract of land and detailed mapping of
the location of every site discovered on the surface. For our illustrative purposes, we make the
simplifying assumption of a constant probability of detection of surface sites regardless of age
or location. Figure 16.3 represents data locations abstracted from maps presented in Plog (1996).
The 100 open circles represent sites dated to the time period 950–1049 CE and the 390 filled
circles represent sites dated to the time period 1050–1150 CE. The later period represents a time
of great expansion of the Anasazi culture (as represented by the increased number of settlement
sites), but ends coincident with a time of large-scale abandonment of sites by the Anasazi
throughout the southwestern United States c. 1100–1150 CE.
To illustrate the methods described below, we will compare spatial patterns between the early and
late sites represented in the data set, seeking both clusters and clustering within the data
sets.
[Figure 16.3 plot: 'Anasazi sites'; axes u and v, each from 0 to 150; legend: Early sites, Late sites.]
Figure 16.3 The Anasazi data set from the Peabody Coal Eastern Lease on Black Mesa in
northeastern Arizona. Empty circles represent early sites (dated 950–1049 CE) and filled
circles represent locations of later sites (dated 1050–1150 CE).
[Figure 16.4 plot: histogram panels with vertical axes labelled Frequency.]
Figure 16.4 Histograms and associated p-values of the cumulative number of late events
among the nearest neighbors of early events based on 999 random labelling simulations for
the Anasazi data set.
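The random labelling comparison summarized in Figure 16.4 can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the synthetic coordinates, and the upper-tail p-value convention (counting the observed value as one of n_sim + 1 values) are all choices made here for illustration.

```python
import numpy as np

def nearest_neighbours(coords, q):
    """Indices of the q nearest neighbours of every location (self excluded)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)               # a site is not its own neighbour
    return np.argsort(dist, axis=1)[:, :q]

def late_among_early_neighbours(nn, is_early):
    """Total count of late sites among the q nearest neighbours of the early sites."""
    return int((~is_early)[nn][is_early].sum())

def random_labelling_test(coords, is_early, q, n_sim=999, seed=0):
    """Monte Carlo test: locations stay fixed, early/late labels are permuted."""
    rng = np.random.default_rng(seed)
    nn = nearest_neighbours(coords, q)
    observed = late_among_early_neighbours(nn, is_early)
    sims = np.array([late_among_early_neighbours(nn, rng.permutation(is_early))
                     for _ in range(n_sim)])
    # Upper-tail Monte Carlo p-value (an assumed convention).
    p_value = (1 + np.sum(sims >= observed)) / (n_sim + 1)
    return observed, sims, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    coords = rng.uniform(0, 150, size=(60, 2))   # synthetic locations, not the Anasazi data
    is_early = np.arange(60) < 15
    obs, sims, p = random_labelling_test(coords, is_early, q=3)
    print(obs, p)
```

Because the locations never change under random labelling, the neighbour indices are computed once and only the labels are shuffled in each simulation. A histogram of `sims` with `obs` marked on it gives a display of the kind shown in Figure 16.4.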
be observed. An edge corrected (ec) version is provided by
$$\hat{K}_{ec}(d) = \hat{\lambda}^{-1}\,\frac{1}{N}\sum_{i=1}^{N}\sum_{\substack{j=1 \\ j\neq i}}^{N} w_{ij}^{-1}\,\delta\bigl(d(i,j) < d\bigr) \tag{16.2}$$
where the average is replaced by a weighted average with weight $w_{ij}$ defined as the proportion
of the circumference of the circle centered at event $i$ with radius $d(i, j)$ which lies within
the study area. With a constant intensity, $w_{ij}$ denotes the conditional probability of an
event occurring at distance $d(i, j)$ from event $i$ falling within the study area, given the
location of event $i$. Note that $w_{ij} = 1$ if the distance between events $i$ and $j$ is less
than the distance between event $i$ and the edge of the study area.
Under CSR, $K(d) = \pi d^2$ (the area of a circle with radius $d$) and patterns exhibit clustering
for $K(d) > \pi d^2$. To simplify the graphical expression of the K function, Besag (1977)
proposed a transformation:
$$\hat{L}(d) - d = \left(\frac{\hat{K}_{ec}(d)}{\pi}\right)^{1/2} - d$$
not 'Do the late sites appear consistent with CSR?' but rather 'Do the late sites exhibit more
clustering than the early sites?' We can use a random labelling Monte Carlo approach to address
this question by repeatedly sampling 390 sites from the set of early and late sites combined,
estimating the K function and exploring the variability of these estimates. Figure 16.5
illustrates the pointwise median, 2.5th and 97.5th percentiles of estimates of $\hat{L}(d) - d$,
based on 999 random labelling samples. We note that the estimate based on the data falls well
within the band of values likely under the random labelling hypothesis so that the observed set
of late sites does not differ from the patterns expected under random labelling in a
statistically significant way.
At this point, the pattern of the late sites does not appear to differ significantly from the
pattern of the early sites either in its observed nearest neighbor relationships or its
distance-based associations. However, both approaches applied so far explore clustering and we
next consider approaches to evaluate the possible existence of clusters within the late sites.
[Figure 16.5 plot: horizontal axis Distance (d), from 0 to 100.]
Figure 16.5 The estimate of the standardized K function (L(d)) for the late Anasazi sites
(solid line) compared to the median (dashed) and 95 percent tolerance bands based on
999 random labeling simulations.
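The tolerance bands in Figure 16.5 come from repeatedly drawing 390 sites from the combined set and re-estimating the K function. The sketch below illustrates that loop under simplifying assumptions: it omits the edge correction (every weight is treated as 1), assumes a known study-area size `area`, and uses hypothetical function names and synthetic coordinates, so its values would differ from an edge-corrected analysis near the boundary.

```python
import numpy as np

def k_hat(coords, area, d_grid):
    """K estimate without edge correction: (1 / (lambda_hat * N)) * #{ordered pairs with distance < d}."""
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pair_dist = dist[~np.eye(n, dtype=bool)]      # ordered pairs with i != j
    lam_hat = n / area                            # estimated constant intensity
    counts = np.array([(pair_dist < d).sum() for d in d_grid])
    return counts / (n * lam_hat)

def l_minus_d(coords, area, d_grid):
    """Besag's transformation, which equals 0 in expectation under CSR."""
    return np.sqrt(k_hat(coords, area, d_grid) / np.pi) - d_grid

def random_labelling_band(all_coords, n_late, area, d_grid, n_sim=999, seed=0):
    """Pointwise 2.5th, 50th and 97.5th percentiles of L(d) - d under random labelling."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, len(d_grid)))
    for s in range(n_sim):
        idx = rng.choice(len(all_coords), size=n_late, replace=False)
        sims[s] = l_minus_d(all_coords[idx], area, d_grid)
    return np.percentile(sims, [2.5, 50, 97.5], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    pooled = rng.uniform(0, 150, size=(120, 2))   # synthetic stand-in for early + late sites
    d_grid = np.linspace(5, 50, 10)
    band = random_labelling_band(pooled, n_late=90, area=150.0 ** 2, d_grid=d_grid, n_sim=99)
    print(band.shape)
```

Comparing `l_minus_d` for the observed late sites against the rows of `band` reproduces the logic of the figure: an observed curve that stays inside the band gives no pointwise evidence of a difference from random labelling.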
local likelihood ratio statistics represents an independent sample under the null hypothesis and
its histogram provides an estimate of the null distribution of the maximized local likelihood
ratio statistic. Note that this approach compares the maximized likelihood statistic regardless
of where it occurs rather than comparing the measure of unusualness at its observed location to
the measures of unusualness at that same location.
We can contrast these two approaches by considering the questions answered by each. By comparing
the observed measure of unusualness to the measure observed anywhere in the simulated data sets
we answer 'How unusual does our most likely cluster appear compared to how unusual the most
likely cluster appears under the null hypothesis?' If we compare the observed measure at a
particular location to the observed measure at that location in each of the simulated data sets,
we answer 'How unusual does our most likely cluster appear compared to any other cluster at this
location?' The first question represents a single question particular to the most likely cluster
but the second is particular to a location and radius. Openshaw et al.'s (1988) GAM and similar
methods essentially ask the second question for each location and radius which generates multiple
hypothesis tests and complicates inference, again illustrating the importance of Figure 16.1.
To illustrate the spatial scan statistic, Figure 16.6 shows the most likely cluster of late sites
in the Anasazi data by the thick, dark circle and the most likely cluster of early sites by the
thin, light circle. Neither is statistically significant. Even though the most likely cluster of
late sites includes only one early site (on the edge), the late sites outnumber the early sites
in the data set so this is not a particularly unusual grouping of events.
A few items merit mention. First, note that seeking the most likely cluster of late sites is a
different exercise than seeking the most likely cluster of early sites. In some applications it
is clear which events one wishes to find a cluster of (e.g., cases versus non-case controls in
epidemiology); in others it is not as obvious and both questions are of interest. Second, we must
interpret the results in light of the set of potential clusters considered. Here, we only
consider circular clusters and may miss more oblong or sinuous clusters, perhaps following
rivers. The most recent version of SaTScan incorporates elliptical potential clusters and recent
methodological work by Assunção et al. (2006) and Patil and Taillie (2004) further expand the set
of potential clusters at increasing computational cost. The impact of expanding the set of
potential clusters on the statistical power of detection for subsets of this class remains to be
studied in detail. For instance, it is not known to what extent including both circular and
elliptical clusters might reduce the power to detect only elements of the subset of circular
clusters.
16.7.2. Finding peaks and valleys: Estimating the spatial intensity
The spatial scan statistic is appealing, but is limited to the set of potential clusters. A more
general approach involves estimation of the intensity function associated with a set of observed
event locations. The conceptual framework of a spatial point process views the set of observed
locations as a realization of a random distribution in space. The next step involves estimating
the local probability of an event occurrence and defining clusters as areas where events appear
to be most likely.
Kernel estimation is a popular approach for estimating probability distributions and has seen
broad use in spatial analysis as well (Bailey and Gatrell, 1995;
[Figure 16.6 plot: 'Anasazi sites'; axes u and v, each from 0 to 150; circles annotated p = 0.583 and p = 0.530.]
Figure 16.6 SaTScan results for the Anasazi data set. Thick, dark circles and p-values
correspond to the most likely clusters of late sites, and thin, light circles and p-values
correspond to the most likely clusters of early sites.
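The logic behind Figure 16.6 (maximize a likelihood ratio over circular candidate zones, then compare that maximum with maxima from random labelling simulations) can be sketched as below. This is an illustration under assumptions that the text here does not spell out: it uses a Bernoulli log likelihood ratio for each zone, centers circles on the observed locations, and caps zones at half of all locations. It is not SaTScan's implementation, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * np.log(y)

def bernoulli_llr(c, n, C, N):
    """Log likelihood ratio for a zone holding c cases among n points (C cases among N overall)."""
    if n == 0 or n == N:
        return 0.0
    p_in, p_out = c / n, (C - c) / (N - n)
    if p_in <= p_out:                              # only zones with an elevated case proportion
        return 0.0
    alt = (xlogy(c, p_in) + xlogy(n - c, 1 - p_in)
           + xlogy(C - c, p_out) + xlogy(N - n - (C - c), 1 - p_out))
    null = xlogy(C, C / N) + xlogy(N - C, 1 - C / N)
    return alt - null

def max_llr(dist_order, is_case, max_frac=0.5):
    """Largest LLR over circles centred on each point and grown through its nearest neighbours."""
    N, C = len(is_case), int(is_case.sum())
    max_n = int(max_frac * N)
    best = 0.0
    for order in dist_order:                       # neighbours of one centre, nearest first
        cum_cases = np.cumsum(is_case[order[:max_n]])
        for n, c in enumerate(cum_cases, start=1):
            best = max(best, bernoulli_llr(int(c), n, C, N))
    return best

def scan_test(coords, is_case, n_sim=99, seed=0):
    """Observed maximum LLR and its Monte Carlo p-value under random labelling."""
    rng = np.random.default_rng(seed)
    diff = coords[:, None, :] - coords[None, :, :]
    dist_order = np.argsort((diff ** 2).sum(axis=2), axis=1)   # each row starts with the centre itself
    observed = max_llr(dist_order, is_case)
    sims = np.array([max_llr(dist_order, rng.permutation(is_case)) for _ in range(n_sim)])
    return observed, (1 + np.sum(sims >= observed)) / (n_sim + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    coords = rng.uniform(0, 150, size=(50, 2))     # synthetic locations
    is_case = np.arange(50) < 20
    print(scan_test(coords, is_case, n_sim=49))
```

Note that each simulation contributes only its own maximum, wherever it occurs, which is what makes this a single test of the most likely cluster rather than one test per location and radius.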
McLafferty et al., 2000; Diggle, 2003; Eck et al., 2005). Conceptually, suppose we place an equal
amount of soft modeling clay over each event location on our map. These will overlap for events
near each other and the resulting height of the entire surface represents our estimate of the
spatial intensity, higher in areas with many observed events, lower in areas with few observed
events. More precisely, we place a smooth, symmetric function (the kernel) over events, typically
a probability density function such as a bivariate Gaussian density or other function which
integrates to one. At each of a fine grid of points, we sum the kernel values associated with
each observed event, yielding a smooth surface estimating the unknown intensity function. The
bandwidth (spatial extent) of each kernel controls the overall amount of smoothness in the
estimated intensity surface with larger bandwidths corresponding to smoother surfaces.
Essentially, the kernel takes each
observation and spreads its influence over a local area corresponding to the kernel function.
Mathematically, suppose $\mathbf{x}$ denotes the vector of locations of $N$ events
$(x_1, x_2, \ldots, x_N)$, and $x$ denotes any location within our study area $A$. The kernel
estimate of the intensity $\lambda(x)$ is:
$$\lambda(x) = \frac{1}{|A|\,b}\sum_{i=1}^{N}\mathrm{Kern}\!\left(\frac{x - x_i}{b}\right) \tag{16.4}$$
where $|A|$ denotes the geographic area of our study area $A$, $\mathrm{Kern}(\cdot)$ is a kernel
function satisfying:
$$\int_{A}\mathrm{Kern}(x)\,dx = 1$$
and $b$ denotes the kernel's bandwidth.
Figure 16.7 represents the two intensity estimates for the Anasazi site data for a bandwidth of
15 distance units. Visually, we observe some differences between the two intensity estimates,
such as a more distinct gap between site intensity for the late period (right-hand plot) in the
northern third of the study area, and perhaps an additional mode for the early period (left-hand
plot) in the southwestern section of the study area. Such conclusions must be interpreted with
caution, however, since they are dependent upon the bandwidth used for estimation. In this
illustration we use the same bandwidth in both plots to facilitate numerical comparisons between
them in the next subsection, even though the two time periods contain different sample sizes.
16.7.3. Comparing maps: Contouring relative risk
Intensity estimates provide a descriptive view of local variations in the probability of event
occurrence. However, as mentioned above, the interpretation of clustering depends on the
[Figure 16.7 plots: two panels with vertical axis v, from 0 to 150.]
Figure 16.7 Kernel estimates of the intensity functions for the patterns of late (left) and
early (right) sites for the Anasazi site data based on a bandwidth of 15 distance units.
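The kernel estimate behind Figure 16.7 amounts to summing one kernel per event at every grid point. The sketch below assumes a bivariate Gaussian kernel and scales the sum by 1/b², so that the surface integrates to roughly the number of events; that scaling is an assumption made here and differs from the constant in equation (16.4) by a multiplicative factor, which leaves the shape of the surface unchanged. The function name and coordinates are hypothetical.

```python
import numpy as np

def kernel_intensity(events, grid, b):
    """Gaussian-kernel intensity estimate (events per unit area) at each grid point."""
    diff = (grid[:, None, :] - events[None, :, :]) / b            # scaled offsets, shape (G, N, 2)
    kern = np.exp(-0.5 * (diff ** 2).sum(axis=2)) / (2 * np.pi)   # bivariate standard normal density
    return kern.sum(axis=1) / b ** 2                              # assumed 2-D scaling by b squared

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    events = rng.uniform(0, 150, size=(100, 2))    # synthetic event locations
    axis = np.linspace(0, 150, 31)
    gu, gv = np.meshgrid(axis, axis)
    grid = np.column_stack([gu.ravel(), gv.ravel()])
    surface = kernel_intensity(events, grid, b=15.0).reshape(gu.shape)
    print(surface.max())
```

Re-running the example with a smaller or larger `b` shows the bandwidth sensitivity noted in the text: small bandwidths give a spiky surface with a peak at nearly every event, large bandwidths flatten local modes.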
(often spatially-varying) population at risk of an event. That is, we are often more interested
in spatial variations in the risk (probability) of an event rather than in spatial variations in
the actual numbers of events. For crime data, we often do not have point-level population data or
samples of the locations of control individuals not experiencing the crime under study, and
intensity analysis concludes with interpretation of the intensity function of events (Eck et al.,
2005). In other fields, such comparison patterns are more readily available, and we next consider
statistical identification of clusters via comparisons between two estimated intensity functions.
Suppose we have two types of events (events and controls, early or late sites, etc.). Bithell
(1990), Lawson and Williams (1993), and Kelsall and Diggle (1995) propose approaches for
comparing kernel estimates from each type of event, say $\lambda_0$ and $\lambda_1$. Kelsall and
Diggle (1995) examine the surface generated by the natural logarithm of the ratio of the two
intensity functions:
$$r(x) = \log\left(\frac{\lambda_1(x)}{\lambda_0(x)}\right)$$
for any location $x$ in our study area $A$. To borrow a term from epidemiology, the ratio of the
two intensity functions reflects the relative risk, and the log transformation places the ratio
on a more symmetric scale around its null value (0.0 on the log scale). Kelsall and Diggle (1995)
point out technical and practical reasons for using the same bandwidth for both kernel estimates,
primarily to avoid confounding the smoothness of the $r(x)$ surface by differences in the
underlying smoothness of the two intensity estimates.
The log relative risk surface $r(x)$ illustrates areas where events of each type are more or less
likely than the other. In order to use this approach to detect clusters, we seek peaks or valleys
in the surface. To assess statistical significance, the next step is to decide whether the peaks
and valleys are more extreme than one would expect to observe under a null hypothesis. Kelsall
and Diggle (1995) propose using random labeling simulations to determine local clusters. Suppose
we have $n_0$ type 0 events and $n_1$ type 1 events. Conditional on the complete set of
observations of both types of events, we randomly assign $n_0$ of the events to be type 0, the
rest to be type 1, and calculate $r(g)$ for a grid of locations $g = (g_1, g_2, \ldots, g_G)$.
We repeat the random labeling a large number of times providing a large number of $r(g_i)$ values
for each $g_i$ in our grid, under the random labeling null hypothesis. If the value of $r(g_i)$
based on the observed data is more extreme than the 2.5th or 97.5th percentiles of the values
based on the simulation, we mark the location on the map. We note that this approach provides
pointwise inference, not overall inference due to the multitude of grid points and the
correlation between values of $r(g)$ induced by the kernel function (nearby estimates share the
same data).
Figure 16.1 provides a basis for comparison between the spatial scan statistic and the log
relative risk surface. The scan statistic addresses the question 'Where is the most unusual
collection of cases and how unusual is it compared to what would be expected of the most unusual
collection under the null hypothesis?' The log relative risk surface addresses: 'Where are
different types of events more or less likely than others and how do these differences compare to
what we would expect under the null hypothesis?' One important distinction between these two
questions is the emphasis on a single cluster in the first and the emphasis on the entire log
relative risk surface in the second. For instance, a focus on a single
cluster ignores the size, number, and location of other local peaks and valleys across the
surface. Also, if we were to use the pointwise interval inference to identify a single most
likely cluster from the log relative risk surface we would fall into the same multiple inference
problem as discussed above for GAM-type methods. Instead, we should think of the collection of
pointwise intervals as a general guide to describe the variability (under the null hypothesis) of
the estimated log relative risk surface across the study area, and draw attention to locations
where the estimated log relative risk surface wanders outside of these bounds. Leong (2005)
recently proposed and compared several approaches to move from pointwise to simultaneous
intervals around such log relative risk functions in one dimension and extensions to higher
dimensions would provide a stronger basis for inference.
To illustrate the approach, Figure 16.8 shows the log relative risk of late versus early sites
based on the kernel intensity estimates shown in Figure 16.7. On the contour plot, we indicate
grid points with local relative risk estimates falling above and below the 95 percent tolerance
intervals (defined by random labeling) by + and − symbols, respectively. We see locally
statistically significant increases in the relative probability of late versus early sites in the
north-central area mentioned in our discussion of Figure 16.7.
How can we reconcile the locally significant cluster shown in Figure 16.8 with the
non-significant most likely cluster found by the spatial scan statistic in Figure 16.6? Closer
examination of Figure 16.6 reveals that the collection of late sites (filled circles) driving the
cluster identified in the log relative risk plot is an oblong concentration of late sites in the
north central portion of
[Figure 16.8 plots: log relative risk surface and contour plot; axes u and v, each from 0 to 150; + symbols mark flagged grid points.]
Figure 16.8 Log relative risk surface comparing the probabilities of late sites versus that of
early sites for the Anasazi site data based on a bandwidth of 15 distance units. On the
contour plot, + denotes a point exceeding the upper 95 percent pointwise tolerance limits
and − a point exceeding the lower 95 percent limit (see text).
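The pointwise flags in Figure 16.8 follow from the random labeling procedure of Kelsall and Diggle (1995): relabel the pooled locations, recompute the log relative risk on the grid, and mark grid points where the observed value falls outside the 2.5th and 97.5th percentiles. The sketch below is a minimal illustration under assumptions: a Gaussian kernel with a common bandwidth, hypothetical function names, and synthetic data. With a Gaussian kernel, grid points very far from all events of one type can underflow to zero intensity, so the grid should stay within the region covered by the data.

```python
import numpy as np

def kernel_intensity(events, grid, b):
    """Gaussian-kernel intensity estimate at each grid point (assumed 2-D scaling by b squared)."""
    diff = (grid[:, None, :] - events[None, :, :]) / b
    return (np.exp(-0.5 * (diff ** 2).sum(axis=2)) / (2 * np.pi)).sum(axis=1) / b ** 2

def log_relative_risk(type1, type0, grid, b):
    """r(g) = log(lambda_1(g) / lambda_0(g)), using the same bandwidth for both estimates."""
    return np.log(kernel_intensity(type1, grid, b) / kernel_intensity(type0, grid, b))

def pointwise_flags(type1, type0, grid, b, n_sim=999, seed=0):
    """Observed r(g) plus flags for grid points above/below the pointwise 95 percent limits."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([type0, type1])
    n0 = len(type0)
    observed = log_relative_risk(type1, type0, grid, b)
    sims = np.empty((n_sim, len(grid)))
    for s in range(n_sim):
        perm = rng.permutation(len(pooled))        # random labelling: n0 become type 0, the rest type 1
        sims[s] = log_relative_risk(pooled[perm[n0:]], pooled[perm[:n0]], grid, b)
    lower, upper = np.percentile(sims, [2.5, 97.5], axis=0)
    return observed, observed > upper, observed < lower

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    late = rng.uniform(0, 150, size=(80, 2))       # synthetic stand-ins for the two site types
    early = rng.uniform(0, 150, size=(40, 2))
    axis = np.linspace(10, 140, 14)
    gu, gv = np.meshgrid(axis, axis)
    grid = np.column_stack([gu.ravel(), gv.ravel()])
    r, plus, minus = pointwise_flags(late, early, grid, b=15.0, n_sim=99)
    print(int(plus.sum()), int(minus.sum()))
```

Because every grid point is tested separately, some flags are expected even under the null hypothesis, so the flags serve as a descriptive guide rather than an overall test, as the text cautions.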
the study area. This concentration would not be considered among the circular potential clusters we used in our application of the spatial scan statistic. The example illustrates the importance of understanding the types of clusters evaluated by a particular method when comparing results between different approaches. In addition, the most likely clusters identified by the spatial scan statistic do not appear as unusual peaks in the log relative risk surface since (as with the scan statistic) there is not a strong excess of early or late sites in these locations.

16.8. CONCLUSIONS

The sections above illustrate the importance of understanding what sort of spatial patterns statistical approaches investigate in studies to detect clusters and/or clustering. The data set provides an interesting example where we observe no significant clustering but a significant cluster, provided we examine a broad enough class of potential clusters. Figure 16.1 illustrates that the example is not simply a situation of applying multiple methods until we get the answer we desire, but rather an example of the sorts of patterns not considered by many common summaries of spatial pattern, and how some potentially interesting patterns may be missed by some methods.

ACKNOWLEDGMENTS

Thanks to John Richardson, a toxicologist for US EPA Region IV, who provided the initial sketch that became Figure 16.1. In a simple diagram, he provided a summary of many important issues relating to the cluster/clustering detection problem.

This work is supported in part by grant NIEHS R01 ES007750. The opinions expressed herein are solely those of the author and may not reflect those of the National Institutes of Health or the National Institute of Environmental Health Sciences.

REFERENCES

Anselin, L. (1995). Local indicators of spatial association: LISA. Geographical Analysis, 27: 93–116.
Assunção, R., Costa, M., Tavares, A. and Ferreira, S. (2006). Fast detection of arbitrarily shaped disease clusters. Statistics in Medicine, 25: 723–742.
Bailey, T.C. and Gatrell, A.C. (1995). Interactive Spatial Data Analysis. New York: John Wiley and Sons.
Barnard, G.A. (1963). Contribution to the discussion of Professor Bartlett's paper. Journal of the Royal Statistical Society, Series B, 25: 294.
Bartlett, M.S. (1964). The spectral analysis of two-dimensional point processes. Biometrika, 51: 299–311.
Besag, J. (1977). Discussion of "Modeling spatial patterns" by B.D. Ripley. Journal of the Royal Statistical Society, Series B, 39: 192–225.
Besag, J. and Newell, J. (1991). The detection of clusters in rare diseases. Journal of the Royal Statistical Society, Series A, 154: 327–333.
Bithell, J. (1990). An application of density estimation to geographical epidemiology. Statistics in Medicine, 9: 691–701.
Chainey, S. (2005). Methods and techniques for understanding crime hot spots. In: Mapping Crime: Understanding Hot Spots. Eck, J.E., Chainey, S., Cameron, J.G., Leitner, M. and Wilson, R.E. (eds.), National Institute of Justice Report NCJ 209393. Washington DC: United States Department of Justice, Office of Justice Programs, pp. 15–34.
Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. London: Pion.
Cressie, N.A.C. (1993). Statistics for Spatial Data, Revised Edition. New York: John Wiley and Sons.
Cuzick, J. and Edwards, R. (1990). Spatial clustering for inhomogeneous populations (with discussion). Journal of the Royal Statistical Society, Series B, 52: 73–104.
Denison, D. and Holmes, C. (2001). Bayesian partitioning for estimating disease risk. Biometrics, 57: 143–147.
Diggle, P.J. (2003). Statistical Analysis of Spatial Point Patterns, Second Edition. New York: Oxford University Press.
Eck, J.E., Chainey, S., Cameron, J.G., Leitner, M. and Wilson, R.E. (2005). Mapping Crime: Understanding Hot Spots. National Institute of Justice Report NCJ 209393. Washington DC: United States Department of Justice, Office of Justice Programs.
Elliott, P., Cuzick, J., English, D. and Stern, R. (1992). Geographical and Environmental Epidemiology: Methods for Small-Area Studies. Oxford: Oxford University Press.
Elliott, P., Wakefield, J.C., Best, N.G. and Briggs, D.J. (1999). Spatial Epidemiology: Methods and Applications. Oxford: Oxford University Press.
Goldsmith, V., McGuire, P.G., Mollenkopf, J.H. and Ross, T.A. (2000). Analyzing Crime Patterns: Frontiers of Practice. Thousand Oaks, CA: Sage Publications, Inc.
Gumerman, G.J. (1970). Black Mesa: Survey and Excavation in Northeastern Arizona, 1968. Prescott College Press.
Gumerman, G.J., Westfall, D. and Weed, C.S. (1972). Archaeological Investigations on Black Mesa: The 1969–1970 Seasons. Prescott College Press.
Kelsall, J. and Diggle, P.J. (1995). Non-parametric estimation of spatial variation in relative risk. Statistics in Medicine, 14: 2335–2342.
Knorr-Held, L. and Raßer, G. (2000). Bayesian detection of clusters and discontinuities in disease maps. Biometrics, 56: 13–21.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26: 1487–1496.
Kulldorff, M. and Information Management Services, Inc. (2002). SaTScan v. 3.0: Software for the Spatial and Space-time Scan Statistics. Bethesda, MD: National Cancer Institute.
Kulldorff, M., Tango, T. and Park, P.J. (2003). Power comparisons for disease clustering tests. Statistics in Medicine, 42: 665–684.
Langworthy, R.H. and Jefferis, E.S. (2000). The utility of standard deviation ellipses for evaluating hot spots. In: Analyzing Crime Patterns: Frontiers of Practice. Goldsmith, V., McGuire, P.G., Mollenkopf, J.H. and Ross, T.A. (eds.), Thousand Oaks, CA: Sage Publications, Inc.
Lawson, A.B. (2001). Statistical Methods in Spatial Epidemiology. Chichester: John Wiley & Sons.
Lawson, A.B. and Denison, D.G.T. (2002). Spatial Cluster Modelling. Boca Raton FL: Chapman & Hall/CRC.
Lawson, A.B. and Williams, F.L.R. (1993). Applications of extraction mapping in environmental epidemiology. Statistics in Medicine, 12: 1249–1258.
Leong, T. (2005). First- and second-order properties of spatial point processes in biostatistics. Unpublished Ph.D. dissertation, Department of Biostatistics, Rollins School of Public Health, Emory University. Atlanta, GA.
McLafferty, S., Williamson, D. and McGuire, P.G. (2000). Identifying crime hot spots using kernel smoothing. In: Analyzing Crime Patterns: Frontiers of Practice. Goldsmith, V., McGuire, P.G., Mollenkopf, J.H. and Ross, T.A. (eds.), Thousand Oaks, CA: Sage Publications, Inc.
Møller, J. and Waagepetersen, R. (2002). Statistical Inference and Simulation for Spatial Point Patterns. Boca Raton, FL: Chapman & Hall/CRC.
Openshaw, S., Craft, A.W., Charlton, M. and Birch, J.M. (1988). Investigation of leukaemia clusters by use of a geographical analysis machine. Lancet, 1 (8580): 272–273.
Patil, G.P. and Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11: 183–197.
Plog, S. (ed.) (1986). Spatial Organization and Exchange: Archaeological Survey on Northern Black Mesa. Southern Illinois University Press.
Powell, S. and Smiley, F.E. (2002). Prehistoric Culture Change on the Colorado Plateau: Ten Thousand Years on Black Mesa. Tucson AZ: The University of Arizona Press.
Ripley, B.D. (1977). Modeling spatial patterns (with discussion). Journal of the Royal Statistical Society, Series B, 39: 172–212.
Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46: 234–240.
Turnbull, B.W., Iwano, E.J., Burnett, W.S., Howe, H.L. and Clark, L.C. (1990). Monitoring for clusters of disease: Application to leukemia incidence in upstate New York. American Journal of Epidemiology, 132, supplement: S136–S143.
Waller, L.A. and Jacquez, G.M. (1995). Disease models implicit in statistical tests of disease clustering. Epidemiology, 6: 584–590.
Waller, L.A. and Gotway, C.A. (2004). Applied Spatial Analysis of Public Health Data. Hoboken NJ: John Wiley & Sons.
17
Bayesian Spatial Analysis
Andrew B. Lawson and Sudipto Banerjee
inference, and Bayesian inference. Here we focus on the latter approach. Bayesian inference and modeling can be seen as an extension of likelihood methods, but it also has a fundamentally different view of the inferential process.

17.2. NOTATION

The following notation will be used throughout this chapter. A random variate is denoted $y_i$, for an item in a vector. The vector of these items is $\mathbf{y}$. Often $\mathbf{y}$ will be related to independent variables (such as in a linear model). In that case the matrix of such variables can be defined as $X$. A linear model can be defined, for a single independent variable $x_1$, as:

$$y_i = \beta_0 + \beta_1 x_{1i} + e_i.$$

In general, the matrix formulation of the model, where $i = 1, \ldots, n$, will be:

$$\mathbf{y} = X\boldsymbol{\beta} + \mathbf{e} \qquad (17.1)$$

where $\mathbf{y}$ is an $n \times 1$ vector of the dependent variable, $X$ is an $n \times p$ matrix of $p$ independent predictors (or covariates), $\boldsymbol{\beta}$ is a $p \times 1$ parameter vector of the corresponding slopes and $\mathbf{e}$ is an $n \times 1$ vector of the errors. Often we make distributional assumptions, such as $\mathbf{e} \sim N(\mathbf{0}, \Sigma)$. These expressions imply that the errors are normally distributed with a zero-vector, $\mathbf{0}$, as the mean and a covariance matrix $\Sigma$.

17.2.1. Point-referenced spatial data notation

As we will be dealing with spatial data, we will require some notation specific to such settings. When the referencing is done using coordinates (latitude–longitude, Easting–Northing, etc.) over a domain $D$, we denote it as $s \in D$; for instance in two-dimensional domains we have $s = (s_x, s_y)$. The most frequently encountered scenario observes a spatial field measured at a finite set of locations, say $S = \{s_1, \ldots, s_n\}$. We usually name this a random field, which we denote as $\{w(s) : s \in D\}$ or simply as $w(s)$ in short. A realization of this random field will be a vector $\mathbf{w} = (w(s_1), \ldots, w(s_n))$.

17.2.2. Health data notation

For health data discussed in this chapter we will confine ourselves (mostly) to examining count data arising within small arbitrary administrative areas (such as census tracts, zip codes, postcodes, counties). Define $y_i$ as the count of disease within the $i$th small area. Assume that $i = 1, \ldots, m$. For this we need to define a relative risk for the $i$th region: $\theta_i$. We usually want to make inferences about the relative risk, in any study. We also usually have available an expected rate for the $i$th region: $e_i$. Often the count within the regions will have a Poisson distribution, i.e., $y_i \sim \mathrm{Pois}(e_i \theta_i)$.

17.3. LIKELIHOOD AND BAYESIAN MODELS

17.3.1. Likelihood

A random variable $X$ is usually associated with a distribution which governs its behavior. We denote this distribution as $f(x \mid \theta)$ where $\theta$ is a parameter. In general, $\theta$ could be a vector of parameters and so is denoted $\boldsymbol{\theta}$. In this case we have $f(x \mid \boldsymbol{\theta})$. When a random sample of values of $X$ is taken, $\{x_i, i = 1, \ldots, n\}$, then the likelihood is
defined as the joint distribution of the sample values:

$$f(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{i=1}^{n} f(x_i \mid \boldsymbol{\theta}). \qquad (17.2)$$

It is assumed that conditional on $\boldsymbol{\theta}$ the sample values are independent. If this were not so, then we would need to take the product of conditional distributions in equation (17.2). When using the frequentist inferential process it is important to base decisions about parameters (estimation of parameter values or confidence intervals) on the likelihood function. Maximum likelihood estimation seeks point estimates of the parameters in $\boldsymbol{\theta}$ by maximising $f(\mathbf{x} \mid \boldsymbol{\theta})$ or $\log f(\mathbf{x} \mid \boldsymbol{\theta})$. Testing and interval estimation is often based on likelihood ratios derived for different values of $\boldsymbol{\theta}$ under different hypotheses. Inference for quantities such as confidence intervals is based on the concept of repeated experimentation, in that probability statements are derived based on properties of repeated sequences of experiments.

17.4. BAYESIAN INFERENCE

Fundamental philosophical differences with the frequentist approach are found when a Bayesian perspective is assumed. First of all, parameters within Bayesian models are assumed to be random variables and hence are governed by distributions themselves. Hence, there is no longer a fixed (true) value for a given parameter. Instead an expected value or other functional of a distribution can be defined. Because parameters have distributions then the likelihood previously defined must be extended to accommodate these distributions.

By modeling both the observed data and any unknown parameter or other unobserved effects as random variables, the hierarchical Bayesian approach to statistical analysis provides a cohesive framework for combining complex data models and external knowledge or expert opinion (e.g., Berger, 1985; Carlin and Louis, 2000; Robert, 2001; Gelman et al., 2004; Lee, 2005). In this approach, in addition to specifying the distributional model $f(\mathbf{y} \mid \boldsymbol{\theta})$ for the observed data $\mathbf{y} = (y_1, \ldots, y_n)$ given a vector of unknown parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$, we suppose that $\boldsymbol{\theta}$ is a random quantity sampled from a prior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{\lambda})$, where $\boldsymbol{\lambda}$ is a vector of hyperparameters. Inference concerning $\boldsymbol{\theta}$ is then based on its posterior distribution:

$$p(\boldsymbol{\theta} \mid \mathbf{y}, \boldsymbol{\lambda}) = \frac{p(\mathbf{y}, \boldsymbol{\theta} \mid \boldsymbol{\lambda})}{p(\mathbf{y} \mid \boldsymbol{\lambda})} = \frac{p(\mathbf{y}, \boldsymbol{\theta} \mid \boldsymbol{\lambda})}{\int p(\mathbf{y}, \boldsymbol{\theta} \mid \boldsymbol{\lambda})\, d\boldsymbol{\theta}} = \frac{f(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid \boldsymbol{\lambda})}{\int f(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid \boldsymbol{\lambda})\, d\boldsymbol{\theta}}. \qquad (17.3)$$

Notice the contribution of both the data (in the form of the likelihood $f(\mathbf{y} \mid \boldsymbol{\theta})$) and the external knowledge or opinion (in the form of the prior $p(\boldsymbol{\theta} \mid \boldsymbol{\lambda})$) to the posterior. If $\boldsymbol{\lambda}$ is known, this posterior distribution is fully specified; if not, a second-stage prior distribution (called a hyper-prior) may be specified for it, leading to a fully Bayesian analysis. Alternatively, we might simply replace $\boldsymbol{\lambda}$ by an estimate $\hat{\boldsymbol{\lambda}}$ obtained as the value which maximizes the marginal distribution $p(\mathbf{y} \mid \boldsymbol{\lambda})$ viewed as a function of $\boldsymbol{\lambda}$. Inference proceeds based on the estimated posterior distribution $p(\boldsymbol{\theta} \mid \mathbf{y}, \hat{\boldsymbol{\lambda}})$, obtained by plugging $\hat{\boldsymbol{\lambda}}$ into equation (17.3). This is called an empirical Bayes analysis and is closer to maximum likelihood estimation techniques.

The Bayesian decision-making paradigm improves on the classical approaches to statistical analysis in its more philosophically sound foundation, its unified approach to data analysis, and its ability to formally incorporate prior opinion or external
empirical evidence into the results via the prior distribution. Statisticians, formerly reluctant to adopt the Bayesian approach due to general skepticism concerning its philosophy and a lack of necessary computational tools, are now turning to it with increasing regularity as classical methods emerge as both theoretically and practically inadequate. Modeling the $\theta_i$s as random (instead of fixed) effects allows us to induce specific (e.g., spatial, temporal or more general) correlation structures among them, hence among the observed data $y_i$ as well. Hierarchical Bayesian methods now enjoy broad application in the analysis of complex systems, where it is natural to pool information across different sources, e.g., Gelman et al. (2004).

Modern Bayesian methods seek complete evaluation of the posterior distribution using simulation methods that draw samples from the posterior distribution. This sampling-based paradigm enables exact inference free of unverifiable asymptotic assumptions on sample sizes and other regularity conditions. A computational challenge in applying Bayesian methods is that for many complex systems, the simulations required to do inference under equation (17.3) generally involve distributions that are intractable in closed form, and thus one needs more sophisticated algorithms to sample from the posterior. Forms for the prior distributions (called conjugate forms) may often be found which enable at least partial analytic evaluation of these distributions, but in the presence of nuisance parameters (typically unknown variances), some intractable distributions remain. Here the emergence of inexpensive, high-speed computing equipment and software comes to the rescue, enabling the application of recently developed MCMC integration methods, such as the Metropolis–Hastings algorithm (Hastings, 1970) and the Gibbs sampler (Geman and Geman, 1984; Robert and Casella, 2005). Univariate MCMC algorithms are particularly attractive for general purpose implementation, since all that is required is the ability to sample easily from each parameter's complete conditional distribution, namely $p(\theta_i \mid \mathbf{y}, \theta_{j \neq i})$, $i = 1, \ldots, k$. The recently developed WinBUGS language (www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml) and the R statistical platform (www.r-project.org) with its Bayesian packages are promising steps towards a general purpose software package for hierarchical modeling, though it may be insufficiently general in some advanced analysis settings, and in any case more work is needed before it is suitable for routine use by statistical support staff.

Statistical prediction in Bayesian settings is particularly elegant and intuitive. Let $\mathbf{y}_{pred}$ denote the random variables (they can be a collection) we seek to predict. Then, we simply treat $\mathbf{y}_{pred}$ as a random variable whose prior, conditional upon the parameters, is the data likelihood $f(\mathbf{y} \mid \boldsymbol{\theta})$. Then, all predictions will be summarized in the posterior predictive distribution:

$$p(\mathbf{y}_{pred} \mid \mathbf{y}) = \int f(\mathbf{y}_{pred} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid \mathbf{y})\, d\boldsymbol{\theta}.$$

Once the posterior samples are available from $p(\boldsymbol{\theta} \mid \mathbf{y})$, it is routine to draw samples from $p(\mathbf{y}_{pred} \mid \mathbf{y})$ using the principle of composition: for each posterior draw of $\boldsymbol{\theta}$, we draw $\mathbf{y}_{pred}$ from $f(\mathbf{y}_{pred} \mid \boldsymbol{\theta})$. Details of such methods are particularly well explained in the texts by Carlin and Louis (2000) and Gelman et al. (2004).

17.4.1. Posterior sampling methods

Practical Bayesian modeling relies upon efficient computation of the posterior
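predictors with full column rank (we assume independent columns so that covariates are not collinear), $\boldsymbol{\beta}$ is a $p \times 1$ vector of regression coefficients, and $\mathbf{e}$ is the $n \times 1$ vector of uncorrelated normally distributed errors with common variance $\sigma^2$.

To construct a Bayesian framework, we will need to assign a prior distribution for $(\boldsymbol{\beta}, \sigma^2)$ in the above model. For illustration, consider the non-informative or reference prior distribution for $(\boldsymbol{\beta}, \sigma^2)$:

$$p(\boldsymbol{\beta}, \sigma^2) \propto \frac{1}{\sigma^2}.$$

Following the principle of composition sampling, we draw, say for $j = 1, \ldots, M$, $\sigma^{2(j)} \sim IG\left((n-p)/2,\ (n-p)s^2/2\right)$ followed by $\boldsymbol{\beta}^{(j)} \sim N\left(\hat{\boldsymbol{\beta}},\ \sigma^{2(j)} (X^T X)^{-1}\right)$, where $\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}$ is the least-squares estimate and $s^2 = (\mathbf{y} - X\hat{\boldsymbol{\beta}})^T(\mathbf{y} - X\hat{\boldsymbol{\beta}})/(n-p)$ is the residual variance. This yields our desired posterior sample $(\boldsymbol{\beta}^{(j)}, \sigma^{2(j)})$ with $j = 1, 2, \ldots, M$. Posterior confidence intervals and all inference will again be carried out using these samples.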
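The two composition steps above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes a simple regression with an intercept and one covariate (so $p = 2$), the reference prior proportional to $1/\sigma^2$, and the standard inverse-gamma form $IG((n-p)/2, (n-p)s^2/2)$. The function name `composition_sample` and its arguments are hypothetical.

```python
import math
import random


def composition_sample(x, y, n_draws=2000, seed=0):
    """Posterior draws (beta0, beta1, sigma2) for y = beta0 + beta1*x + e,
    assuming the reference prior p(beta, sigma2) proportional to 1/sigma2."""
    rng = random.Random(seed)
    n, p = len(y), 2
    # Entries of X^T X and X^T y for the design matrix X = [1, x].
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # (X^T X)^{-1}, a symmetric 2x2 matrix.
    v00, v01, v11 = sxx / det, -sx / det, n / det
    # Least-squares estimate beta_hat = (X^T X)^{-1} X^T y.
    b0 = v00 * sy + v01 * sxy
    b1 = v01 * sy + v11 * sxy
    # Residual variance s^2 = RSS / (n - p).
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - p)
    # Lower-triangular Cholesky factor of (X^T X)^{-1}.
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 * l10)
    shape, rate = (n - p) / 2.0, (n - p) * s2 / 2.0
    draws = []
    for _ in range(n_draws):
        # Step 1: sigma2 ~ IG(shape, rate), drawn as 1 / Gamma(shape, scale=1/rate).
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        # Step 2: beta | sigma2 ~ N(beta_hat, sigma2 * (X^T X)^{-1}).
        sd = math.sqrt(sigma2)
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((b0 + sd * l00 * z0,
                      b1 + sd * (l10 * z0 + l11 * z1),
                      sigma2))
    return draws
```

Posterior summaries then follow directly from the draws; for example, a 95% interval for the slope can be read off the 2.5% and 97.5% sample quantiles of the second component.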
17.5. HIERARCHICAL MODELS
the distribution at all. However, by allowing a higher level of variation, i.e., hyperpriors for the parameters of the prior distribution, we can fix the values of the hyperparameters at the top level without heavily influencing the lower level variation. This allows the data to inform more about the different parameters in the lower levels of the hierarchy.

$$P(\theta \mid y) \propto L(y \mid \theta) \prod_i g_i(\theta_i). \qquad (17.4)$$

The aim is to generate a sample from the posterior distribution $P(\theta \mid y)$. Suppose we can construct a Markov chain with state space $\theta_c$, where $\theta \in \theta_c \subset \mathbb{R}^k$. The chain is constructed so that the equilibrium distribution is $P(\theta \mid y)$, and the chain should be easy to simulate from. If the chain is run over a long period, then it should be possible to reconstruct features of $P(\theta \mid y)$ from the realized chain values. This forms the basis of the MCMC method, and algorithms are required for the construction of such chains.

The basic algorithms used for this construction are:

1 the Metropolis and its extension, the Metropolis–Hastings algorithm;
2 the Gibbs Sampler algorithm.

17.6.2. Metropolis and Metropolis–Hastings updates

In this case choose a symmetric proposal $q(\theta, \theta')$ and define the transition probability as:

$$p(\theta, \theta') = \begin{cases} \alpha(\theta, \theta')\, q(\theta, \theta') & \text{if } \theta' \neq \theta \\ 1 - \sum_{\theta''} q(\theta, \theta'')\, \alpha(\theta, \theta'') & \text{if } \theta' = \theta \end{cases}$$
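A Metropolis update with a symmetric proposal can be sketched as follows. This is an illustrative sketch under stated assumptions, not the chapter's implementation: it uses a Gaussian random-walk proposal and the usual Metropolis acceptance probability $\alpha(\theta, \theta') = \min\{1, P(\theta' \mid y)/P(\theta \mid y)\}$, and the target is a hypothetical single-area example with $y \sim \mathrm{Pois}(e\theta)$, $y = 15$, $e = 10$ and a Gamma(1, 1) prior on $\theta$, chosen because its posterior, Gamma(16, 11), is known in closed form. The names `metropolis` and `log_post` are made up for this example.

```python
import math
import random


def metropolis(log_post, init, step, n_iter, seed=0):
    """Random-walk Metropolis sampler for a scalar parameter.

    log_post returns the log posterior up to an additive constant;
    init must have finite log posterior."""
    rng = random.Random(seed)
    theta, lp = init, log_post(init)
    chain = []
    for _ in range(n_iter):
        # Symmetric proposal: q(theta, theta') = q(theta', theta).
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # Accept with probability min(1, P(prop | y) / P(theta | y)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta)  # a rejected proposal repeats the current value
    return chain


def log_post(theta, y=15, e=10.0):
    """Log posterior (up to a constant) for y ~ Pois(e * theta)
    with an assumed Gamma(1, 1) prior on the relative risk theta."""
    if theta <= 0.0:
        return -math.inf  # outside the support: always rejected
    return y * math.log(theta) - e * theta - theta


chain = metropolis(log_post, init=1.0, step=0.5, n_iter=20000, seed=1)
kept = chain[2000:]  # discard burn-in
post_mean = sum(kept) / len(kept)  # compare with the exact mean 16/11
```

Because the exact posterior mean is 16/11, the example gives a simple check that the realized chain values reconstruct features of the target distribution.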
(this is derived from a famous theorem due to Bochner). Further technical details about positive definite functions can be found in Cressie (1993), Chilès and Delfiner (1999) and Banerjee et al. (2004).

Since it is common for spatial data to consist of single observations from a site, we often need to assume stationary or isotropic processes for ensuring estimable models. Stationarity, in spatial modeling contexts, refers to the setting when $C(s, s') = C(s - s')$; that is, the covariance function depends upon the separation of the sites. Isotropy goes further and specifies $C(s, s') = C(\|s - s'\|)$, where $\|s - s'\|$ is the distance between the sites. Furthermore, we will parametrize the covariance function as $C(\|s - s'\|) = \sigma^2 \rho(\|s - s'\|)$, where $\rho(\|s - s'\|)$ is called a correlation function and $\sigma^2$ is a spatial variance parameter. In particular, we will use the isotropic exponential correlation function $\rho(d, \phi) = \exp(-\phi d)$, with $d = \|s - s'\|$.

17.8.2. Bayesian spatial regression and kriging

There is an expanding literature on modeling point-referenced spatial data. The most common setting assumes a response or dependent variable $Y(s)$ observed at a generic location $s$, referenced by a coordinate system (e.g., UTM or lat–long), along with a vector of covariates $x(s)$. One seeks to model the dependent variable in a spatial regression setting such as:

$$Y(s) = x^T(s)\beta + w(s) + \epsilon(s),$$

where the errors $\epsilon(s)$ are independent and identically distributed as $N(0, \tau^2)$, where $\tau^2$ is a measurement error variance or micro-scale variance. The key to incorporating spatial association is by modeling $w(s)$ as a Gaussian Process with spatial variance $\sigma^2$ and a valid correlation function $\rho(\cdot, \phi)$ with $\phi$ representing parameters that quantify correlation decay and smoothness of the resulting spatial surface.

When we have observations, $\mathbf{y} = (Y(s_1), \ldots, Y(s_n))$, from $n$ locations, we treat the data as a partial realization of a spatial process, modeled through $w(s)$. Hence, $w(s) \sim GP(0, \sigma^2 \rho(\cdot, \phi))$ is a zero-centered Gaussian Process with variance $\sigma^2$ and a valid correlation function $\rho(d, \phi)$, which depends upon inter-site distances ($d_{ij} = \|s_i - s_j\|$) and a parameter $\phi$ quantifying correlation decay. Also, we assume $\epsilon(s)$ are i.i.d. $N(0, \tau^2)$. Inferential goals include estimation of regression coefficients, spatial and nugget variances, and the strength of spatial association through distances. Likelihood-based inference proceeds from the distribution of the data, $\mathbf{y} \sim N(X\beta, \Sigma)$, with $\Sigma = \sigma^2 R(\phi) + \tau^2 I$, where $X$ is the covariate matrix and $R(\phi)$ is the correlation matrix with $R_{ij} = \rho(d_{ij}, \phi)$. See Cressie (1993) for details, including maximum-likelihood and restricted maximum-likelihood methods, and Banerjee et al. (2004) for Bayesian estimation.

Statistical prediction (kriging) at a new location $s_0$ proceeds from the conditional distribution of $Y(s_0)$ given the data $\mathbf{y}$. Collecting all the model parameters (the regression coefficients, the variances $\sigma^2$ and $\tau^2$, and the correlation parameters) into $\theta$, we note that
effective range was less than 3000 meters. Using these priors an MCMC algorithm was devised to obtain posterior samples. Gibbs updates were used for the regression parameters while Metropolis updates were employed for spatial variance components $(\sigma^2, \tau^2)$ and the spatial range parameter $\phi$. The CODA package in R (www.r-project.org) was used to diagnose convergence by monitoring mixing, Gelman–Rubin diagnostics, autocorrelations, and cross-correlations. Analysis was based on three chains of 11,000 samples each. The first 1,000 samples were discarded from each chain as a part of burn-in. Subsequent parameter estimation and analysis used the remaining 30,000 (10,000 × 3) samples.

Table 17.1 presents the 95% central credible intervals for the parameter estimates based upon the posterior samples. All six covariates are significant and perhaps explain some of the spatial variation in the data, as is indicated by the spatial variance $\sigma^2$ being smaller than the measurement error variance $\tau^2$. The spatial range is calculated as the distance beyond which the correlation function drops below 0.05; for the exponential correlation function this is approximately $3/\phi$. Finally Figure 17.1 displays an image plot of the estimated response surface overlaid with contours of the estimated spatial random effects (the $w(s)$'s). The random effects serve to offset the spatially varying density of the response surface.

17.9. BAYESIAN MODELS FOR DISEASE MAPPING

In previous sections we have alluded to a simple Poisson model for disease counts. In fact, this is the basic model often assumed for small area counts of disease (in tracts, zip codes, counties, etc.). We consider two data resolutions here. First we consider case event data where, within a suitable study region ($W$), realization of cases arises. The locations of cases are usually residential addresses. These form a spatial point process. Often data is not available at this level of spatial resolution and aggregation to larger spatial units occurs. Aggregated counts of disease are often more readily available (e.g., from
Figure 17.1 Contour lines of estimated spatial random effects overlaid on an image plot of estimated relative density of eastern hemlock. Note that the random effects serve to offset the spatially varying density of eastern hemlock.
official government sources). Hence, the second common data type is disease count data within small areas. These small areas are arbitrary with respect to the disease process (such as census tracts, counties, postcodes) and form a sub-division of the study region. In what follows we will briefly consider case event data, but will concentrate discussion on the more commonly available count data type.

of locations. Often the natural likelihood model for such data is a heterogeneous Poisson Process (PP). In this model, the distribution of the cases (points) is governed by a first-order intensity function. This function, $\lambda(s)$ say, describes the variation across space of the intensity (density) of cases. This function is the basis for modeling the spatial distribution of cases. We denote this model as:
Figure 17.2 Larynx cancer incident case address locations in NW England (1974–1983).
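$$\lambda(s) = \lambda_0(s)\, \lambda_1(s \mid \theta).$$

Here the at risk background is represented by $\lambda_0(s)$ while the modeled excess risk of the disease is defined to be $\lambda_1(s \mid \theta)$, where

$$y_i = \begin{cases} 1 & \text{if } i \in 1, \ldots, m \\ 0 & \text{otherwise} \end{cases} \qquad \forall i,\ i = 1, \ldots, N$$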
and the resulting likelihood is just given by:

$$L(s \mid \theta) = \prod_{i=1}^{N} \frac{[\lambda_1(s_i)]^{y_i}}{1 + \lambda_1(s_i)}.$$

By conditioning on the joint set of cases and controls the population effect is removed and does not require estimation.

17.9.2. Parametric forms

Often we can define a suitable model for excess risk within $\lambda_1(s)$. In the case where we want to relate the excess risk to a known location (e.g., a putative source of pollution) then a distance-based definition might be considered. For example:

$$\lambda_1(s) = \rho \exp\{F(s)\alpha + \gamma d_s\} \qquad (17.8)$$

where $\rho$ is an overall rate parameter, $d_s$ is a distance measured from $s$ to a fixed location (source) and $\gamma$ is a regression parameter, $F(s)$ is a design vector with columns representing spatially-varying covariates, and $\alpha$ is a parameter vector. The variables in $F(s)$ could be site-specific or could be measures on the individual (age, gender, etc.). In addition this definition could be extended to include other effects. For example we could have:

$$\lambda_1(s) = \rho \exp\{F(s)\alpha + \eta \zeta(s) + \gamma d_s\} \qquad (17.9)$$

where $\zeta(s)$ is a spatial process, and $\eta$ is a parameter. This process can be regarded as a random component and can include within its specification spatial correlation between sites. One common assumption concerning $\zeta(s)$ is that it is a random field defined to be a spatial Gaussian process.

In the intensity (17.8), all the variables can be estimated using maximum likelihood. However when a Bayesian approach is assumed then all parameters have prior probability distributions and so we would need to consider sampling the posterior distribution given by:

$$P_1(\rho, \alpha, \gamma \mid s) \propto L(s \mid \rho, \alpha, \gamma)\, P_0(\rho, \alpha, \gamma)$$

where $P_0(\rho, \alpha, \gamma)$ is the joint prior distribution of the parameters. Assuming independent prior distributions for each parameter component, i.e., $P_0(\rho, \alpha, \gamma) = g_1(\alpha_1)\, g_2(\alpha_2)\, g_3(\alpha_3) \cdots g_\rho(\rho)\, g_\gamma(\gamma)$, this model can be sampled via standard MCMC algorithms. In intensity (17.9), the spatial component $\zeta(s)$ would have a spatially correlated prior distribution and so a Bayesian approach would be natural.

17.9.3. Count data

Often only count data is available within a set of small areas. Denote $y_i$ as the count of disease within the $i$th small area where $i = 1, \ldots, p$. As in the case of case event data we need to allow for the at risk population in our models. This can usually be easily achieved for count data since expected rates or counts can be obtained or calculated for small areas. For example, age × sex standardized rates for census tracts, postal zones, or zip codes are often available from government sources. Denote these rates as $e_i$, $i = 1, \ldots, p$. Also, in our model we want to model the relative risk of disease via the parameter $\theta_i$, $i = 1, \ldots, p$. The relative risk will be the focus of modeling and it is usually assumed that the $\{e_i\}$ are fixed.
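As an illustration of how expected counts might be obtained, the sketch below applies a set of reference rates to the stratum populations of each small area (indirect standardization) and forms the ratio $y_i / e_i$, the standardized mortality or morbidity ratio (SMR), which is the crude estimate of $\theta_i$ under a Poisson model with mean $e_i \theta_i$. The function names and all numbers are hypothetical; this is one common way to compute $e_i$, not necessarily the one used for the data in this chapter.

```python
def expected_counts(populations, reference_rates):
    """Expected count e_i for each area by indirect standardization.

    populations[i][k] is the population at risk in area i, stratum k
    (for example an age x sex group); reference_rates[k] is the disease
    rate in stratum k for the reference population."""
    return [sum(n * r for n, r in zip(area, reference_rates))
            for area in populations]


def smr(counts, expected):
    """Crude relative risk estimates y_i / e_i for each area."""
    return [y / e for y, e in zip(counts, expected)]


# Hypothetical data: two areas, two strata.
populations = [[1000, 500], [2000, 1500]]
reference_rates = [0.001, 0.004]
observed = [6, 4]

e = expected_counts(populations, reference_rates)
crude_rr = smr(observed, e)
```

With small expected counts these crude ratios are very unstable, which is one motivation for treating the relative risks as parameters with prior distributions.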
The simplest model for such data is a Poisson log linear model where:

$$y_i \sim \mathrm{Poiss}(e_i \theta_i).$$

In addition the relative risk $\theta_i$ is usually modeled with a log link for positivity. A simple example could be:

Here, the directional component is summarized by the cosine and sine terms in relation to a mean angle parameter ($\mu_0$), while the distance component is assumed to be log-linearly related to risk. The final term $\epsilon_i$ is meant to represent unattributed extra variation in risk. This could include random effect terms, such as:

$$\epsilon_i = u_i + v_i$$

where $f_1(u)$ is the CAR prior distribution, $f_2(v)$ is a zero mean normal distribution,
$f_3(\alpha)$ is the joint prior distribution for the regression parameters, and the remaining two densities are prior distributions for the remaining parameters. Note that these two remaining parameters are hyperparameters and they have prior distributions, as could any hyperparameters within the other prior distributions ($f_1$, $f_2$, $f_3$). The prior distributions for regression parameters are often assumed to be independent and each parameter is often assumed to have a zero mean normal prior distribution.

Disease map reconstruction

Often the main aim of modeling disease incidence is simply to provide a good estimate of disease risk. This can be specified as the relative risk within each region ($\theta_i$). Hence the aim is to provide an accurate estimate of the true underlying risk within the map. Much recent work has been focussed on this area of concern, and many models and approaches have been developed (see, e.g., Banerjee et al., 2004, section 5.4; Lawson, 2006, Chapter 8; Lawson, 2008). Typically a log linear model with random effects is defined:

$$\log \theta_i = \alpha_0 + \epsilon_i \quad \text{where } \epsilon_i = u_i + v_i.$$

Here the $u_i$, $v_i$ terms are CH and UH defined as above. This is often called the convolution model and was originally proposed by Besag et al. (1991). This model has proved to be very robust against mis-specification of the risk, although it can also over-smooth rates. Lawson et al. (2000), Best et al. (2005) and Hossain and Lawson (2006) have provided recent simulation-based evaluations of a range of methods in this area.

Ecological analysis

This area of focus arises when the risk within a small area is to be related to a covariate or covariates usually measured at the aggregate level. Often the main issue relates to making individual level inference from aggregate data. Aggregation or averaging induces biases in estimation of parameters for models (see, e.g., Wakefield, 2004). The modifiable areal unit problem (MAUP) is an example of an aggregation-related inference problem. Another problem that can arise is the misaligned data problem (MIDP). This arises when the spatial resolution of covariates is different from that of the outcome variable. The classic example of this would be modeling cancer outcomes at zip code level and relating these to groundwater uranium measured at point locations (wells). A fuller discussion of these issues can be found in Banerjee et al. (2004). In general the type of model assumed is often of the form:

$$\log \theta_i = x_i^T \beta + z_i^T \gamma$$

where $x_i^T$ is a row vector of fixed covariate values for the $i$th small area and $\beta$ is a corresponding parameter vector, and $z_i^T$ is a row vector of random effects and $\gamma$ a unit vector.

Surveillance

With recent concerns over bioterrorism (Fienberg and Shmueli, 2005; Sosin, 2003; Lawson and Kleinman, 2005), the focus of disease surveillance has become important. Essentially this focus concerns the monitoring of disease incidence with a view to detecting aberrations or unusual incidence events. This often requires the monitoring of large scale databases of health information. In addition, the focus of the monitoring could be a range of effects. There could be a need to find clusters of disease on maps or change points in time series or some mixture of these effects in space–time. Detection of change in multiple time and spatial series is the focus. This is a challenging area that requires
BAYESIAN SPATIAL ANALYSIS 339
the use of fast computational algorithms and novel spatial-sequential inference. In essence, a range of models found in equations (17.1)–(17.3) above may need to be examined simultaneously in this analysis.

The two effects have the following prior distributions:

$u_i \sim \mathrm{CAR}(\bar{u}_i, \tau/n_i)$
[Map legend, SMR classes: less than 0.5000; 0.5001–0.7800; 0.7801–1.0900; 1.0901–1.5100; 1.5101 and over]
each region. This can be obtained from a posterior sample by averaging the converged sample output. The estimates of the relative risk for the congenital abnormalities data are displayed in Figure 17.4. The posterior probability of $\theta_i > 1$ over the whole map is shown in Figure 17.5. Note that this quantity can be used to assess whether there are any areas of significant risk elevation on the map. For more details of this example see Lawson et al. (2003: chapter 6).

17.10. SOFTWARE FOR BAYESIAN MODELING

Posterior sampling is the commonest approach to Bayesian inference. There is now a range of software that can perform this task. The best known of these is the free software WinBUGS (downloadable from www.mrc-bsu.cam.ac.uk/bugs/). This package employs both Gibbs sampling and Metropolis–Hastings updating methods for a
Figure 17.4 Posterior expected relative risk estimates for the congenital abnormalities data for South Carolina, 1990. [Map legend, RR classes: less than 0.3720; 0.3721–0.8230; 0.8231–1.4410; 1.4411–2.2180; 2.2181 and over]
Figure 17.5 Posterior probability of exceedance ($\Pr(\theta_i > 1)$) for the South Carolina congenital abnormalities data. [Map legend, PP classes: less than 0.0820; 0.0821–0.2050; 0.2051–0.4170; 0.4171–0.6710; 0.6711 and over]
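The two mapped summaries, the posterior expected relative risk and the exceedance probability, are both simple averages over converged posterior draws. The sketch below illustrates this under stated assumptions: the function name `summarize_relative_risk` is hypothetical, and the draws are simulated stand-ins rather than output from the South Carolina analysis.

```python
import numpy as np


def summarize_relative_risk(samples, threshold=1.0):
    """Summarize posterior draws of relative risk.

    samples: array of shape (n_draws, n_regions) holding converged draws.
    Returns the posterior mean and Pr(risk > threshold) for each region.
    """
    samples = np.asarray(samples, dtype=float)
    posterior_mean = samples.mean(axis=0)              # averaged sample output
    exceedance = (samples > threshold).mean(axis=0)    # share of draws above 1
    return posterior_mean, exceedance


# Stand-in draws for three regions (hypothetical, not real model output).
rng = np.random.default_rng(0)
draws = rng.gamma(shape=[20.0, 20.0, 20.0],
                  scale=[0.03, 0.05, 0.08],
                  size=(4000, 3))
posterior_mean, exceedance = summarize_relative_risk(draws)
print(posterior_mean.round(2), exceedance.round(2))
```

In practice `draws` would be the post-burn-in relative risk samples exported from whichever sampler was used, with one column per small area.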
wide range of models. The package also has a wide range of online runnable examples and has a GIS tool called GeoBUGS that allows mapping of small area data and parameter estimates, as well as spatial modeling of various kinds. Bayesian Kriging and both CAR and multivariate CAR models can be fitted using this package. Facilities also exist within R (e.g. packages such as bayesm, geoR, geoRglm, MCMCpack, mcmc, spBayes etc.) and MATLAB (spatial statistics toolbox) to perform MCMC computations for Bayesian spatial models.

ACKNOWLEDGMENTS

Portions of this research were based upon data generated in long-term research studies on the Bartlett Experimental Forest, Bartlett, NH, funded by the U.S. Department of Agriculture, Forest Service, Northeastern Research Station. The authors would especially like to thank Marie-Louise Smith in the USDA Forest Service Northeastern Research Station for sharing a data set and Andrew Finley in the Department of Forest Resources at the University of Minnesota for help with the statistical computations.

Curve Modelling with Applications to Weed Growth. Biometrics, 61: 617–625.
Berger, J.O. (1985). Bayesian Decision Theory. New York: Springer Verlag.
Besag, J. and Green, P.J. (1993). Spatial statistics and Bayesian computation. Journal of the Royal Statistical Society, Series B, 55: 25–37.
Besag, J., York, J. and Mollié, A. (1991). Bayesian image restoration with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43: 1–59.
Best, N., Richardson, S. and Thomson, A. (2005). A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research, 14: 35–59.
Carlin, B.P. and Louis, T. (2000). Bayes and Empirical Bayes Methods for Data Analysis, 2nd edn. London: Chapman and Hall/CRC Press.
Chen, M., Shao, Q. and Ibrahim, J. (2000). Monte Carlo Methods in Bayesian Computation. New York: Springer Verlag.
Chilès and Delfiner (1999). Geostatistics: Modelling Spatial Uncertainty, p. 43. New York: Wiley.
Cressie, N.A.C. (1993). Statistics for Spatial Data, revised edition. New York: Wiley.
Fienberg, S. and Shmueli, G. (2005). Statistical issues and challenges associated with rapid detection of bio-terrorist attacks. Statistics in Medicine, 24: 513–529.
Gamerman, D. (2000). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. New York: CRC Press.
(1993). Modelling complexity: Applications of Gibbs sampling in medicine. Journal of the Royal Statistical Society B, 55: 39–52.
Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57: 97–109.
Hossain, M. and Lawson, A.B. (2006). Cluster detection diagnostics for small area health data: with reference to evaluation of local likelihood models. Statistics in Medicine, 25: 771–786.
Kauth, R.J. and Thomas, G.S. (1976). The tasseled cap: a graphic description of the spectral-temporal development of agricultural crops as seen by Landsat. In: Proceedings of the Symposium on Machine Processing of Remotely Sensed Data, pp. 41–51. West Lafayette: Purdue University.
Lawson, A.B. (2006). Statistical Methods in Spatial Epidemiology, 2nd edn. New York: Wiley.
Lawson, A.B. (2008). Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology. London: Chapman and Hall/CRC Press.
Lawson, A.B., Biggeri, A., Boehning, D., Lesaffre, E., Viel, J.-F., Clark, A., Schlattmann, P. and Divino, F. (2000). Disease mapping models: an empirical evaluation. Statistics in Medicine, 19: 2217–2242. Special issue: Disease mapping with emphasis on evaluation of methods.
Lawson, A.B., Browne, W.J. and Vidal-Rodiero, C.L. (2003). Disease Mapping with WinBUGS and MLwiN. New York: Wiley.
Lawson, A.B. and Cressie, N. (2000). Spatial statistical methods for environmental epidemiology. In: Rao, C.R. and Sen, P.K. (eds), Handbook of Statistics: Bio-Environmental and Public Health Statistics, volume 18, pp. 357–396. Amsterdam: Elsevier.
Lawson, A.B. and Kleinman, K. (eds) (2005). Spatial and Syndromic Surveillance for Public Health, p. 45. New York: Wiley.
Lee, P. (2005). Bayesian Statistics, 4th edn. London: Arnold.
Møller, J. and Waagepetersen, R. (2004). Statistical Inference and Simulation for Spatial Point Processes. New York: CRC/Chapman and Hall.
Neal, R.M. (2003). Slice sampling. Annals of Statistics, 31: 1–34.
Ripley, B.D. (1987). Stochastic Simulation. New York: Wiley.
Robert, C. (2001). The Bayesian Choice: A Decision-theoretic Motivation. New York: Springer Verlag.
Robert, C. and Casella, G. (2005). Monte Carlo Statistical Methods, 2nd edn. New York: Springer.
Schabenberger, O. and Gotway, C. (2004). Statistical Methods For Spatial Data Analysis. Boca Raton, FL: Chapman and Hall/CRC Press.
Scheiner, S.M. and Gurevitch, J. (2001). Design and Analysis of Ecological Experiments, 2nd edn. London: Oxford University Press.
Sosin, D. (2003). Draft framework for evaluating syndromic surveillance systems. Journal of Urban Health, 80: i8–i13. Supplement.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002). Bayesian deviance, the effective number of parameters and the comparison of arbitrarily complex models. Journal of the Royal Statistical Society, 64: 583–640.
Stein, M. (1999). Statistical Interpolation of Spatial Data: Some Theory for Kriging, p. 46. New York: Springer Verlag.
Wakefield, J. (2004). A critique of statistical aspects of ecological studies in spatial epidemiology. Environmental and Ecological Statistics, 11: 31–54.
Waller, L. and Gotway, C. (2004). Applied Spatial Statistics for Public Health Data. New York: Wiley.
Webster, R. and Oliver, M. (2001). Geostatistics for Environmental Scientists. New York: Wiley.
18
Monitoring Changes in
Spatial Patterns
Peter A. Rogerson
taken place within the field of epidemiology, where there is interest in the detection of geographic clusters. Besag and Newell (1991) suggest three categories for these statistical approaches. In addition to the global and local statistics outlined above (referred to by Besag and Newell as general and focused tests, respectively), they note that there is a separate category for tests for the detection of clustering. While global tests lead to acceptance or rejection of a specified null hypothesis (perhaps one of spatial randomness, but more realistically, one where the observed spatial distribution of cases is compared with an expected distribution based upon population distribution and possibly other covariates), they do not indicate the size and/or location of geographic clusters. Similarly, local tests are limited in the sense that they evaluate only one location. A test for the detection of clustering may essentially be viewed as a set of local tests (where one or more specifications of potential cluster size are made for many locations within the study area). Scan statistics (Kulldorff and Nagarwalla, 1994), and the maximum of smoothed Gaussian random fields (Rogerson, 2001) fall into this category, where the extreme local statistic is assessed, and the multiple hypothesis testing associated with carrying out many local tests is accounted for.

Like other subfields of spatial analysis, interest in the statistical analysis of spatial patterns and the development of statistical methods for cluster detection has grown rapidly in the last decade. Waller (Chapter 16 in this volume) provides a review of many of these developments and related issues.

Spatial statistical tests of null hypotheses are almost always carried out on a single set of data; the hypothesis is accepted or rejected, and ideally the size and location of significant geographic clustering is revealed. However, there are many situations where repeated tests of this type are required. Imagine the crime analyst, who, each month, receives a new map of burglaries, or the epidemiologist who maps the locations of new cancer cases each year. A market researcher may wish to assess the degree to which customers cluster around a store, and it may be of particular interest to monitor this each month, based upon new sales data. If statistical tests are simply carried out each time a new map is available, the multiplicity of tests will increase the likelihood that a false declaration of significance is made. For instance, if 20 tests are carried out using a Type I error probability of 0.05, we can expect to find on average one false rejection of the null hypothesis among the 20 tests.

In this chapter we describe and review the use of statistical approaches designed for carrying out repeated tests concerned with the evaluation of spatial patterns. The common objective of such repeated tests is the quick detection of geographic change (where most commonly the goal is to find new, emergent clusters as quickly as possible). It can be noted that this objective of prospective, quick detection of temporal change in spatial pattern differs from that of retrospectively finding space–time interaction in a set of data using a single test such as those outlined by, say, Knox (1964), Mantel (1967), or Raubertas (1988).

The development of methods for the surveillance or monitoring of spatial patterns has received much of its impetus during the last few years from intense interest in surveillance for bioterrorism, and following from that, interest in public health surveillance. The recent reviews of outbreak detection algorithms (Buckeridge et al., 2005) and control charts for public health surveillance (Woodall, 2006) include discussions of spatial considerations in surveillance and summarize the many recent advances in this area. In addition, Chapter 9 of Lawson (2001) and the more recent collection of contributions edited by Kleinman and Lawson (2005) also attest to the growing importance of this field.
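The multiplicity arithmetic in the 20-test example can be made concrete with a short calculation. The sketch below assumes the tests are independent and that the null hypothesis is true for every one of them; the function name is hypothetical.

```python
def repeated_test_false_alarms(n_tests, alpha):
    """Return (expected false rejections, Pr(at least one false rejection)).

    Assumes independent tests, each carried out at Type I error rate alpha,
    with the null hypothesis true every time.
    """
    expected = n_tests * alpha
    prob_at_least_one = 1.0 - (1.0 - alpha) ** n_tests
    return expected, prob_at_least_one


expected, prob_any = repeated_test_false_alarms(20, 0.05)
print(round(expected, 3), round(prob_any, 3))  # 1.0 0.642
```

So although only one false rejection is expected on average, under these assumptions there is roughly a 64% chance of at least one false declaration of significance across the 20 maps.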
MONITORING CHANGES IN SPATIAL PATTERNS 345
large changes from the process mean, they are not as sensitive as other methods at detecting smaller and therefore more subtle deviations from the baseline process. The cumulative sum (CUSUM) chart was introduced by Page (1954); the approach consists of maintaining the cumulative sum of deviations between observed and expected values. Cumulative sum methods are covered in detail by Hawkins and Olwell (1998). For the particular example of standardized, independent, normally distributed observations ($z_t$), the one-sided cumulative sum at time $t$, $S_t$, is:

$S_t = \max(0, S_{t-1} + z_t - k)$

where $k$ is a parameter chosen to be equal to one-half the size of the deviation that is expected when the process goes out of control. In this example, the expected value of each observation is equal to zero (since observations have been standardized), and it is easy to see that the cusum is, more precisely, the cumulative sum of deviations for observations that exceed their expectation by more than $k$ standard deviations. The parameter $k$ is almost always chosen in this case to be equal to ½; this choice minimizes the time it will take to detect a one standard deviation increase in the mean of the process. An alarm indicating an increase in the underlying mean of the process is declared when the cumulative sum exceeds some predefined threshold, $h$ (i.e., $S_t > h$). The threshold is chosen in conjunction with a desired value of $ARL_0$; for the case of $k$ = ½, Rogerson (2006) provides the following formula:

$h \approx \frac{ARL_0 + 4}{ARL_0 + 2} \ln\left(\frac{ARL_0}{2} + 1\right) - 1.166.$ (18.1)

For other choices of $k$ in the range $1/\sqrt{ARL_0} \le k \le 1$ one can use the more general

$h \approx \frac{2k^2 ARL_0 + 2}{2k^2 ARL_0 + 1} \cdot \frac{\ln(2k^2 ARL_0 + 1)}{2k} - 1.166.$ (18.2)

The Shewhart chart is a special case of the cusum chart, where $k$ is equal to the Shewhart control limit and $h = 0$.

There is a tradeoff between the rate of false alarms and the ability to detect change when it actually occurs; the higher the value of $ARL_0$ (and hence the lower the false alarm rate), the greater will be the time until true change is detected (as signified by $ARL_1$, the average number of observations until an alarm is signaled, once change has occurred). Moustakides (1986) shows, and Frisen and Sonesson (2005) note, that the cusum approach minimizes the maximum expected delay until an alarm is sounded, for a particular changepoint.

Cusums for Poisson data
Regional data to be used for monitoring are often not normally distributed. For example, counts of disease or crime incidents are often taken to have a Poisson distribution. Lucas (1985) gives the Poisson cusum as:

$S_t = \max(0, S_{t-1} + y_t - k)$

where $y_t$ is the count at time $t$. If the expected count is constant and equal to $\lambda_0$, the value of $k$ is:

$k = \frac{\lambda_1 - \lambda_0}{\ln \lambda_1 - \ln \lambda_0}$

where it is desired to detect an increase in the Poisson parameter from $\lambda_0$ to $\lambda_1$ as
quickly as possible. Lucas gives tables for the threshold $h$, which is determined from both $k$ and the analyst's choice of the in-control average run length, $ARL_0$.

An alternative approach is to attempt to transform the Poisson counts to normality. Rossi et al. (1999) find that the following transformation converts the data, approximately, to a standard normal distribution:

[...]

where $1/\theta$ is the mean time between events.

To detect a decrease in the mean time between events (and hence an increase in $\theta$ from, say, $\theta_0$ to $\theta_1$), one can use the exponential cusum:

$S_t = \max(0, S_{t-1} - x_t + k)$
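The cusum recursion for standardized normal observations, the threshold approximation in equations (18.1) and (18.2), and the Poisson reference value $k$ can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function names and the example series are hypothetical, and equation (18.2) is used for all $k$ because it reduces to (18.1) when $k$ = ½.

```python
import math


def cusum_threshold(arl0, k=0.5):
    """Approximate threshold h for a normal cusum, following equation (18.2).

    With k = 0.5 this reduces to equation (18.1).
    """
    a = 2.0 * k * k * arl0
    return (a + 2.0) / (a + 1.0) * math.log(a + 1.0) / (2.0 * k) - 1.166


def normal_cusum(z, k=0.5, arl0=100.0):
    """Run a one-sided cusum on standardized observations z.

    Returns the cusum path and the index of the first alarm (None if no alarm).
    """
    h = cusum_threshold(arl0, k)
    s, path, alarm = 0.0, [], None
    for t, zt in enumerate(z):
        s = max(0.0, s + zt - k)   # S_t = max(0, S_{t-1} + z_t - k)
        path.append(s)
        if alarm is None and s > h:
            alarm = t
    return path, alarm


def poisson_reference_value(lam0, lam1):
    """Reference value k for a Poisson cusum detecting a shift lam0 -> lam1."""
    return (lam1 - lam0) / (math.log(lam1) - math.log(lam0))


# Hypothetical standardized series with an upward shift from the third value.
z_series = [0.2, -0.5, 1.8, 2.5, 1.5, 2.4]
path, alarm = normal_cusum(z_series, k=0.5, arl0=100.0)
print(round(cusum_threshold(100.0), 3), alarm)
print(round(poisson_reference_value(2.0, 4.0), 3))
```

For an in-control average run length of 100 and $k$ = ½ the approximation gives a threshold of about 2.84. The Poisson cusum uses the same recursion with counts $y_t$ in place of $z_t$, but its threshold comes from the tables in Lucas (1985), not from the normal-theory formula.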
where $x_t$ is the observation at time $t$ and $\lambda$ is a parameter that dictates the importance of dated information. An alarm is signaled at the first time when the value of $z_t$ exceeds a time-varying threshold that over time reaches an asymptotic limit. In the special case of $\lambda = 1$, only current information is used, and the method is identical to the Shewhart chart.

The Shiryaev–Roberts method, based upon contributions from Shiryaev (1963) and Roberts (1966), can be derived as a special case of a likelihood ratio method with a noninformative prior distribution on the time of the changepoint (Frisen and Sonesson, 2005). This approach minimizes the expected time until an alarm following a change.

Many other approaches to temporal surveillance exist; these range from simple calculations of historical limits that are empirically based upon recent data, to sophisticated use of time series analysis; these are reviewed by Farrington and Beale (1998), and more recently by Le Strat (2005).

18.3. SPATIAL SURVEILLANCE

18.3.1. Brief overview of the development of methods for spatial surveillance

Like recent developments in spatial cluster detection, many of the recent developments in the monitoring of spatial patterns have occurred within the field of public health. Raubertas (1989) was one of the first to outline how statistical approaches to spatial surveillance could be developed, and he did so in the context of disease surveillance.

Raubertas employed cumulative sum methods to suggest how disease monitoring for a particular region within a study area could be carried out. Monitoring is based upon forming a weighted sum of the number of cases occurring both in the region of interest and in the surrounding regions. The weights define the spatial structure of the alternative, and should be matched as closely as possible with the definition of any presumed cluster. The weights, for example, might decline as the distance from the region of interest increases. For each time period, the weighted sum of observations is compared with expectations, and deviations are cumulated; if these deviations exceed a pre-specified threshold, an alarm signaling a possible increase in disease in the vicinity of the region of interest is sounded. Raubertas notes some of the complications that arise when one wishes to monitor several regions simultaneously, since there will be correlation in the monitoring statistics obtained for regions that are close to one another (since they will have shared neighborhoods).

Statistical process control approaches to spatial surveillance may be categorized into those that maintain separate, local charts for each region (where, like Raubertas, the regional chart may possibly include information from a defined neighborhood around the region), and those that monitor a single, global spatial statistic.

As an example of the latter category, Rogerson (1997) also uses cumulative sum methods to monitor temporal changes in a global spatial statistic (specifically, Tango's 1995 statistic). Each time a new case is observed, Tango's statistic is updated and the resulting statistic is then compared with the expectation of the statistic (conditional upon the previous value of the statistic, before the new case was observed) under the null hypothesis of no raised incidence in any subregion. An alarm is sounded, indicating a significant change in the global statistic, if deviations between observed and expected statistics cumulate sufficiently.

Kulldorff (2001) has extended his spatial scan statistic to the case of prospective disease surveillance, by considering the
likelihood of the observed number of events in space-time cylinders (where the vertical axis represents time, and the horizontal plane represents a region and its surrounding neighborhood), under the null hypothesis. The spatial scan statistic (Kulldorff and Nagarwalla, 1994) is based upon the likelihood ratio associated with the number of events inside and outside of a circular scanning window. The numerator of the ratio is associated with the hypothesis that the rates inside and outside of the window are different, and the denominator of the ratio is associated with the null hypothesis that the rates inside and outside of the window are the same. Likelihood ratios are found using circular scanning windows of various sizes, and the window moves, to scan over space. The most unusual window under the null hypothesis is the one displaying the maximum likelihood ratio. This maximum observed ratio is compared with ratios that are simulated by assuming the null hypothesis to be true; if, for example, the maximum observed ratio is greater than 95% of the simulated ratios, the cluster is said to be significant using $\alpha = 0.05$.

For disease surveillance, the circular scanning windows become cylinders with time on the vertical axis, where the top of the cylinder represents the most recent time period. To find space-time clusters as the cylinders grow vertically with the progression of time, the maximum likelihood ratio concept is simply generalized. At each time period, the likelihood of the most interesting cylinder (i.e., the one with the highest likelihood ratio) is compared with the likelihood of the most interesting cylinder generated from many simulations of the null hypothesis. The popularity of the method has been aided by freely available software (SaTScan), available at www.satscan.org.

Kleinman et al. (2004) model the count of cases in a small region using covariates in a generalized linear mixed model (Breslow and Clayton, 1993) for an historical period. In particular, they use a logistic equation to model the probability that a particular individual is a case. Next, they use the coefficient estimates to derive the expected probability that an individual becomes a case during the next time period. Statistical significance is achieved if the observed count of cases is unlikely to have occurred using a binomial distribution based upon the number of individuals and the predicted probability resulting from the model.

Other approaches to spatial surveillance include distance-based methods (see, e.g., Forsberg et al., 2005), and perspectives that adopt more of a model-based than a statistical hypothesis testing perspective (Lawson, 2005).

18.3.2. Spatial issues in spatial surveillance

One way to monitor variables for a set of regional subunits is to maintain a separate cusum chart for each subunit. An immediate issue that arises in the context of monitoring across a set of regional subunits is how to properly account for the multiple testing across spatial units. If cusum control charts are kept for each region, the average run length between false alarms will be less than that implied by the threshold derived for each chart (which is based upon the desired ARL). Thus if thresholds for each chart are chosen using a desired $ARL_0$ of 100, the mean time until the first alarm on at least one of the charts will be less than 100. More precisely, the average run length between false alarms for a set of $m$ charts (one for each region), $ARL_0^*$, will be

$ARL_0^* = \frac{1}{1 - (1 - 1/ARL_0)^m}.$
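The effect of running $m$ independent charts can be checked numerically. The sketch below evaluates the overall run length from the expression above, together with its exact inverse and the simpler Bonferroni-type product $m \times ARL$ discussed in this section. The function names are hypothetical, and independence of the charts is assumed throughout.

```python
def overall_arl(per_chart_arl, m):
    """Average run length to the first false alarm across m independent charts."""
    return 1.0 / (1.0 - (1.0 - 1.0 / per_chart_arl) ** m)


def per_chart_arl_exact(desired_overall_arl, m):
    """Per-chart ARL needed to achieve a desired overall ARL (exact inverse)."""
    return 1.0 / (1.0 - (1.0 - 1.0 / desired_overall_arl) ** (1.0 / m))


def per_chart_arl_bonferroni(desired_overall_arl, m):
    """Bonferroni-type approximation: multiply the desired overall ARL by m."""
    return m * desired_overall_arl


m = 10
print(round(overall_arl(100.0, m), 2))          # about 10.46, far below 100
print(round(per_chart_arl_exact(100.0, m), 1))  # about 995.5
print(per_chart_arl_bonferroni(100.0, m))       # 1000.0
```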
350 THE SAGE HANDBOOK OF SPATIAL ANALYSIS
This is based upon the fact that the time between false alarms has an exponential distribution (Page, 1954), and hence the probability that any single observation leads to a false alarm is $1/ARL_0$. Alternatively stated, the ARL to use on each chart is given by:

$ARL_0 = \left[1 - \left(1 - \frac{1}{ARL_0^*}\right)^{1/m}\right]^{-1}$ (18.3)

where, again, $ARL_0^*$ is the desired time between alarm investigations. A computationally simpler way to account for the simultaneous monitoring of the $m$ charts is to use a Bonferroni-type adjustment; instead of using Equation (18.3) to determine the threshold for each chart, the quantity:

$ARL_0 = m \, ARL_0^*$ (18.4)

is used. Thus if there are $m = 10$ regional units and a desired time between false alarms of $ARL_0^* = 100$, the threshold for each chart is found using $ARL_0 = 10(100) = 1000$, together with equation (18.1) or (18.2).

This type of adjustment is appropriate and will yield the desired ARL when (a) no spatial autocorrelation in the regional variables exists, (b) all regions are in control, and (c) there is a desire to monitor individual regions, and not neighborhoods around regions. However, Equations (18.3) and (18.4) will often lead to thresholds that are too conservative (i.e., thresholds that are too high). One reason for this is that not all $m$ regions may be in-control; we only require a threshold and false alarm rate that have been adjusted for the number of in-control regions (which is unknown, but is less than or equal to $m$). When a region goes out of control, others (e.g., surrounding regions) may simultaneously go out of control. This suggests that the adjustments for multiple testing may be too severe, and recent developments in the area of multiple testing can be used to lower the thresholds (for a review, see Castro and Singer, 2006).

A second reason that equations (18.3) and (18.4) can be conservative is that they assume that the $m$ regional charts are independent. More commonly, regional charts may exhibit spatial dependence; a cusum chart for one region may look a lot like a chart for a nearby region. Finally, if emergent clusters might exceed the size of regional subunits, this will provide a rationale for monitoring local statistics for neighborhoods around regions.

Maintaining separate charts for each region is a directional scheme; the approach will work very well when the actual change occurs in one of the regions (and not, for example, combinations of regions), but can lose considerable power in detecting change quickly when changes in other directions occur. If, for example, an increase occurs in a neighborhood containing several regions (corresponding to several charts), this approach will not be as effective and can yield longer times to detection than other methods.

In the next section, we examine some alternative approaches to multiplicity adjustment.

Monitoring a single local statistic
Suppose that there is no spatial autocorrelation in the regional values being monitored, and that we suspect that when change occurs, it will occur in the form of increases in a subset of regions comprising a neighborhood. There are at least two ways forward if our objective is to detect this increase quickly:

1 Keep a single chart for the variable consisting of a weighted sum of the regional values (similar to the suggestion of Raubertas).
2 Use the approach of Healy (1987), which is optimal for quick detection of change in a single, hypothesized direction.

While these approaches should give identical results under the conditions specified, Healy's approach is more general, since it can also handle the situation where the underlying variables are correlated. Specifically, when the variance–covariance matrix associated with the regional values is designated $\Sigma$, the following cumulative sum based on vectors of regional observations ($x_t$) is optimal for detecting a change in mean from $\mu_G$ to $\mu_B$, where these latter quantities are vectors of regional values for the good, in-control, and bad, out-of-control means, respectively:

$S_t = \max(0, S_{t-1} + a'(x_t - \mu_G) - 0.5D)$

where:

$a' = \frac{(\mu_B - \mu_G)' \Sigma^{-1}}{\{(\mu_B - \mu_G)' \Sigma^{-1} (\mu_B - \mu_G)\}^{1/2}}$

and:

$D^2 = (\mu_B - \mu_G)' \Sigma^{-1} (\mu_B - \mu_G).$

Monitoring many local statistics simultaneously
Now suppose that we wish to carry out surveillance of several such local statistics simultaneously. We could either keep a Raubertas-type chart for each local statistic, or, more generally (since it is possible to account for underlying spatial autocorrelation in the regional values), keep a Healy-type chart for each region. Consider first the special case where $\Sigma = I$; the Healy and Raubertas charts will be identical. An important issue is the adjustment for multiplicity; using individual thresholds for each chart based upon $m \times ARL$ would be too conservative, since the charts will be correlated (nearby local statistics will be similar, since they use shared regional values). On the other hand, thresholds based on ARL alone would be too liberal, unless the charts for all local statistics were identical. It is of interest to find the number of effectively independent charts (say, $e$); in that case each individual threshold could then be based upon $e \times ARL$.

Let the regional variables be denoted by $\{y_i\}$ and the local statistic to be monitored by $\{z_i\}$. Rogerson (2005) suggests that a Gaussian kernel be used to define the neighborhood weights:

$w_{ij} = \frac{w^*_{ij}}{\sqrt{\sum_j w^{*2}_{ij}}}$

$w^*_{ij} = \exp\left[-\frac{1}{2} \frac{d_{ij}^2}{\sigma^2 (A/m)}\right]$

where $A$ is the size of the study area, and where $\sigma$ is the width of the Gaussian kernel, expressed in terms of multiples of the square root of the average regional area. Then one possibility is to use the following for an estimate of $e$:

$e = \frac{m}{1 + 0.81\sigma^2}.$

This is based upon results reported in Rogerson (2001), who modified the work of Worsley (1996) on the use of Gaussian random fields to find the probability that
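The quantities used in this section (Gaussian kernel weights scaled by the average regional area, the estimate of the effective number of independent charts, and the direction vector and noncentrality used in a Healy-type chart) can be sketched with NumPy. Everything below is illustrative: the function names and the 3 × 3 grid of centroids are hypothetical, and the cusum recursion is written in the standard form attributed to Healy (1987), $S_t = \max(0, S_{t-1} + a'(x_t - \mu_G) - 0.5D)$, which is taken here as an assumption.

```python
import numpy as np


def gaussian_kernel_weights(coords, sigma, area):
    """Normalized Gaussian kernel weights between regional centroids.

    sigma is in multiples of the square root of the average regional area
    (area / m). Each row is scaled so that its squared weights sum to one.
    """
    coords = np.asarray(coords, dtype=float)
    m = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)                      # squared distances
    raw = np.exp(-0.5 * d2 / (sigma ** 2 * (area / m)))
    return raw / np.sqrt((raw ** 2).sum(axis=1, keepdims=True))


def effective_number_of_charts(m, sigma):
    """Estimate of effectively independent charts, e = m / (1 + 0.81 sigma^2)."""
    return m / (1.0 + 0.81 * sigma ** 2)


def healy_direction(mu_good, mu_bad, cov):
    """Direction vector a and noncentrality D for a Healy-type cusum."""
    delta = np.asarray(mu_bad, dtype=float) - np.asarray(mu_good, dtype=float)
    sinv_delta = np.linalg.solve(cov, delta)           # Sigma^{-1} (mu_B - mu_G)
    d = float(np.sqrt(delta @ sinv_delta))
    return sinv_delta / d, d


def healy_cusum(x, mu_good, a, d):
    """Cusum path for a sequence of observation vectors x (assumed form)."""
    s, path = 0.0, []
    for xt in np.asarray(x, dtype=float):
        s = max(0.0, s + float(a @ (xt - mu_good)) - 0.5 * d)
        path.append(s)
    return path


# Hypothetical 3 x 3 grid of unit-area regions (study area A = 9, m = 9).
grid = [(i, j) for i in range(3) for j in range(3)]
weights = gaussian_kernel_weights(grid, sigma=1.0, area=9.0)
print(round(effective_number_of_charts(100, 1.0), 2))   # 55.25

a, d = healy_direction(np.zeros(2), np.array([1.0, 0.0]), np.eye(2))
print(healy_cusum([[0.0, 0.0], [2.0, 0.0], [2.0, 0.0]], np.zeros(2), a, d))
```

With $\sigma = 1$ the estimate suggests that 100 local statistics behave like roughly 55 independent charts, so thresholds would be based on about 55 times, not 100 times, the desired overall ARL.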
Frisen, M. and Sonesson, C. (2005). Optimal surveillance. In: Kleinman, K. and Lawson, A.B. (eds), Spatial and Syndromic Surveillance, pp. 31–52. New York: Wiley.
Gan, F.F. (1994). Design of optimal exponential CUSUM control charts. Journal of Quality Technology, 26: 109–124.
Getis, A. and Ord, J. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24: 189–206.
Hawkins, D.M. and Olwell, D.H. (1998). Cumulative Sum Charts and Charting for Quality Improvement. New York: Springer.
Healy, J.D. (1987). A note on multivariate CUSUM procedures. Technometrics, 29: 409–412.
Hunter, J.S. (1986). The exponentially weighted moving average. Journal of Quality Technology, 18: 203–210.
Kleinman, K. and Lawson, A.B. (eds) (2005). Spatial and Syndromic Surveillance. New York: Wiley.
Kleinman, K., Lazarus, R. and Platt, R. (2004). A generalized linear mixed models approach for detecting incident clusters of disease: biological terrorism and other surveillance. American Journal of Epidemiology, 156: 217–224.
Knox, G. (1964). The detection of space-time interactions. Applied Statistics, 13: 25–29.
Kulldorff, M. and Nagarwalla, N. (1994). Spatial disease clusters: detection and inference. Statistics in Medicine, 14: 799–810.
Kulldorff, M. (2001). Prospective time-periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society Series A, 164: 61–72.
Lawson, A. (2001). Statistical Methods in Spatial Epidemiology. New York: Wiley.
Lawson, A.B. (2005). Advanced modeling for surveillance: clustering of relative risk changes. In: Kleinman, K. and Lawson, A.B. (eds), Spatial and Syndromic Surveillance, pp. 31–52. New York: Wiley.
Le Strat, Y. (2005). Overview of temporal surveillance. In: Kleinman, K. and Lawson, A.B. (eds), Spatial and Syndromic Surveillance, pp. 13–29. New York: Wiley.
Lucas, J.M. (1985). Counted data cusums. Technometrics, 27: 129–144.
Lucas, J.M. and Saccucci, M.S. (1990). Exponentially weighted moving average control schemes: properties and enhancements. Technometrics, 32: 1–12.
Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27: 209–220.
Moran, P.A.P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37: 17–23.
Moustakides, G.V. (1986). Optimal stopping-times for detecting changes in distributions. Annals of Statistics, 14: 1379–1387.
Nelson, L.S. (1984). The Shewhart control chart: tests for special causes. Journal of Quality Technology, 16: 237–239.
Ord, J. and Getis, A. (1995). Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27: 286–306.
Page, E.S. (1954). Continuous inspection schemes. Biometrika, 41: 100–115.
Raubertas, R.F. (1988). Spatial and temporal analysis of disease occurrence for detection of clustering. Biometrics, 44: 1121–1129.
Raubertas, R.F. (1989). An analysis of disease surveillance data that uses the geographic locations of the reporting units. Statistics in Medicine, 8: 267–271.
Ripley, B.D. (1976). The second-order analysis of stationary point processes. Journal of Applied Probability, 13: 255–266.
Roberts, S.W. (1959). Control chart tests based on geometric moving averages. Technometrics, 1: 239–250.
Roberts, S.W. (1966). A comparison of some control chart procedures. Technometrics, 8: 411–430.
Rogerson, P. (1997). Surveillance methods for monitoring the development of spatial patterns. Statistics in Medicine, 16: 2081–2093.
Rogerson, P. (2001). A statistical method for the detection of geographic clustering. Geographical Analysis, 33: 215–227.
Rogerson, P. (2005). Spatial surveillance and cumulative sum methods. In: Kleinman, K. and Lawson, A. (eds), Spatial and Syndromic Surveillance for Public Health, pp. 95–114. New York: Wiley.
Rogerson, P. (2006). Formulas for the design of CUSUM quality control charts. Communications in Statistics: Theory and Methods, 35: 373–383.
Rogerson, P. and Yamada, I. (2004). Approaches to syndromic surveillance when data consist of small regional counts. Morbidity and Mortality Weekly Report, 53 (Supplement): 79–85.
Rogerson, P. and Yamada, I. (2004). Monitoring change in spatial patterns of disease: comparing univariate and multivariate cumulative sum approaches. Statistics in Medicine, 23: 2195–2214.
Rossi, G., Lampugnani, L. and Marchi, M. (1999). An approximate CUSUM procedure for surveillance of health events. Statistics in Medicine, 18: 2111–2122.
Shiryaev, A.N. (1963). On optimum methods in quickest detection problems. Theory of Probability and its Applications, 8: 22–46.
Skellam, J.G. (1952). Studies in statistical ecology. I. Spatial pattern. Biometrika, 39: 346–362.
Tango, T. (1995). A class of tests for detecting general and focused clustering of rare diseases. Statistics in Medicine, 7: 649–660.
Waller, L. (2006). Detection of clustering in spatial data. Handbook of Spatial Analysis. London: Sage Publications.
Waller, L. and Gotway, C. (2004). Applied Spatial Statistics for Public Health Data. New York: Wiley.
Wolter, C. (1987). Monitoring intervals between rare events: a cumulative score procedure compared with Rina Chen's sets technique. Methods of Information in Medicine, 26: 215–219.
Woodall, W.H. (2006). The use of control charts in health-care and public health.
Worsley, K.J. (1996). The geometry of random images. Chance, 9 (1): 27–40.
19
Case-Control Clustering for
Mobile Populations
Geoffrey M. Jacquez and Jaymie R. Meliker
The effect of [human] mobility could be a time-space lag between causes and effects that makes conventional mapping spurious.
A. Shaerstrom (2003)

19.1. INTRODUCTION

Traditionally, geographic clustering techniques have concerned themselves with static spatial distributions in which human mobility is ignored. For example, within the case-control framework, place-of-residence at time of diagnosis or death is often analyzed even though there may be a substantial space-time lag or latency between timing of causative exposures and disease diagnosis. The few techniques currently available for accounting for human mobility when assessing case-clustering often do not adequately account for known risk factors (e.g., smoking), covariates (e.g., age, gender, race, education, etc.) and the space-time lag between exposure and disease. This chapter is based closely on two previous papers published by our research group (Jacquez et al., 2005, 2006). It provides background on human mobility and its implications in disease clustering, and then offers an approach for analyzing case-control data for mobile individuals that addresses latency and incorporates covariates and other risk factors in the analysis. Called Q-statistics, this approach is used for analyzing clustering in case-control data for mobile individuals. An example analysis of bladder cancer in southeastern Michigan is presented within an inductive framework in
[Figure 19.1: four map panels, (a) to (d), showing lifetime tracks, with scale bars of 10 km, 100 km, 1000 km and 10000 km.]
Figure 19.1 Exponential increase in lifetime distances traveled over generations of males from great grandfather (A), grandfather (B), father (C), and son (D). From Bradley (1988), with kind permission of Dr. Bradley and Springer Science and Business Media.
and that this risk is time-dependent, is a fact for almost all human diseases, including infectious as well as chronic diseases such as cancer. Goodchild (2000) referred to the failure to appropriately represent the time dimension as a 'static world-view'. To date, many disease clustering methods have been based on a static world-view in which individuals are considered immobile, migration between populations does not occur, and in which background disease risks under the null hypothesis are assumed to be time-invariant and uniform through geographic space. As a result, many of the applications in the published literature suffer from violations of fundamental assumptions that are inherently unrealistic (Jacquez, 2004).

19.2. CONSEQUENCES OF THE STATIC WORLD VIEW IN DISEASE CLUSTERING

When analyzing chronic diseases such as cancer, causative exposures may occur over a long time period, and the disease may be manifested only after a lengthy latency period. During this latency period individuals may move from one place of residence to another. This can make it difficult to detect clustering of cases in relation to the spatial distribution of their causative exposures. Yet the static spatial point distribution is the point of departure for many clustering approaches, including Turnbull's test (Turnbull et al., 1990), and tests suggested by Cuzick and Edwards (1990), Besag and Newell (1991), the Bernoulli form of the scan test (Kulldorff and Nagarwalla, 1995), Tango (1995), and a host of others. Especially for chronic diseases with long latencies, human mobility must be accounted for.

Hagerstrand (1970) developed conceptual models of the space-time paths formed as individuals move throughout their days and lives. In the context of human health studies these have been called geospatial lifelines, and their mathematical representation, properties, and means of analysis have become important research topics. Sinha and Mark (2005) employed a Minkowski metric to quantify the dissimilarity between the geospatial lifelines of cases and controls, and suggested that their technique could be used to evaluate differences in exposure histories between the case and control populations. The Minkowski metric provides a global measure of dissimilarity between cases and controls, but does not identify where or when these dissimilarities occur. Using k-function analysis, Han et al. (2004) evaluated clustering of breast cancer in two New York state counties and detected significant spatial clustering at the global level. Their approach incorporated knowledge of residential locations of both cases and controls at biologically relevant ages in a woman's life, namely at birth, at menarche, and at the woman's first birth. The k-function was applied to the spatial pattern described by place of residence at specific time slices in the participants' lives. Sabel and colleagues (Sabel et al., 2000, 2003) used residential histories to analyze clustering of cases of motor neurone disease in Finland. They calculated risk surfaces using kernel functions that were weighted by duration at specific locations of residence. This approach thus used the residential history information more fully, but ignored the temporal ordering of place of residence.

Jacquez and colleagues (2005) developed global, local and focused versions of so-called Q-statistics that evaluate clustering in residential histories using case-control data. Their approach is based on a space-time representation that is consistent with Hagerstrand's model of space-time paths, and evaluates local, global, and focused clustering of the residential histories of the
cases relative to the residential histories of the controls. One of the benefits of the different versions of the Q-statistics is their ability to quantify what is happening at the local, spatial, and temporal scales that are of relevance to individuals, while also providing global statistics for evaluating aggregations of cases. But their approach did not incorporate explicit models of disease latency, nor did it account for those times in a person's life when they might be most susceptible to specific exposures.

19.3. A HISTORICAL PERSPECTIVE ON LATENCY MODELS

It seems a truism to observe that people are mobile, the environment varies through time, and that populations grow and their composition changes, thereby complicating the adjustment for covariates. We therefore need to understand the contributions to individual exposure that transpire at home, at work, and while commuting. Substantial disease latencies may need to be accounted for, and an individual's susceptibility to disease and to environmental insults may vary with age. Metabolic responses may be non-linear and synergistic, and observed impacts of current exposures may be mediated by past exposures. Enzymes involved in metabolism may be inducible, such as the example of alcohol dehydrogenase and alcohol metabolism. In addition, exposures are temporally dynamic, may be episodic or cyclic, and can occur on time scales including days, weeks, years, decades, and potentially over the entire life-course. For example, in summer, air pollutants may vary over the course of a day, while concentrations of naturally-occurring metals in groundwater may be relatively static over months and even years. And certain carcinogens and biologically active compounds are of anthropogenic origin (e.g., PCBs) and were not present in the environment in prior generations.

As noted earlier, the majority of cluster methods assume a static geography and work with static spatial point patterns (instead of location histories) to represent cases and controls. The spatial coordinate employed may be the place of residence of cases at time of diagnosis, death, hospitalization, or whatever health-related event is being studied. But clustering of cases at time of diagnosis or death is often of little scientific or practical interest in terms of enhancing our understanding of health-environment relationships. Of greater import is whether there is clustering in the locations where the causative exposures occurred, but this question cannot be adequately addressed by techniques that employ a static world-view because those approaches implicitly assume the duration between exposure and the date of the health related event (e.g., diagnosis, death) to be negligible. When exploring space-time interaction (whether nearby cases tend to occur at about the same time) the Knox test (Knox, 1964) employs critical time and space distances that may be specified to reflect a latency period and the average distance individuals might move during this period. But to date and to our knowledge none of the available tests for geographic clustering take into account disease latency for location histories. Methods for addressing this need are proposed later in this chapter.

For purposes of this chapter we make a distinction between the evolution of risk through time of a known exposure (e.g., when the exposure began, ended, the mid-point, as well as changes in the exposure level through time) and the definition of a time window within which an unknown exposure might have occurred that plausibly could explain a known disease outcome (what we refer to
in this article as the 'exposure window'). Let us now consider approaches that have been used for modeling latency.

Langholz et al. (1999) observed that effects of latency as described in the epidemiological literature are largely insufficient for addressing questions related to public health. They proposed latency models using bilinear and exponential decay functions, and fitted these models to case-control data within a likelihood framework. Their working definition of latency is 'the function describing how the relative risk associated with a known exposure changes through time'. So, for example, in their analysis of lung cancer in a cohort of uranium miners they found that '. . . relative risk associated with exposure increases for about 8.5 years and thereafter decreases until it reaches background levels after about 34 years'. As for most latency models of occupational studies, Langholz's metric was calculated for a known exposure, for example the period of employment. For purposes of clustering we are interested in determining whether the residential histories of cases clustered during those times when causative exposures plausibly might have occurred, but we do not necessarily know what those exposures might be. We thus wish to use our admittedly inadequate knowledge of cancer latency to define 'exposure windows' that bracket those time periods within which an environmental exposure might be associated with an observed cancer. This could indicate, for example, those times in a person's life when exposures (should they occur) are most likely to result in a cancer at some later date. This is an important distinction that, as noted above, must be kept in mind for the remainder of this chapter.

Exposures early in life and over an individual's life course may be important risk factors for the onset of chronic diseases such as cancer (Barker, 1992; Han et al., 2004; Kuh and Ben-Shlomo, 1997). But how can exposures during the life course be accounted for when modeling latency and exposure windows? Robins and Greenland (1991) showed that in cohort analyses, years of life lost (YLL) due to early exposures cannot be estimated without bias in the absence of causal models for how exposure causes death. Morfeld (2004) demonstrated this result analytically, resulting in a proposed framework for formulating such causal models (e.g., Robins' G-estimation procedure (Robins, 1997; Rothmann and Greenland, 1998) that can be used to estimate the latency between exposure and death). Of course any results from an exploratory analysis with no a priori hypothesis would need to be verified with another study. A model that links exposures and latency periods to the health outcomes thus appears to be required in order to evaluate alternative specifications of exposure windows, an important result that we will refer to later in this chapter.

For purposes of clustering, the putative exposure is often unknown, and we therefore must be able to handle uncertainty in exposure windows. Later in this chapter we define approaches for explicitly modeling exposure windows, and for specifying sampling distributions for exposure windows. These can then be used to evaluate the sensitivity of the cluster statistics to alternative specifications of and uncertainty in the exposure windows. But in general, the latency model employed should be specified to correspond to some a priori hypothesis regarding disease causation, that is, a causal model.

19.4. AGE-DEPENDENT MODEL OF DISEASE LATENCY AND EXPOSURE WINDOWS

Detailed specification of a latency model requires a causal model of how disease results
in death. At this writing our knowledge of the causes of most cancers is incomplete and in almost all instances is insufficient to fully specify such a model. But in order to tackle this problem it first is necessary to develop an understanding of the information the construction of such a model might require. We therefore now consider how one might construct and then employ a model of disease latency within the framework of Q-statistics, using a simple and necessarily unrealistic age-dependent function as our point of departure. As more realistic models of disease causation are developed they may be radically different in form and will replace what we acknowledge is a simplistic first step. But for now and for convenience define the latency $L(A_d)$ as the duration between the age of the participant at the time of onset of the condition, $E_1(A_d)$ (age of the participant at that date when the participant has the beginnings of a cancer, yet to be diagnosed) and the age at diagnosis, $A_d$ (Figure 19.2). Further, suppose the exposure window (the time in an individual's life course when he or she is biologically vulnerable should an exposure occur) commences at age $E_0(A_d)$ and ends at age $E_1(A_d)$. Recall the distinction made in the Introduction regarding exposure windows and an actual exposure. The exposure window is simply that time in a person's life when a causative exposure might have occurred and given rise to the observed cancer, that is, the time interval from $E_0(A_d)$ to $E_1(A_d)$.

[Figure 19.2: time axis t marked with $E_0(A_d)$, $E_1(A_d)$ and $A_d$; the exposure window $E(A_d)$ spans $E_0(A_d)$ to $E_1(A_d)$ and the latency $L(A_d)$ spans $E_1(A_d)$ to $A_d$.]

For the purposes of this chapter we will assume the age at which latency begins is the age at which the exposure window ends ($E_1(A_d)$), although this does not have to be the case and the modeling approach (below) is readily adapted to instances in which the end of the exposure window is not the same as the beginning of the latency period. We would like to model the exposure window and latency as functions of the age at diagnosis, $A_d$. The duration of the exposure window is therefore age dependent and we now write $E(A_d) = E_1(A_d) - E_0(A_d)$, and the duration of the latency period is $L(A_d) = A_d - E_1(A_d)$. For our purposes we wish to construct a model of $E(A_d) + L(A_d)$ so that the duration of the latency period and exposure window becomes shorter as the age at diagnosis decreases, since we wish to avoid implausible situations such as causative exposures occurring after the age at diagnosis. Notice, however, that the model can be specified in a manner that would allow maternal exposures prior to conception. We would also like the model to allow in utero exposures occurring after conception. To accomplish these objectives we employ a modified form of the logistic equation initially attributed to Verhulst (1838, 1845). Define the variable g at a given age of diagnosis to be:

\[
g(A_d) = \frac{E(A_d) + L(A_d)}{\max\left(E(A_d) + L(A_d)\right)}. \tag{19.1}
\]
ages considered. Now define the parameter $g_0$ to be:

\[
g_0 = \frac{\min\left(E(A_d) + L(A_d)\right)}{\max\left(E(A_d) + L(A_d)\right)}. \tag{19.2}
\]

This is the smallest possible value of $g(A_d)$. The model of the latency and exposure window as a function of age is then:

\[
g(A_d) = \frac{1}{1 + \left(\dfrac{1}{g_0} - 1\right) e^{-r A_d}}. \tag{19.3}
\]

Here r is a parameter describing the rate of increase of $g(A_d)$ as a function of age at diagnosis, with positive values indicating that the time period between the onset of the causative exposure and the end of the latency period increases as the age at diagnosis increases (Figure 19.3). Hence equation (19.3) is how we model $g(A_d)$ and equation (19.1) is the relationship between $g(A_d)$ and the latency and exposure windows at a given age of diagnosis.

19.5. SAMPLING DISTRIBUTIONS FOR EXPOSURE WINDOWS

With an age-dependent model of the latency and exposure windows defined we now concern ourselves with models of their uncertainty. Recall that exposure windows represent that time interval within which a causative environmental exposure plausibly could have occurred. Notice that we observe the cancer outcome (e.g., date of diagnosis) but do not know whether the cancer was in fact caused by an environmental exposure, nor what the exposure might actually be. This is in contrast to models of latency that were summarized in the Introduction, for which the exposure and its timing are known (or at least presumed known, being related for example to employment dates), as well as the date of diagnosis or death. Since in our case the exposures are not observable we require a sampling distribution for exposure windows that will allow us to assess the sensitivity of any observed case clustering to uncertainty in that exposure window.

We will accomplish this by modeling exposure windows for an individual with a given age at diagnosis as the waiting time from the beginning of the exposure window ($E_0(A_d)$) to the end of that exposure window ($E_1(A_d)$). Our approach will be to find the duration of the exposure window for individuals of a given age using the function in equation (19.3) and solving for $E(A_d)$ in equation (19.1). We then obtain individual realizations of that exposure window by sampling from a distribution of waiting times.

Suppose we define events as being the beginning and end of an exposure window, and that these events are separated by a waiting time $E(A_d)$. Assume $E_0(A_d)$ and $E_1(A_d)$ are Poisson distributed and that the Poisson process has intensity $\lambda$. For a given waiting time we can estimate the intensity of the Poisson process adjusting for edge effects as:

\[
\lambda = \frac{2}{E(A_d) + 1}. \tag{19.4}
\]

Or when ignoring edge effects as:

\[
\lambda = \frac{1}{E(A_d)}. \tag{19.5}
\]

The cumulative distribution of $E(A_d)$ is then estimated by:

\[
D(E(A_d)) = 1 - e^{-\lambda E(A_d)} \tag{19.6}
\]
[Figure 19.3: four panels in two rows and two columns. Left column: E + L against age at diagnosis. Right column: age at event against age at diagnosis. Age at diagnosis runs from 0 to 100 in every panel.]
Figure 19.3 Age dependent model of exposure window and latency. The sum of the exposure window plus the latency as a function of age at diagnosis is shown in the first column. The second column shows the age at diagnosis (top solid line), the age at the end of the exposure window (dashed line) and the age at the beginning of the exposure window (bottom solid line). Top row: r = 0.05; bottom row r = 0.125. Minimum latency is 0.375 years, maximum latency is 15 years. Minimum exposure window is 0.375 years, maximum exposure window is 15 years.
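The curves summarized in Figure 19.3 can be approximated with a short Python sketch of equations (19.1) to (19.3). This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, and the split of E + L into equal exposure-window and latency durations (`window_share=0.5`) is an assumption suggested by the equal minima and maxima quoted in the caption.

```python
import math


def g(age_dx, g0, r):
    """Equation (19.3): logistic curve with g(0) = g0 and upper limit 1."""
    return 1.0 / (1.0 + (1.0 / g0 - 1.0) * math.exp(-r * age_dx))


def window_plus_latency(age_dx, total_min, total_max, r):
    """Invert equation (19.1): E + L = g(Ad) * max(E + L)."""
    g0 = total_min / total_max  # equation (19.2)
    return g(age_dx, g0, r) * total_max


def event_ages(age_dx, total_min, total_max, r, window_share=0.5):
    """Ages E0 and E1 for a given age at diagnosis.

    window_share is an assumed fraction of E + L taken up by the exposure
    window; the remainder is latency, which ends at the age at diagnosis.
    """
    total = window_plus_latency(age_dx, total_min, total_max, r)
    e1 = age_dx - (1.0 - window_share) * total
    e0 = e1 - window_share * total
    return e0, e1


TOTAL_MIN = 0.375 + 0.375   # minimum exposure window plus minimum latency
TOTAL_MAX = 15.0 + 15.0     # maximum exposure window plus maximum latency

for r in (0.05, 0.125):
    for age in (0, 25, 50, 75, 100):
        total = window_plus_latency(age, TOTAL_MIN, TOTAL_MAX, r)
        e0, e1 = event_ages(age, TOTAL_MIN, TOTAL_MAX, r)
        print(f"r={r:<5} Ad={age:>3}  E+L={total:6.2f}  E0={e0:7.2f}  E1={e1:7.2f}")
```

At very young ages the sketch returns negative values of E0 and E1, which corresponds to the exposure window reaching back before birth, as the text allows for in utero and maternal exposures.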
for risk factors, covariates, and disease latency. We then describe an experimental data set for bladder cancer in southeastern Michigan, and apply some of these new methods to this dataset to illustrate the approach.

Jacquez et al. (2005, 2006) developed global, local, and focused tests for case-control clustering of residential histories for use with chronic diseases such as cancer and that account for covariates and other risk factors such as smoking. Readers unfamiliar with Q-statistics may wish to refer to the original works. We now briefly present these techniques and then extend them to account for exposure windows.

Define the coordinate $u_{i,t} = \{x_{i,t}, y_{i,t}\}$ to indicate the geographic location of the ith case or control at time t. Residential histories can then be represented as the set of space-time locations:

\[
R_i = \{u_{i0}, u_{i1}, \ldots, u_{iT}\}. \tag{19.8}
\]

This defines individual i at location $u_{i0}$ at the beginning of the study (time 0), and moving to location $u_{i1}$ at time t = 1. At the end of the study individual i may be found at $u_{iT}$. T is defined to be the number of unique location observations on all individuals in the study. We now define a case-control identifier, $c_i$, to be:

\[
c_i =
\begin{cases}
1 & \text{if and only if } i \text{ is a case} \\
0 & \text{otherwise.}
\end{cases}
\tag{19.9}
\]

Define $n_a$ to be the number of cases and $n_b$ to be the number of controls. The total number of individuals in the study is then $N = n_a + n_b$. Let k indicate the number of nearest neighbors to consider when evaluating nearest neighbor relationships, and define a nearest neighbor indicator to be:

\[
\eta_{i,j,k,t} =
\begin{cases}
1 & \text{if and only if } j \text{ is a } k \text{ nearest neighbor of } i \text{ at time } t \\
0 & \text{otherwise}
\end{cases}
\tag{19.10}
\]

We then can define a binary matrix of kth nearest neighbor relationships at a given time t as:

\[
\eta_{k,t} =
\begin{bmatrix}
0 & \eta_{1,2,k,t} & \cdots & \eta_{1,N,k,t} \\
\eta_{2,1,k,t} & 0 & \cdots & \vdots \\
\vdots & & \ddots & \eta_{N-1,N,k,t} \\
\eta_{N,1,k,t} & \cdots & \eta_{N,N-1,k,t} & 0
\end{bmatrix}
\tag{19.11}
\]

This matrix enumerates the k nearest neighbors (indicated by a 1) for each of the N individuals. The entries of this matrix are 1 (indicating that j is a k nearest neighbor of i at time t) or 0 (indicating j is not a k nearest neighbor of i at time t). It may be asymmetric about the 0 diagonal since nearest neighbor relationships are not necessarily reciprocal. Since two individuals cannot occupy the same location, we assume that at any time t any individual has k unique k-nearest neighbors. The row sums thus are equal to k ($\eta_{i,\cdot,k,t} = k$) although the column sums vary depending on the spatial distribution of case control locations at time t. The sum of all the elements in the matrix is Nk.

Alternative specifications of the proximity metric may be used; the metrics do not have to be nearest neighbor relationships in order for the Q-statistics to work. We prefer to use nearest neighbor relationships because
they are invariant under changing population densities, unlike geographic distance and adjacency measures. There is also some evidence that nearest neighbor metrics are more powerful than distance- and adjacency-based measures (Jacquez and Waller, 1997). Still, one then may be faced with the question of 'how many nearest neighbors (k) should I consider?' In certain instances one may have prior information that suggests that clusters of a certain size should be expected, and this can serve as a guide to specification of k. When prior information is lacking one may wish to explore several levels of k. In these instances Tango (2000, 2006) advocates using the minimum p-value obtained under each level of k considered as the test statistic. Jacquez et al. (2006) evaluated different levels of k to determine sensitivity of the results to specification of k. Each of these approaches has advantages and may be preferred in different situations.

There exists a $1 \times (T + 1)$ vector denoting those instants in time when the system is observed and the locations of the individuals are recorded. We can then consider the sequence of T nearest neighbor matrices defined by:

\[
T_k = \{\eta_{k,t},\ t = 0, \ldots, T\}. \tag{19.12}
\]

significance of the above statistics. This is accomplished by holding the location histories for the cases and controls constant, and by then sprinkling the case-control identifiers at random over the residential histories. This corresponds to a null hypothesis where the probability of an individual being declared a case ($c_i = 1$) is proportional to the number of cases in the data set, or:

\[
p(c_i = 1 \mid H_{0,I}) = \frac{n_1}{n_0 + n_1}. \tag{19.13}
\]

Here $n_1$ is the number of cases and $n_0$ is the number of controls, and $H_{0,I}$ indicates a null hypothesis corresponding to Goovaerts and Jacquez's (2004) 'type I' neutral model of spatial independence. This null hypothesis assumes the risk of being declared a case is the same over all of the N cases and controls. When covariates and risk factors are quantified we may wish to incorporate that information into the null hypothesis. Any case-clustering that is found then will be above and beyond the modeled risk factors and covariates, and will thus indicate the possible presence of risk sources beyond those specified under this null hypothesis.
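The nearest neighbor indicator of equation (19.10) and the sequence of matrices in equation (19.12) can be sketched in Python with NumPy. This is a minimal sketch under assumptions: residential histories are stored as an array of shape (T + 1, N, 2), Euclidean distance is used, and there are no ties in distance. The function names and the random coordinates are hypothetical.

```python
import numpy as np


def knn_indicator(coords, k):
    """Binary N x N matrix whose entry (i, j) is 1 when j is one of the
    k nearest neighbors of i, and 0 otherwise (diagonal is 0)."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)            # an individual is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # column indices of the k closest
    eta = np.zeros((n, n), dtype=int)
    eta[np.arange(n)[:, None], nearest] = 1
    return eta


def knn_sequence(histories, k):
    """One nearest neighbor matrix per observation time.

    histories is assumed to have shape (T + 1, N, 2): time, individual, (x, y).
    """
    return [knn_indicator(histories[t], k) for t in range(histories.shape[0])]


rng = np.random.default_rng(0)
histories = rng.uniform(0.0, 100.0, size=(4, 12, 2))  # hypothetical: 4 times, 12 people
matrices = knn_sequence(histories, k=3)

for t, eta in enumerate(matrices):
    print(t, eta.sum(axis=1).tolist(), int(eta.sum()), bool((eta != eta.T).any()))
```

Each printed line shows the row sums (all equal to k), the total (N times k), and whether the matrix at that time is asymmetric, which mirrors the properties stated for the matrix of equation (19.11).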
case given that person's vector of covariates and risk factors. The linear logistic model may then be written as:

\[
\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \alpha + \beta' x \tag{19.14}
\]

Step 2. Sprinkle the case-control identifier $c_i$ over the residential histories of the participants in a manner consistent with the desired null hypothesis, and conditioned on the observed number of cases. Assume we have $n_1$ cases, N participants and that $P_i$ is the probability of the ith participant being a case. Notice the $P_i$ are provided by the logistic equation.
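The randomization step above can be illustrated with a Python sketch that assigns labels conditioned on the number of cases and repeats the assignment to build a reference distribution. It is a sketch under assumptions, not the procedure of the original papers: cases are drawn without replacement with weights proportional to the $P_i$, which is one simple way to condition on $n_1$, and the statistic is a toy stand-in. All function names and values are hypothetical.

```python
import numpy as np


def sprinkle_labels(prob_case, n_cases, rng):
    """One realization of case-control labels with exactly n_cases cases.

    Assumption: cases are drawn without replacement with weights
    proportional to the modelled probabilities P_i.
    """
    prob_case = np.asarray(prob_case, dtype=float)
    chosen = rng.choice(prob_case.size, size=n_cases, replace=False,
                        p=prob_case / prob_case.sum())
    labels = np.zeros(prob_case.size, dtype=int)
    labels[chosen] = 1
    return labels


def monte_carlo_p_value(statistic, observed_labels, prob_case, n_sim, rng):
    """Upper-tail Monte Carlo p-value of statistic(observed_labels)."""
    observed = statistic(observed_labels)
    n_cases = int(np.sum(observed_labels))
    sims = np.array([statistic(sprinkle_labels(prob_case, n_cases, rng))
                     for _ in range(n_sim)])
    # Count the observed value itself as one member of the reference set.
    return (1 + np.sum(sims >= observed)) / (n_sim + 1)


def toy_statistic(labels):
    """Toy stand-in for a cluster statistic: cases among the first five people."""
    return int(labels[:5].sum())


rng = np.random.default_rng(1)
prob_case = np.full(20, 0.25)                 # hypothetical equal P_i
observed_labels = np.array([1] * 5 + [0] * 15)
p_value = monte_carlo_p_value(toy_statistic, observed_labels, prob_case, 999, rng)
print(f"Monte Carlo p-value: {p_value:.3f}")
```

With equal $P_i$ the draw reduces to a random permutation of the labels, which corresponds to the null hypothesis of equation (19.13); unequal $P_i$ from a fitted logistic model shift the reference distribution toward arrangements favoured by the covariates.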
\[
Q^{E}_{F,k,t} = \sum_{j=1}^{N} \eta_{F,j,k,t}\, c_j\, e_{j,t}. \tag{19.20}
\]

Here $\eta_{F,j,k,t}$ is 1 if individual j is a k nearest neighbor of the focus at time t, and 0 otherwise. The statistic $Q^{E}_{F,k,t}$ is the count of the number of cases whose exposure traces are k nearest neighbors of the focus at time t. Notice these statistics can also be duration weighted as described by Jacquez et al. (2005).

19.8.1. Statistical probability of exposure traces

In order to evaluate whether exposure traces of the cases cluster we first must derive a procedure for generating representative times of diagnosis, latency periods, and exposure windows for the controls. Once this is accomplished we will be able to determine whether the exposure traces for the cases cluster relative to those so constructed for the controls. Given the residential history of a control, steps involved to accomplish this are:

1 Set the age at diagnosis for each control to be their age at their time of interview for the study (notice researchers often may subtract one year from age at time of interview, to account for time between diagnosis and interview for cases).

2 Define the exposure window and latency period for each case and control using the time of

3 Calculate the desired test statistic for exposure traces, for the original (not randomized) data, Q (e.g., equation (19.20) for focused clustering, equation (19.19) for local clustering, etc.).

4 Assign case-control identifiers across the residential histories employing the logistic model described earlier in order to account for known risk factors and covariates. This will result in a possible arrangement of cases and controls (a realization) that accounts for the risk factors and covariates. Hence any statistically significant clustering observed in the exposure traces may be attributable to causes other than the risk factors and covariates included in the logistic model.

5 For the realization from (4), calculate the desired test statistic for clustering of exposure traces (Q).

6 Repeat (4) and (5) a desired number of times to construct the reference distribution of the statistic under the null hypothesis (the null distribution of Q).

7 Evaluate the probability of the observed clustering of exposure traces under the null hypothesis by comparing the value of the test statistic for the observed data (Q) to the reference distribution for Q from (6).

19.9. EXAMPLE: BLADDER CANCER IN SOUTHEASTERN MICHIGAN

A population-based bladder cancer case-control study is underway in southeastern Michigan.
A3: There is clustering of bladder cancer cases about industries known to emit bladder cancer carcinogens that is not explained by known risk factors and covariates.

They used the global and local Q-statistics not adjusting for covariates and risk factors to address hypotheses A0 and A1. They then used the logistic model to adjust for smoking, age, gender, education, and race in order to evaluate hypotheses A2 and A3, employing the following function to evaluate the probability of being a case:
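\[
p(c_i = 1 \mid x_i) =
\frac{e^{\,2.0359 - 0.0125\,\mathrm{Age}_i - 0.9396\,\mathrm{Gender}_i + 0.1900\,\mathrm{Educate}_i + 0.0557\,\mathrm{Race}_i - 0.2438\,\mathrm{Cignum}_i}}
{1 + e^{\,2.0359 - 0.0125\,\mathrm{Age}_i - 0.9396\,\mathrm{Gender}_i + 0.1900\,\mathrm{Educate}_i + 0.0557\,\mathrm{Race}_i - 0.2438\,\mathrm{Cignum}_i}}
\tag{19.21}
\]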
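A function of the form of equation (19.21) can be evaluated with a few lines of Python. This is only a sketch of the general logistic form: the intercept, coefficients and covariate coding below are hypothetical illustrations and are not the fitted values of the study, and the function name is an assumption.

```python
import math


def case_probability(intercept, coefficients, covariates):
    """Logistic probability e^z / (1 + e^z), z = intercept + sum(coef * value)."""
    z = intercept + sum(coefficients[name] * covariates[name] for name in coefficients)
    if z >= 0:
        # Algebraically the same expression, arranged to avoid overflow for large z.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients and covariate coding, for illustration only.
coefficients = {"age": -0.01, "smoker": 0.8}
person_a = {"age": 60, "smoker": 0}
person_b = {"age": 60, "smoker": 1}

p_a = case_probability(-0.5, coefficients, person_a)
p_b = case_probability(-0.5, coefficients, person_b)
print(f"P(case | a) = {p_a:.3f}, P(case | b) = {p_b:.3f}")
```

Probabilities computed this way for every participant are the kind of $P_i$ that a covariate-adjusted randomization would take as input.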
Here females experience a higher risk three counties. However, whether these
because controls are in the process of being clusters may be explained by smoking and the
frequency matched to cases in the ongoing covariates age, gender, race, and education
study, and in this dataset, a greater proportion remained to be evaluated.
of cases are females than controls. In this Next, the researchers evaluated hypothesis
chapter, results are presented for k = 7 A2: The clusters may be explained by
nearest neighbors. Results for additional known risk factors and covariates. To
nearest neighbors are discussed in Jacquez accomplish this they incorporated the prob-
et al. (2006). abilities calculated from the logistic model
The first hypothesis A0: Bladder cancer in equation (19.21) into the randomization
cases in southeastern Michigan are not procedure as described in section 19.7.2.
clustered was evaluated without correcting They then recalculated the probabilities of
for the known risk factors and covariates. the global Q statistic used to evaluate A0.
The Global Q statistic was 1.198437 and Because the geometry of the residential
was significant (p = 0.001), and hypothesis histories doesnt change, the values of the
A0 was rejected. Next, hypothesis A1: statistic were unchanged. After adjustment
There is spacetime clustering of bladder for smoking and covariates the P value
cancer cases in southeastern Michigan was slightly increased to 0.003 from 0.001
evaluated using the spatial and temporally before adjustment. Hypothesis A2 was not
local Q-statistics of equations (19.10) and accepted, and the authors concluded the
(19.12) in Jacquez et al. (2005). This global case clustering of residential histories
effectively decomposed the observed global was not sufficiently explained by smoking
clustering into local contributions. Persistent and the covariates. Significant local clus-
case clusters were found in Oakland, Ingham, tering also remained, and was persistent
and Jackson counties. Hypothesis A1 was through time. In all, 26 local clusters were
accepted and Jacquez et al. (2006) concluded significant after covariate adjustment. They
there is persistent case clustering in these were found in Lapeer, Ingham, Oakland, and
Jackson counties. The clusters in Lapeer and Jackson counties were comprised of 13 cluster centers, and are ephemeral. The clusters in northwestern Ingham county appeared in 1950, concentrated to the northwest of Lansing and persisted into 2000. Numerous clusters appeared in central and southeastern Oakland county beginning in the 1950s and persisted to the present day.

The authors suggested that the grouping of these local case clusters into two areas and their persistence through time might indicate the possible action of a causal agent or an unknown covariate. They therefore explored hypothesis A3: "There is clustering of bladder cancer cases about industries known to emit bladder cancer carcinogens that is not explained by known risk factors and covariates." Bladder cancers have a multiplicity of possible causative exposures. Using a database of 268 industries that emitted known or suspected bladder cancer carcinogens, they analyzed case clustering of residential histories about these industries both with and without adjustment for smoking and the four covariates. The global version of the focused test was significant at the 0.015 level before covariate adjustment and remained significant (p = 0.035) after the covariates and smoking were accounted for. Considering the 268 business address histories one at a time, the researchers found 22 industries that were significant cluster foci, located in Oakland (19 clusters), Ingham (2), and Jackson (1) counties. Clusters in central and southeastern Oakland county appeared in the 1930s and persisted to the present day.

The prospect of environmental pollution originating from these facilities being associated with bladder cancer is intriguing; however, caution is necessary until the study is complete. Occupational histories are being collected and will be incorporated as risk factors in the logistic regression model, thus creating a neutral model that includes smoking and occupational exposures, along with key covariates. Until then, we cannot rule out occupational exposures in explaining the focused clustering around certain industries. In the interest of public health, however, it is worth exploring those facilities with the most extreme p-values to single out those that consistently are at the center of a cluster of cases. Once identified, additional epidemiological investigation may be warranted to uncover a biologically plausible exposure, and to determine whether individuals in the vicinity of the operation actually demonstrate a body burden for the suspected carcinogen.

19.11. DISCUSSION AND FUTURE DIRECTIONS

The case-control epidemiological study design provides a wealth of information at the individual level regarding exposures, risks, risk modifiers, and covariates. When designing such a study the researcher often is concerned with assessing a few putative exposures, and in determining whether there are significant differences in these exposures between the case and control populations. As such, the case-control design is not inherently spatial, nor is it particularly well suited or even capable of assessing risk factors other than those specified in the original design.

The approaches described in this chapter may prove to be a highly useful addition to the traditional aspatial case-control design because they allow researchers to identify local groups of individuals whose risk exceeds that accounted for by the known risk factors and covariates incorporated under the designed study. Efforts in developing causal models for latency and exposure timing are evolving, and the approach outlined here will allow researchers to incorporate these models into future cluster analyses that account for human mobility. In addition,
while the application presented here uses residential histories, this approach may also be used to investigate disease clustering using occupational histories, or other forms of human mobility.

The ability of local and focused tests to quantify pockets of cases whose excess risk might be attributable to specific locations or point sources is a powerful addition to the inferential toolbox. While such a tool can never of itself assess the dose-response relationship necessary to attribute risk to a specific location or point source, the ability to temporally and geographically localize the putative exposure source makes it possible to begin the assessment of dose-response relationships. Once such a putative focus has been identified, the next step may involve techniques for modeling exposure that will provide a more accurate and detailed description of the spatial and temporal variability in exposure. And once a specific point source is identified, the task of quantifying the type and quantity of releases of agents that plausibly might give rise to the observed health outcome may begin.

ACKNOWLEDGMENTS

This research was funded by grants R43CA117171, R01CA096002, and R44CA092807 from the National Cancer Institute. The views expressed in this publication are those of the researchers and do not necessarily represent those of the NCI.

REFERENCES

Aschengrau, A., Ozonoff, D., Coogan, P., Vezina, R., Heeren, T. and Zhang, Y. (1996). Cancer risk and residential proximity to cranberry cultivation in Massachusetts. American Journal of Public Health, 86(9): 1289-1296.
Barker, D. (1992). Fetal and Infant Origins of Adult Disease. London: BMJ Publishing.
Bellander, T., Berglind, N., Gustavsson, P., Jonson, T., Nyberg, F., Pershagen, G. and Jarup, L. (2001). Using geographic information systems to assess individual historical exposure to air pollution from traffic and house heating in Stockholm. Environmental Health Perspectives, 109(6): 633-639.
Besag, J. and Newell, J. (1991). The detection of clusters in rare diseases. Journal of the Royal Statistical Society Series A, 154: 143-155.
Beyea, J. and Hatch, M. (1999). Geographic exposure modeling: a valuable extension of geographic information systems for use in environmental epidemiology. Environmental Health Perspectives, 107(suppl 1): 181-190.
Bonner, M.R., Han, D., Nie, J., Rogerson, P., Vena, J.E., Muti, P., Trevisan, M., Edge, S.B. and Freudenheim, J.L. (2005). Breast cancer risk and exposure in early life to polycyclic aromatic hydrocarbons using total suspended particulates as a proxy measure. Cancer Epidemiology Biomarkers and Prevention, 14(1): 53-60.
Bradley, D. (1988). The scope of travel medicine. In: First Conference on International Travel Medicine, pp. 1-9. Zurich: Springer Verlag.
Brody, J.G., Vorhees, D.J., Melly, S.J., Swedis, S.R., Drivas, P.J. and Rudel, R.A. (2002). Using GIS and historical records to reconstruct residential exposure to large-scale pesticide application. Journal of Exposure Analysis and Environmental Epidemiology, 12(1): 64-80.
Cliff, A.D. and Haggett, P. (2003). On changing contexts for epidemic modeling. In: Toubiana, L., Viboud, C., Flahault, A. and Valleron, A.-J. (eds), Geography and Health, pp. 1-18. Paris: Inserm.
Collia, D.V., Sharp, J. and Giesbrecht, L. (2003). The 2001 National Household Travel Survey: a look into the travel patterns of older Americans. Journal of Safety Research, 34(4): 461-470.
Cuzick, J. and Edwards, R. (1990). Spatial clustering for inhomogeneous populations. Journal of the Royal Statistical Society Series B, 52: 73-104.
Davies, R. (1964). A History of the World's Airlines. New York: Oxford University Press.
Elliott, P., Briggs, D., Morris, S., de Hoogh, C., Hurt, C., Jensen, T.K., Maitland, I., Richardson, S.,
Wakefield, J. and Jarup, L. (2001). Risk of adverse birth outcomes in populations living near landfill sites. British Medical Journal, 323(7309): 363-368.
EPA (2000). Toxics Release Inventory (TRI) Data Files, Environmental Protection Agency.
Goodchild, M. (2000). GIS and transportation: status and challenges. GeoInformatica, 4: 127-139.
Goovaerts, P. and Jacquez, G.M. (2004). Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York. International Journal of Health Geographics, 3(1): 14.
Hagerstrand, T. (1970). What about people in regional science? Papers of the Regional Science Association, 24: 7-21.
Han, D., Rogerson, P.A., Nie, J., Bonner, M.R., Vena, J.E., Vito, D., Muti, P., Trevisan, M., Edge, S.B. and Freudenheim, J.L. (2004). Geographic clustering of residence in early life and subsequent risk of breast cancer (United States). Cancer Causes and Control, 15(9): 921-929.
Jacquez, G.M. (2004). Current practices in the spatial analysis of cancer: flies in the ointment. International Journal of Health Geographics, 3(1): 22.
Jacquez, G.M., Kaufmann, A., Meliker, J., Goovaerts, P., AvRuskin, G. and Nriagu, J. (2005). Global, local and focused geographic clustering for case-control data with residential histories. Environmental Health, 4(1): 4.
Jacquez, G.M., Meliker, J.R., AvRuskin, G.A., Goovaerts, P., Kaufmann, A., Wilson, M. and Nriagu, J. (2006). Case-control geographic clustering for residential histories accounting for risk factors and covariates. International Journal of Health Geographics, 5: 32.
Jacquez, G.M. and Waller, L. (1997). The effect of uncertain locations on disease cluster statistics. In: Mowerer, H.T. and Congalton, R.G. (eds), Quantifying Spatial Uncertainty in Natural Resources: Theory and Application for GIS and Remote Sensing. Chelsea MI: Arbor Press.
Klepeis, N.E., Nelson, W.C., Ott, W.R., Robinson, J., Tsang, A.M., Switzer, P., Behar, J.V., Hern, S. and Engelmann, W. (2001). The National Human Activity Pattern Survey (NHAPS): a resource for assessing exposure to environmental pollutants. Journal of Exposure Analysis and Environmental Epidemiology, 11(3): 231-252.
Knox, G. (1964). The detection of space time interactions. Applied Statistics, 13: 25-29.
Kuh, D. and Ben-Shlomo, Y. (1997). A Life Course Approach to Chronic Disease Epidemiology: Tracing the Origins of Ill-health from Early to Later Life. Oxford: Oxford University Press.
Kulldorff, M. and Nagarwalla, N. (1995). Spatial disease clusters: detection and inference. Statistics in Medicine, 14(8): 799-810.
Langholz, B., Thomas, D., Xiang, A. and Stram, D. (1999). Latency analysis in epidemiologic studies of occupational exposures: application to the Colorado Plateau uranium miners cohort. American Journal of Industrial Medicine, 35(3): 246-256.
Long, L. (1992). Changing residence: comparative perspectives on its relationship to age, sex, and marital status. Population Studies, 46: 141-158.
McNamee, R. and Dolk, H. (2001). Does exposure to landfill waste harm the fetus? Perhaps, but more evidence is needed. British Medical Journal, 323(7309): 351-352.
Morfeld, P. (2004). Years of Life Lost due to exposure: Causal concepts and empirical shortcomings. Epidemiologic Perspectives and Innovation, 1(1): 5.
Nuckols, J.R., Ward, M.H. and Jarup, L. (2004). Using geographic information systems for exposure assessment in environmental epidemiology studies. Environmental Health Perspectives, 112(9): 1007-1015.
Nyberg, F., Gustavsson, P., Jarup, L., Bellander, T., Berglind, N., Jakobsson, R. and Pershagen, G. (2000). Urban air pollution and lung cancer in Stockholm. Epidemiology, 11(5): 487-495.
O'Leary, E.S., Vena, J.E., Freudenheim, J.L. and Brasure, J. (2004). Pesticide exposure and risk of breast cancer: a nested case-control study of residentially stable women living on Long Island. Environmental Research, 94(2): 134-144.
Reuscher, T., Schmoyer, R. and Hu, P.S. (2002). Transferability of Nationwide Personal Transportation Survey data to regional and local scales. Transportation Research Record, 1817: 25-32.
Reynolds, P., Hurley, S.E., Gunier, R.B., Yerabati, S., Quach, T. and Hertz, A. (2005). Residential proximity to agricultural pesticide use and incidence of breast
cancer in California, 1988-1997. Environmental Health Perspectives, 113(8): 993-1000.
Robins, J. (1997). Causal inference from complex longitudinal data. In: Berkane, M. (ed.), Latent Variable Modeling with Applications to Causality, pp. 69-117. New York: Springer.
Robins, J. and Greenland, S. (1991). Estimability and estimation of expected years of life lost due to a hazardous exposure. Statistics in Medicine, 10(1): 79-93.
Rothman, K. and Greenland, S. (1998). Modern Epidemiology. Philadelphia: Lippincott-Raven.
Sabel, C.E., Boyle, P.J., Löytönen, M., Gatrell, A.C., Jokelainen, M., Flowerdew, R. and Maasilta, P. (2003). Spatial clustering of amyotrophic lateral sclerosis in Finland at place of birth and place of death. American Journal of Epidemiology, 157(10): 898-905.
Sabel, C.E., Gatrell, A.C., Löytönen, M., Maasilta, P. and Jokelainen, M. (2000). Modelling exposure opportunities: estimating relative risk for motor neurone disease in Finland. Social Science and Medicine, 50(7-8): 1121-1137.
Schaerstrom, A. (2003). The potential for time geography in medical geography. In: Toubiana, L., Viboud, C., Flahault, A. and Valleron, A.-J. (eds), Geography and Health, pp. 195-207. Paris: Inserm.
Scheiner, J. and Kaspar, B. (2003). Lifestyles, choice of housing location and daily mobility: the lifestyle approach in the context of spatial mobility and planning. International Social Science Journal, 55: 319-332.
Silverman, D., Morrison, A. and Devesa, S. (1996). Bladder cancer. In: Schottenfeld, D. and Fraumeni, J. Jr. (eds), Cancer Epidemiology and Prevention, pp. 1156-1179. New York: Oxford University Press.
Sinha, G. and Mark, D. (2005). Measuring similarity between geospatial lifelines in studies of environmental health. Journal of Geographical Systems, 7(1): 115-136.
Stellman, J.M., Stellman, S.D., Weber, T., Tomasallo, C., Stellman, A.B. and Christian, R. (2003). A geographic information system for characterizing exposure to Agent Orange and other herbicides in Vietnam. Environmental Health Perspectives, 111(3): 321-328.
Swartz, C.H., Rudel, R.A., Kachajian, J.R. and Brody, J.G. (2003). Historical reconstruction of wastewater and land use impacts to groundwater used for public drinking water: exposure assessment using chemical data and GIS. Journal of Exposure Analysis and Environmental Epidemiology, 13(5): 403-416.
Tango, T. (1995). A class of tests for detecting general and focused clustering of rare diseases. Statistics in Medicine, 14(21-22): 2323-2334.
Tango, T. (2000). A test for spatial disease clustering adjusted for multiple testing. Statistics in Medicine, 19(2): 191-204.
Tango, T. (in press). A test with minimized p-value for spatial clustering applicable to case-control point data. Biometrics.
Turnbull, B.W., Iwano, E.J., Burnett, W.S., Howe, H.L. and Clark, L.C. (1990). Monitoring for clusters of disease: application to leukemia incidence in upstate New York. American Journal of Epidemiology, 132 (Suppl 1): S136-143.
Verhulst, P.F. (1838). Notice sur la loi que la population pursuit dans son accroissement. Correspondance Mathematique et Physique, 10: 113-121.
Verhulst, P.F. (1845). Recherches Mathematiques sur La Loi d'Accroissement de la Population (Mathematical Researches into the Law of Population Growth Increase). Nouveaux Memoires de l'Academie Royale des Sciences et Belles-Lettres de Bruxelles, 18: Art. 1, 1-45.
Ward, M., Nuckols, J., Weigel, S., Maxwell, S., Cantor, K. and Miller, R. (2000). Geographic information systems. A new tool in environmental epidemiology. Annals of Epidemiology, 10(7): 477.
20
Neural Networks for
Spatial Data Analysis
Manfred M. Fischer
The objective of this chapter is to provide spatial analysts who wish to engage with the field of neural networks with an entry point and the background required to fully realize its potential. The chapter is organized as follows. In section 20.2 we begin by introducing the functional form of feedforward neural network models, including the specific parameterization of the nonlinear transfer functions. Section 20.3 proceeds to discuss the problem of determining the network parameters within a framework that involves the solution of a nonlinear optimization problem. Because there is no hope of finding an analytical solution to this optimization problem, section 20.4 reviews some of the most important iterative search procedures that utilize gradient information for solving the problem. This requires the evaluation of derivatives of the objective function (known as the error function in the machine learning literature) with respect to the network parameters, and section 20.5 shows how these can be obtained in a computationally efficient way using the technique of error backpropagation.

The section that follows addresses the issue of network complexity and briefly discusses some techniques (in particular regularization and early stopping) to determine the number of hidden units. This problem is shown to essentially consist of optimizing the complexity of the network model (complexity in terms of free parameters) in order to achieve the best generalization performance. Section 20.7 then moves attention to the issue of how to appropriately test the generalization performance of a neural network. Some conclusions and an outlook for the future are given in the final section.

The bibliography that is included intends to provide useful pointers to the literature rather than a complete record of the whole field of neural networks. Readers should recognize that there are several wide-ranging textbooks of an introductory character, of which Hertz et al. (1991), Ripley (1996) and Bishop (2006) appear to be most suitable for a spatial analysis audience. Readers interested in spatial interaction or flow data analysis are referred to a paper by Fischer and Reismann (2002b) to find a useful methodology for neural spatial interaction modelling.

20.2. FEEDFORWARD NEURAL NETWORKS

Feedforward neural networks consist of nodes (also known as processing units or simply units) that are organized in layers. Figure 20.1 shows a schematic diagram of a typical feedforward neural network containing a single intermediate layer of processing units separating input from output units. Intermediate layers of this sort are often called hidden layers to distinguish them from the input and output layers. In this network there are N input nodes representing input variables $x_1, \ldots, x_N$; H hidden units representing hidden variables $z_1, \ldots, z_H$; and K output nodes representing output variables $y_1, \ldots, y_K$. Weight parameters are represented by links between the nodes. The bias parameters are denoted by links coming from additional input and hidden variables $x_0$ and $z_0$. Observe the feedforward structure where the inputs are connected only to units in the hidden layer, and the outputs of this layer are connected only to units in the output layer.

The term architecture or topology of a network refers to the topological arrangement of the nodes. We call the network architecture shown in Figure 20.1 a single hidden layer network or a two layer rather than a three layer network because it is the number of layers of adaptive weights that is important for determining the network properties. This architecture is most widely used in practice.3
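As a small illustration of why the network is counted by its layers of adaptive weights, the following sketch tallies the free parameters of a single hidden layer network with N inputs, H hidden units and K outputs, counting one bias link per hidden unit (from $x_0$) and one per output unit (from $z_0$). The function name `count_parameters` and its argument names are hypothetical and chosen only for this example.

```python
def count_parameters(n_inputs, n_hidden, n_outputs):
    """Count adaptive weights in a single hidden layer feedforward network.

    Each hidden unit has one weight per input plus one bias (link from x0);
    each output unit has one weight per hidden unit plus one bias (link from z0).
    """
    first_layer = n_hidden * (n_inputs + 1)     # first layer of parameters
    second_layer = n_outputs * (n_hidden + 1)   # second layer of parameters
    return first_layer + second_layer


# Example: N = 3 inputs, H = 4 hidden units, K = 2 outputs
print(count_parameters(3, 4, 2))  # 4*4 + 2*5 = 26
```

The count grows with the number of hidden units H, which is the sense in which H controls the complexity of the model in terms of free parameters.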
[Figure 20.1: diagram showing, from bottom to top, the inputs $x_0, x_1, x_2, \ldots, x_n, \ldots, x_N$; the first layer of parameters $w^{(1)}$; the hidden units $z_0, z_1, \ldots, z_h, \ldots, z_H$; the second layer of parameters $w^{(2)}$; and the outputs $y_1, \ldots, y_K$.]

Figure 20.1 Network diagram for the single hidden layer neural network corresponding to equation (20.6). The input, hidden and output variables are represented by nodes, and the weight parameters by links between the nodes, where the bias parameters are denoted by links coming from additional input and hidden variables $x_0$ and $z_0$. The arrow denotes the direction of information flow through the network during forward propagation.
Kůrková (1992) has shown that one hidden layer is sufficient to approximate any continuous function uniformly on a compact input domain. But note that it may be more parsimonious to use fewer hidden units connected in two or more hidden layers.

Any network diagram can be converted into its corresponding mapping function, provided that the diagram is feedforward as in Figure 20.1 so that it does not contain closed directed cycles.4 This guarantees that the network output $y_k$ ($k = 1, \ldots, K$) can be described by a series of functional transformations as follows. First, we form a linear combination5 of the N input variables $x_1, \ldots, x_N$ to get the input, say $\mathrm{net}_h$, that hidden unit h receives:

$$\mathrm{net}_h = \sum_{n=1}^{N} w_{hn}^{(1)} x_n + w_{h0}^{(1)} \qquad (20.1)$$

for $h = 1, \ldots, H$. The superscript (1) indicates that the corresponding parameters are in the first layer of the network. The parameters $w_{hn}^{(1)}$ represent connection weights going from input n ($n = 1, \ldots, N$) to hidden unit h ($h = 1, \ldots, H$), and $w_{h0}^{(1)}$ biases.6 The quantities $\mathrm{net}_h$ are known as activations in the field of neural networks. Each of them is then transformed using a differentiable, continuous nonlinear activation (transfer) function7 to give the output:

$$z_h = \varphi(\mathrm{net}_h) \qquad (20.2)$$

for $h = 1, \ldots, H$. These quantities are again linearly combined to generate the input, called $\mathrm{net}_k$, that output unit k ($k = 1, \ldots, K$) receives:

$$\mathrm{net}_k = \sum_{h=1}^{H} w_{kh}^{(2)} z_h + w_{k0}^{(2)}. \qquad (20.3)$$

The parameters $w_{kh}^{(2)}$ represent the connection weights from hidden unit h ($h = 1, \ldots, H$) to output unit k ($k = 1, \ldots, K$), and the $w_{k0}^{(2)}$ are bias parameters. Finally, the $\mathrm{net}_k$ are transformed to produce a set of network outputs $y_k$ ($k = 1, \ldots, K$):

$$y_k = \psi_k(\mathrm{net}_k) \qquad (20.4)$$
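The four transformations in equations (20.1) to (20.4) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the function names `logistic`, `identity` and `forward` are hypothetical, the hidden transfer function is assumed to be a logistic sigmoid, and the output transfer function is assumed to act on each $\mathrm{net}_k$ separately (identity by default), which would not cover output functions that depend on all output activations jointly.

```python
import math


def logistic(a):
    """Logistic sigmoid, assumed here as the hidden unit transfer function."""
    return 1.0 / (1.0 + math.exp(-a))


def identity(a):
    """Linear (identity) output transfer function, an assumption for this sketch."""
    return a


def forward(x, W1, b1, W2, b2, hidden_transfer=logistic, output_transfer=identity):
    """Forward propagation through a single hidden layer network.

    x  : list of N inputs
    W1 : H rows of N first-layer weights, b1 : H first-layer biases
    W2 : K rows of H second-layer weights, b2 : K second-layer biases
    Returns the K outputs y and the H hidden unit outputs z.
    """
    # Equation (20.1): net input to each hidden unit
    net_h = [sum(w * xn for w, xn in zip(row, x)) + b for row, b in zip(W1, b1)]
    # Equation (20.2): hidden unit outputs
    z = [hidden_transfer(a) for a in net_h]
    # Equation (20.3): net input to each output unit
    net_k = [sum(w * zh for w, zh in zip(row, z)) + b for row, b in zip(W2, b2)]
    # Equation (20.4): network outputs
    y = [output_transfer(a) for a in net_k]
    return y, z


# Tiny example with N = 2, H = 2, K = 1 and illustrative parameter values
x = [1.0, -2.0]
W1 = [[0.0, 0.0], [0.0, 0.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
b2 = [0.5]
y, z = forward(x, W1, b1, W2, b2)
print(z)  # [0.5, 0.5]
print(y)  # [1.5]
```

With all first-layer parameters set to zero, every hidden unit outputs 0.5 regardless of the input, so the single output is 0.5 + 0.5 + 0.5 = 1.5; nonzero first-layer weights are what make the outputs depend on x.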
to the softmax activation function (Bridle, 1994):

$$y_k = \psi_k(\mathrm{net}_k) = \frac{\exp(\mathrm{net}_k)}{\sum_{c=1}^{K} \exp(\mathrm{net}_c)} \qquad (20.8)$$

where $0 \le y_k \le 1$ and $\sum_{k=1}^{K} y_k = 1$.

A neural network with a single logistic output unit can be seen as a nonlinear extension of logistic regression. With many logistic units, it corresponds to linked logistic regressions of each class versus the others. If the transfer functions of the output units in a network are taken to be linear, we have a standard linear model augmented by nonlinear terms. Given the popularity of linear models in spatial analysis, this form is particularly appealing, as it suggests that neural network models can be viewed as extensions of, rather than as alternatives to, the familiar models. The hidden unit activations can then be viewed as latent variables whose inclusion enriches the linear model.

20.3. NETWORK TRAINING

So far, we have considered neural networks as a general class of parametric nonlinear functions from a vector $\mathbf{x}$ of input variables $x_1, \ldots, x_N$ to a vector $\mathbf{y}$ of output variables $y_1, \ldots, y_K$. The process of determining the network parameters is called network training or network learning. The problem of determining the network parameters can be viewed from different perspectives. We view it as an unconstrained nonlinear function optimization problem,10 the solution of which requires the minimization of some (continuous and differentiable) error function.

This error function, say E, is defined in terms of deviations of the network outputs $\mathbf{y} = (y_1, \ldots, y_K)$ from corresponding desired (target) outputs $\mathbf{t} = (t_1, \ldots, t_K)$, and expressed as a function of the weight vector $\mathbf{w}$ representing the free parameters (connection weights and bias terms) of the network. The goal of training is then to minimize the error function so that:

$$\min_{\mathbf{w} \in W} E(\mathbf{w}) \qquad (20.9)$$

where W is a weight space appropriate to the network architecture. The smallest value of $E(\mathbf{w})$ will occur at a point such that the gradient of the error function vanishes, $\nabla E(\mathbf{w}) = 0$, where $\nabla E(\mathbf{w})$ denotes the gradient (the vector containing the partial derivatives) of $E(\mathbf{w})$ with respect to $\mathbf{w}$.

A single hidden layer network of the kind shown in Figure 20.1, with H hidden units, generally has many points at which the gradient vanishes. The point $\mathbf{w}^*$ is called a global minimum for $E(\mathbf{w})$ if $E(\mathbf{w}^*) \le E(\mathbf{w})$ for all $\mathbf{w} \in W$. Other minima are called local minima, and each corresponds to a different set of parameters. For a successful application of neural networks, however, it may not be necessary to find the global minimum, and in general it will not be known whether the minimum found is the global one or not. But it may be necessary to compare several minima in order to find a sufficiently good solution of the problem under scrutiny.

Training is performed using a training set $S_P = \{(\mathbf{x}^p, \mathbf{t}^p) : p = 1, \ldots, P\}$, consisting of P ordered pairs of vectors. $\mathbf{x}^p$ denotes an N-dimensional input vector and $\mathbf{t}^p$ the associated K-dimensional desired output (target) vector. The choice of a suitable error function depends on the problem to be performed. We follow Bishop (1995: chapter 6) to provide a maximum likelihood motivation for the choice, and start by considering
$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \prod_{p=1}^{P} p(\mathbf{t}^p \mid \mathbf{x}^p, \mathbf{w}, \beta). \qquad (20.11)$$

Maximizing the likelihood function is equivalent to minimizing the sum-of-squares function given by:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{p=1}^{P} \sum_{k=1}^{K} \left\| g_k(\mathbf{x}^p, \mathbf{w}) - t_k^p \right\|^2. \qquad (20.12)$$

The value of $\mathbf{w}$ found by solving equation (20.9) will be denoted $\mathbf{w}_{ML}$ because it corresponds to the maximum likelihood estimation. Having formed $\mathbf{w}_{ML}$, the noise precision $\beta$ is then provided by:

$$\frac{1}{\beta_{ML}} = \frac{1}{PK} \sum_{p=1}^{P} \left\| \mathbf{g}(\mathbf{x}^p, \mathbf{w}_{ML}) - \mathbf{t}^p \right\|^2. \qquad (20.13)$$

$$\frac{\partial E}{\partial\, \mathrm{net}_k} = (y_k - t_k). \qquad (20.14)$$

This property will be used when discussing the technique of error backpropagation in section 20.5.

Now let us consider the case of binary classification where we have a single target variable t such that t = 1 denotes class $C_1$ and t = 0 class $C_2$. We consider a network with a single output whose transfer function is a logistic sigmoid so that $0 \le g(\mathbf{x}, \mathbf{w}) \le 1$, and we can interpret $g(\mathbf{x}, \mathbf{w})$ as the conditional probability $p(C_1 \mid \mathbf{x})$, with $p(C_2 \mid \mathbf{x})$ given by $1 - g(\mathbf{x}, \mathbf{w})$. The conditional probability of targets given inputs is then a Bernoulli distribution of the form:

$$p(t \mid \mathbf{x}, \mathbf{w}) = g(\mathbf{x}, \mathbf{w})^{t} \{1 - g(\mathbf{x}, \mathbf{w})\}^{1-t}. \qquad (20.15)$$

If we have a training set of independent observations, then the error function, given
by the negative log likelihood, is the cross-entropy error function of the form:

$$E(\mathbf{w}) = -\sum_{p=1}^{P} \left\{ t^p \ln y^p + (1 - t^p) \ln(1 - y^p) \right\} \qquad (20.16)$$

where $y^p$ denotes $g(\mathbf{x}^p, \mathbf{w})$. Note there is no analogue of the noise precision $\beta$ because the target values are assumed to be correctly labelled.

For classification problems, the targets represent labels defining class membership or, more generally, estimates of the probabilities of class membership. If we have K separate binary classifications to perform, then a neural network with K logistic sigmoid output units is an appropriate choice. In this case a binary class label $t_k \in \{0, 1\}$ is associated with each output k. If we assume that the class labels are independent, given the input vector $\mathbf{x}^p$, then the conditional distribution is:

$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}) = \prod_{k=1}^{K} g_k(\mathbf{x}, \mathbf{w})^{t_k} \left[ 1 - g_k(\mathbf{x}, \mathbf{w}) \right]^{1 - t_k}. \qquad (20.17)$$

Taking the negative logarithm of the corresponding likelihood function then yields the multiple-class cross-entropy error function of the form:

$$E(\mathbf{w}) = -\sum_{p=1}^{P} \sum_{k=1}^{K} \left\{ t_k^p \ln y_k^p + (1 - t_k^p) \ln(1 - y_k^p) \right\} \qquad (20.18)$$

with respect to the activation for a particular output unit k takes the simple form (20.14) as in the regression case.

If we have a standard multiple-class classification problem to solve, where each input is assigned to one of K mutually exclusive classes, then we can use a neural network with K output units each of which has a softmax output activation function. The binary target variables $t_k \in \{0, 1\}$ have a 1-of-K coding scheme indicating the correct class, and the network outputs are interpreted as $g_k(\mathbf{x}^p, \mathbf{w}) = p(t_k^p = 1 \mid \mathbf{x}^p)$, leading to the error function, called the multiple-class cross-entropy error function (see Fischer and Staufer, 1999):

$$E(\mathbf{w}) = -\sum_{p=1}^{P} \sum_{k=1}^{K} t_k^p \ln \left[ \frac{g_k(\mathbf{x}^p, \mathbf{w})}{t_k^p} \right] \qquad (20.19)$$

which is non-negative, and equals zero when $g_k(\mathbf{x}^p, \mathbf{w}) = t_k^p$ for all k and p. Once again, the derivative of this error function with respect to the activation for a particular output unit k takes the familiar form of equation (20.14). It is worth noting that in the case of K = 2 we can use a network with a single logistic sigmoid output, alternatively to a network with two softmax output activations.

In summary, there is a natural pairing of the choice of the output unit transfer function and the choice of the error function, according to the type of the problem that has to be solved. For regression we take linear outputs and a sum-of-squares error, for (multiple independent) binary classifications we use logistic sigmoid outputs with the corresponding cross-entropy error function,
can take a network with two softmax outputs (Bishop, 2006: 236).

20.4. PARAMETER OPTIMIZATION

There are many ways to solve the minimization problem (20.9). Closed-form optimization via the calculus of scalar fields rarely admits a direct solution. A relatively new set of interesting techniques that use optimality conditions from calculus are based on evolutionary computation (Goldberg, 1989; Fogel, 1995). But gradient procedures which use the first partial derivatives $\nabla E(\mathbf{w})$, so-called first order strategies, are most widely used. Gradient search for solutions gleans its information about derivatives from a sequence of function values. The recursion scheme is based on the formula:11

$$\mathbf{w}(\tau + 1) = \mathbf{w}(\tau) + \eta(\tau)\, \mathbf{d}(\tau) \qquad (20.20)$$

where $\tau$ denotes the iteration step. Different procedures differ from each other with regard to the choice of step length $\eta(\tau)$ and search direction $\mathbf{d}(\tau)$, the former being a scalar called learning rate and the latter a vector of unit length.

The simplest approach to using gradient information is to assume that $\eta(\tau)$ is constant and to choose the parameter update in equation (20.20) to comprise a small step in the direction of the negative gradient so that:

$$\mathbf{d}(\tau) = -\nabla E(\mathbf{w}(\tau)). \qquad (20.21)$$

After each such update, the gradient is re-evaluated for the new parameter vector $\mathbf{w}(\tau + 1)$. Note that the error function is defined with respect to a training set $S_P$ to be processed to evaluate $\nabla E$. One complete presentation of the entire training set during the training process is called an epoch. The training process is maintained on an epoch-by-epoch basis until the connection weights and bias terms of the network stabilize and the average error over the entire training set converges to some minimum. It is good practice to randomize the order of presentation of training examples from one epoch to the next. This randomization tends to make the search in the parameter space stochastic over the training cycles, thus avoiding the possibility of limit cycles in the evolution of the weight vectors.

Gradient descent optimization may proceed in one of two ways: pattern mode and batch mode. In the pattern mode weight updating is performed after the presentation of each training example. Note that the error functions based on maximum likelihood for a set of independent observations comprise a sum of terms, one for each data point. Thus:

$$E(\mathbf{w}) = \sum_{p=1}^{P} E_p(\mathbf{w}) \qquad (20.22)$$

where $E_p$ is called the local error while E the global error, and pattern mode gradient descent makes an update to the parameter vector based on one training example at a time so that:

$$\mathbf{w}(\tau + 1) = \mathbf{w}(\tau) - \eta\, \nabla E_p(\mathbf{w}(\tau)). \qquad (20.23)$$

Rumelhart et al. (1986) have shown that pattern based gradient descent minimizes equation (20.22), if the learning parameter $\eta$ is sufficiently small. The smaller $\eta$, the smaller will be the changes to the weights in the network from one iteration to the next and the smoother will be the trajectory in the parameter space. This improvement, however, is attained at the cost of a slower rate of training. If we make the learning rate parameter $\eta$ too large so as to speed up the
In the batch mode of training, parameter updating is performed after the presentation of all the training examples that constitute an epoch. From an online operational point of view, the pattern mode of training is preferred over the batch mode, because it requires less local storage for each weight connection. Moreover, given that the training patterns are presented to the network in a random manner, the use of pattern-by-pattern updating of parameters makes the search in parameter space stochastic in nature,¹² which in turn makes it less likely to be trapped in a local minimum. On the other hand, the use of batch mode of training provides a more accurate estimation of the gradient vector $\nabla E$. Finally, the relative effectiveness of the two training modes depends on the problem to be solved (Haykin, 1994: 152 pp).

For batch optimization there are more efficient procedures, such as conjugate gradients and quasi-Newton methods, that are much more robust and much faster than gradient descent (Nocedal and Wright, 1999). Unlike steepest descent, these algorithms have the characteristic that the error function always decreases at each iteration unless the parameter vector has arrived at a local or global minimum. Conjugate gradient methods achieve this by incorporating an intricate relationship between the direction and gradient vectors. The initial direction vector $d(0)$ is set equal to the negative gradient vector at the initial step $\tau = 0$. Each successive direction vector is then computed as a linear combination of the current gradient vector and the previous direction vector. Thus:

$$d(\tau + 1) = -\nabla E(w(\tau + 1)) + \beta(\tau)\, d(\tau) \tag{20.24}$$

where $\beta(\tau)$ is a time varying parameter. There are various rules for determining $\beta(\tau)$ in terms of the gradient vectors at time $\tau$ and $\tau + 1$, leading to the Fletcher–Reeves and Polak–Ribière variants of conjugate gradient algorithms (see Press et al., 1992). The computation of the learning rate parameter $\eta(\tau)$ in the update formula (20.20) involves a line search, the purpose of which is to find the particular value of $\eta$ for which the error function $E(w(\tau) + \eta\, d(\tau))$ is minimized, given fixed values of $w(\tau)$ and $d(\tau)$.

The application of Newton's method to the training of neural networks is hindered by the requirement of having to calculate the Hessian matrix and its inverse, which can be computationally expensive. The problem is further complicated by the fact that the Hessian matrix $H$ would have to be non-singular for its inverse to be computed. Quasi-Newton methods avoid this problem by building up an approximation to the inverse Hessian over a number of iteration steps. The most commonly used variants are the Davidon–Fletcher–Powell and the Broyden–Fletcher–Goldfarb–Shanno procedures (see Press et al., 1992).

Quasi-Newton procedures are today the most efficient and sophisticated (batch) optimization algorithms. But they require the evaluation and storage in memory of a dense matrix $H(\tau)$ at each iteration step $\tau$. For larger problems (more than 1,000 weights) the storage of the approximate Hessian can be too demanding. In contrast, the conjugate gradient procedures require much less storage, but an exact determination of the learning rate $\eta(\tau)$ and the parameters $\beta(\tau)$ in each iteration $\tau$, and, thus, approximately twice as many gradient evaluations as the quasi-Newton methods.
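The conjugate gradient recursion of equations (20.20) and (20.24) can be sketched as follows. This is a minimal illustration under stated assumptions: the error function is a small quadratic chosen for the example, so the line search has a closed form, and $\beta(\tau)$ is set by the Fletcher–Reeves rule (ratio of squared gradient norms). The matrix `A`, the vector `b` and the helper names are hypothetical.

```python
def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Assumed test problem: E(w) = 0.5 * w'Aw - b'w with A symmetric positive definite.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

def gradient(w):
    """Gradient of the assumed quadratic error: A w - b."""
    Aw = matvec(A, w)
    return [Aw[i] - b[i] for i in range(len(w))]

def conjugate_gradient(w, steps):
    g = gradient(w)
    d = [-gi for gi in g]                      # d(0) = -grad E(w(0))
    for _ in range(steps):
        if dot(g, g) < 1e-30:                  # already at a minimum
            break
        Ad = matvec(A, d)
        eta = -dot(g, d) / dot(d, Ad)          # exact line search (quadratic case only)
        w = [wi + eta * di for wi, di in zip(w, d)]          # update (20.20)
        g_new = gradient(w)
        beta = dot(g_new, g_new) / dot(g, g)   # Fletcher-Reeves choice of beta
        d = [-gn + beta * di for gn, di in zip(g_new, d)]    # direction (20.24)
        g = g_new
    return w

w_cg = conjugate_gradient([0.0, 0.0, 0.0], steps=3)
```

For a quadratic in three parameters, three such steps are enough to reach the minimum up to rounding error. For a general network error function the closed-form step length would have to be replaced by a numerical line search.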
When the surface modelled by the error function in its parameter space is extremely rugged and has many local minima, then a local search from a random starting point tends to converge to a local minimum close to the initial point and to a solution worse than the global minimum. In order to seek out good local minima, a good training procedure must thus include both a gradient based optimization algorithm and a technique like random start that enables sampling of the space of minima. Alternatively, stochastic global search procedures might be used. Examples of such procedures include Alopex (see Fischer et al., 2003, for an application in the context of spatial interaction data analysis), genetic algorithms (see Fischer and Leung, 1998, for another application in the same context), and simulated annealing. These procedures guarantee convergence to a global solution with high probability, but at the expense of slower convergence.

Finally, it is worth noting that the question whether neural networks can have real-time learning capabilities is still challenging and open. Real-time learning is highly required by time-critical applications, such as navigation and tracking systems in a GIS-T context, where the data observations are arriving in a continuous stream, and predictions have to be made before all the data are seen. Even for offline applications, speed is still a need, and real-time learning algorithms that reduce training time are of considerable value.

20.5. ERROR BACKPROPAGATION

One of the greatest breakthroughs in neural network modelling has been the introduction of the technique of error backpropagation,¹³ in that it provides a computationally efficient technique to calculate the gradient vector of an error function for a feedforward neural network with respect to the parameters. This technique, sometimes simply termed backprop, uses a local message passing scheme in which information is sent alternately forwards and backwards through the network. Its modern form stems from Rumelhart et al. (1986), illustrated for gradient descent optimization applied to the sum-of-squares error function. It is important to recognize, however, that error backpropagation can also be applied to error functions other than just sum-of-squares and to a wide variety of optimization schemes for weight adjustment other than gradient descent, in pattern or batch mode.

We describe the backpropagation algorithm for a general network of type (20.6) that has a single hidden layer, arbitrary differentiable activation functions and a corresponding local error function $E_p(w)$. For each pattern p in the training data set, we shall assume that we have supplied the corresponding input vector $x^p$ to the network and calculated the activations of all the hidden and output units in the network by applying equations (20.1)–(20.4). Recall that each hidden unit h has input $net_h^p$ and output $z_h^p = \varphi_h(net_h^p)$, and each output unit k has input $net_k^p$ and output $y_k^p = \psi_k(net_k^p)$. This process is called forward propagation because it can be seen as a forward flow of information (signals) provided by $x^p$ through the network. For the rest of this section we consider one example and drop the superscript p in order to keep the notation uncluttered.

We evaluate the gradient $\nabla E_p$ with respect to a hidden-to-output parameter $w_{kh}^{(2)}$ first, by noting that $E_p$ depends on the weight $w_{kh}^{(2)}$ only via the summed input, $net_k$, to the output unit k. Thus, we can apply the chain rule for partial derivatives to get:

$$\frac{\partial E_p}{\partial w_{kh}^{(2)}} = \frac{\partial E_p}{\partial net_k}\, \frac{\partial net_k}{\partial w_{kh}^{(2)}} \tag{20.25}$$
$$\frac{\partial net_k}{\partial w_{kh}^{(2)}} = \frac{\partial}{\partial w_{kh}^{(2)}} \sum_{h=0}^{H} w_{kh}^{(2)} z_h = z_h. \tag{20.26}$$

[...]

$$\frac{\partial E_p}{\partial w_{kh}^{(2)}} = \delta_k z_h. \tag{20.28}$$

This equation tells us that the required partial derivative with respect to $w_{kh}^{(2)}$ is obtained simply by the multiplication of two expressions: the value of $\delta$ for unit k at the output end of the connection concerned and the value of z at the input end h of the connection. Thus, in order to evaluate the partial derivatives with respect to the second layer parameters we need only to compute the value of $\delta_k$ for each output unit k = 1, ..., K in the network, and then apply equation (20.28).

For linear outputs associated with the sum-of-squares error function, for logistic sigmoid outputs associated with the cross-entropy error function and for softmax outputs associated with the multiple-class cross-entropy error function, the $\delta$s are given by:

[...]

$$\delta_k = y_k (1 - y_k)(y_k - t_k). \tag{20.30}$$

[...] with:

$$\delta_h := \varphi_h'(net_h) \sum_{k=1}^{K} \delta_k w_{kh}^{(2)} \tag{20.32}$$

where the use of the prime signifies differentiation with respect to the argument. In the case of logistic hidden units we get the following backpropagation formula:

$$\delta_h = \varphi_h'(net_h) \sum_{k=1}^{K} \delta_k w_{kh}^{(2)} = \varphi_h(net_h)\,(1 - \varphi_h(net_h)) \sum_{k=1}^{K} \delta_k w_{kh}^{(2)} = z_h (1 - z_h) \sum_{k=1}^{K} \delta_k w_{kh}^{(2)}. \tag{20.33}$$
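The backpropagation formulas can be checked numerically with a short Python sketch. It assumes a network with logistic hidden units and linear outputs, a sum-of-squares local error, the standard output delta $\delta_k = y_k - t_k$ for that pairing, and first-layer derivatives of the form $\delta_h x_n$. The weight values, the single training pattern and all function names are hypothetical.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W1, W2):
    """Forward propagation; x and z get a leading bias entry fixed at 1."""
    xb = [1.0] + list(x)
    z = [1.0] + [logistic(sum(w * v for w, v in zip(row, xb))) for row in W1]
    y = [sum(w * v for w, v in zip(row, z)) for row in W2]  # linear outputs
    return xb, z, y

def local_error(x, t, W1, W2):
    """Assumed local error: 0.5 * sum of squared output residuals."""
    _, _, y = forward(x, W1, W2)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def backprop(x, t, W1, W2):
    xb, z, y = forward(x, W1, W2)
    delta_k = [yk - tk for yk, tk in zip(y, t)]          # output deltas (assumed form)
    delta_h = [z[h + 1] * (1.0 - z[h + 1]) *
               sum(delta_k[k] * W2[k][h + 1] for k in range(len(W2)))
               for h in range(len(W1))]                  # hidden deltas, cf. (20.33)
    gW2 = [[dk * zh for zh in z] for dk in delta_k]      # cf. (20.28)
    gW1 = [[dh * xn for xn in xb] for dh in delta_h]     # first-layer derivatives
    return gW1, gW2

def numeric_gradient(x, t, W1, W2, eps=1e-6):
    """Central-difference gradient, used only to check backprop."""
    grads = []
    for W in (W1, W2):
        gW = []
        for i in range(len(W)):
            row = []
            for j in range(len(W[i])):
                old = W[i][j]
                W[i][j] = old + eps
                e_plus = local_error(x, t, W1, W2)
                W[i][j] = old - eps
                e_minus = local_error(x, t, W1, W2)
                W[i][j] = old
                row.append((e_plus - e_minus) / (2.0 * eps))
            gW.append(row)
        grads.append(gW)
    return grads

W1 = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]  # 2 hidden units, 2 inputs plus bias
W2 = [[0.2, -0.3, 0.5]]                    # 1 output, 2 hidden units plus bias
x, t = [0.5, -1.0], [0.7]
gW1, gW2 = backprop(x, t, W1, W2)
nW1, nW2 = numeric_gradient(x, t, W1, W2)
max_diff = max(abs(a - b)
               for G, N in ((gW1, nW1), (gW2, nW2))
               for grow, nrow in zip(G, N)
               for a, b in zip(grow, nrow))
```

Comparing the analytic derivatives with finite differences in this way is a common safeguard when implementing backpropagation; `max_diff` should be close to zero if the deltas are propagated correctly.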
[...] forward pass through the network to calculate the $z_h$ and $y_k$ values by propagating the input vector, followed by a backward pass to calculate $\delta_k$ and $\delta_h$, and hence the partial derivatives of the error function. Note that for the presentation of each training example the input pattern is fixed throughout the message passing scheme, encompassing the forward pass followed by the backward pass.

The backpropagation technique can be summarized in the following four steps:

Step 1: Apply an input vector $x^p$ to the network and forward propagate through the network, using equations (20.1)–(20.4), to generate the hidden and output unit activations based on current weight settings.
Step 2: Evaluate the $\delta_k$ for all the output units (k = 1, ..., K) using equation (20.29) or equation (20.30), depending on the problem type to be studied.
Step 3: Backpropagate the deltas, using equation (20.33), to get $\delta_h$ for each hidden unit h (h = 1, ..., H) in the network.
Step 4: Use equations (20.28) and (20.31) to evaluate the required derivatives.

For batch procedures the gradient of the global error can be obtained by repeating Step 1 to Step 4 for each pattern p in the training set, and then summing over all patterns.

20.6. NETWORK COMPLEXITY

So far we have considered neural networks of type (20.6) with a priori given numbers of input, hidden and output units. While the number of input and output units in a neural network is basically problem dependent, the number H of hidden units is a free parameter that can be adjusted to provide the best testing performance on independent data, the so-called testing set. But the testing error is not a simple function of H due to the presence of local minima in the error function. The issue of finding a parsimonious model for a real world problem is critical for all models but particularly important for neural networks because the problem of overfitting is more likely to occur.

A neural network model that is too simple (i.e., small H), or too inflexible, will have a large bias and smooth out some of the underlying structure in the data, while one that has too much flexibility in relation to the particular data set will overfit the data and have a large variance. In either case, the performance of the network on new data (i.e., generalization performance) will be poor. This highlights the need to optimize the complexity in the model selection process in order to achieve the best generalization (Bishop, 1995: 332; Fischer, 2000). There are several ways to control the complexity of a neural network, complexity being measured in terms of the number of hidden units or, more precisely, in terms of the independently adjusted parameters. Practice in spatial data analysis generally adopts a trial and error approach that trains a sequence of neural networks with an increasing number of hidden units and then selects the one which gives the best predictive performance on a testing set.¹⁴ There are, however, other more principled ways to control the complexity of a neural network model in order to avoid overfitting.¹⁵ One approach is that of regularization, which involves adding a regularization term $R(w)$ to the error function in order to control overfitting, so that the total error function to be minimized takes the form:

$$\tilde{E}(w) = E(w) + \lambda R(w) \tag{20.34}$$

where $\lambda$ is a positive real number, the so-called regularization parameter, that controls the relative importance of the data dependent error $E(w)$ and the regularization term $R(w)$, sometimes also called complexity term.
This term embodies the a priori knowledge about the solution, and therefore depends on the nature of the particular problem to be solved. Note that $\tilde{E}(w)$ is called the regularized error function.

One of the simplest forms of regularizer is defined as the squared norm of the parameter vector w in the network, as given by:

$$R(w) = \|w\|^2. \tag{20.35}$$

This regularizer¹⁶ is known as a weight decay function that penalizes large weights. Hinton (1987) has found empirically that a regularizer of this form can lead to significant improvements in network generalization.

Sometimes, a more general regularizer is used, for which the regularized error takes the form:

$$E(w) + \lambda \|w\|^m \tag{20.36}$$

where m = 2 corresponds to the quadratic regularizer (20.35). The case m = 1 is known as the lasso in the statistics literature (Tibshirani, 1996b). It has the property that if $\lambda$ is sufficiently large some of the parameter weights are driven to zero in sequential learning algorithms, leading to a sparse model. As $\lambda$ is increased, an increasing number of parameters are driven to zero.

One of the limitations of this regularizer is inconsistency with certain scaling characteristics of network mappings. If one trains a network using original data and one network using data for which the input and/or target variables are linearly transformed, then consistency requires obtaining equivalent networks which differ only by a linear transformation of the weights. Any regularizer should possess this characteristic, otherwise one solution is arbitrarily favoured over an equivalent solution. In particular, the weights should be scale invariant (Bishop, 2006: 257–258). A regularized error function that satisfies this property is given by:

$$E(w) + \lambda_1 \|w_{q_1}\|^m + \lambda_2 \|w_{q_2}\|^m \tag{20.37}$$

where $w_{q_1}$ denotes the set of the weights in the first layer, that is $w_{11}^{(1)}, \ldots, w_{hn}^{(1)}, \ldots, w_{HN}^{(1)}$, and $w_{q_2}$ those in the second layer, that is $w_{11}^{(2)}, \ldots, w_{kh}^{(2)}, \ldots, w_{KH}^{(2)}$. Under linear transformations of the weights, the regularizer will remain unchanged, provided that the parameters $\lambda_1$ and $\lambda_2$ are suitably rescaled.

The more sophisticated control of complexity that regularization offers over adjusting the number of hidden units by trial and error is evident. Regularization allows complex neural network models to be trained on data sets of limited size without severe overfitting, by limiting the effective network complexity. The problem of determining the appropriate number of hidden units is, thus, shifted to one of determining a suitable value for the regularization parameter(s) during the training process.

The principal alternative to regularization as a way to optimize the model complexity for a given training data set is the procedure of early stopping. As we have seen in the previous sections, training of a nonlinear network model corresponds to an iterative reduction of the error function defined with respect to a given training data set. For many of the optimization procedures used for network training (such as conjugate gradient optimization) the error is a nonincreasing function of the iteration steps $\tau$. But the error measured with respect to independent data, called the validation data set, often shows a decrease first, followed by an increase as the network starts to overfit, as illustrated in Fischer and Gopal (1994) for a spatial interaction data analysis problem.
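The rise in validation error is what an early stopping rule watches for. The following minimal sketch keeps the parameter value with the smallest validation error seen so far and halts once no improvement has occurred for a fixed number of steps. The `patience` rule, the one-parameter toy errors (training optimum at 1.0, validation optimum at 0.6) and all names are assumptions of this example, not the chapter's procedure.

```python
def train_with_early_stopping(w, train_step, validation_error,
                              max_steps=1000, patience=5):
    """Return the best parameter on validation data, its step, and the stop step."""
    best_w, best_err, best_step = w, validation_error(w), 0
    step = 0
    for step in range(1, max_steps + 1):
        w = train_step(w)                 # one optimization step on the training error
        err = validation_error(w)
        if err < best_err:
            best_w, best_err, best_step = w, err, step
        elif step - best_step >= patience:
            break                         # validation error has stopped improving
    return best_w, best_step, step

# Hypothetical surrogate errors: training error (w - 1)^2, validation error (w - 0.6)^2.
def train_step(w, eta=0.05):
    return w - eta * 2.0 * (w - 1.0)

def validation_error(w):
    return (w - 0.6) ** 2

best_w, best_step, stopped_step = train_with_early_stopping(
    0.0, train_step, validation_error)
```

In this toy setting the training error keeps falling as the parameter moves towards 1.0, while the validation error is smallest near 0.6, so the returned parameter lies well short of the training optimum.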
Thus, training can be stopped at the point of smallest error with respect to the validation data, in order to get a network that shows good generalization performance. But, if the validation set is small, it will give a relatively noisy estimate of generalization performance, and it may be necessary to keep aside another data set, the test set, on which the performance of the network model is finally evaluated.

This approach of stopping training before a minimum of the training error has been reached is another way of limiting the network complexity. It contrasts with regularization because the determination of the number of hidden units does not require convergence of the training process. The training process is used here to perform a directed search of the weight space for a neural network model that does not overfit the data and, thus, shows superior generalization performance. Various theoretical and empirical results have provided strong evidence for the efficiency of early stopping (see, e.g., Weigend et al., 1991; Baldi and Chauvin, 1991; Finnoff, 1991). Although many questions remain, a picture is starting to emerge as to the mechanisms responsible for the effectiveness of this procedure. In particular, it has been shown that stopped training has the same sort of regularization effect (i.e., reducing model variance at the cost of bias) that penalty terms provide.

20.7. GENERALIZATION PERFORMANCE

The ability of a neural network to predict correctly new observations that differ from those used for training is known as generalization (see, e.g., Moody, 1992). To assess the generalization performance of a neural network model is of crucial importance. The performance on the training set is not a good indicator due to the problem of overfitting. As often in statistics, there is a trade-off between accuracy on the training data and generalization. This is a well-studied dilemma (see, e.g., Bishop, 1995: chapter 9).

The simplest way to assess the generalization performance is the use of a test set. Here, of course, it is assumed that the test data are drawn from the same population used to generate the training data. If the test set is too small, an accurate assessment cannot be obtained. Test set validation becomes practical only if the data sets are very large or new data can be generated cheaply. As the training and test sets are independent samples, an unbiased estimate of the prediction risk is obtained. But the estimate can be highly variable across different data splittings.

One way to overcome this problem is by cross-validation. Cross-validation is a sample re-use method for assessing generalization performance. It makes maximally efficient use of the available data. The idea is to divide the available data set into D parts, generally of equal size, and then to use one part to test the performance of the neural network model trained on the remaining (D − 1) parts. The resulting estimator is again unbiased, and we can average the D such estimates. Leave-one-out cross-validation is a special case, in which each observation is tested on a model trained on the remaining (P − 1) observations. This version evidently requires a large number of computations. Choosing D = P should give the most accurate assessment, as the true size of the training set is most closely mimicked, but also involves the most computation. In addition, cross-validation estimates of performance for large [...]
[...] we then randomly set aside $P_2$ patterns for the bootstrap validation set. They are picked randomly without replacement and removed from the pool. The remaining patterns constitute the training set. This process is repeated B times (typically 20 < B < 200) to generate b = 1, ..., B training data sets of size $P_1$, $S_{P_1}^{*b} = \{z_{p_1}^{*b} : p_1 = 1, \ldots, P_1\}$, called bootstrap training sets; b = 1, ..., B validation data sets of size $P_2$, $S_{P_2}^{*b} = \{z_{p_2}^{*b} : p_2 = 1, \ldots, P_2\}$, called bootstrap validation sets; and b = 1, ..., B test data sets of size $P_3$, $S_{P_3}^{*b} = \{z_{p_3}^{*b} : p_3 = 1, \ldots, P_3\}$, called bootstrap test sets.

Step 2: Computation of the bootstrap parameter estimates
Each bootstrap training set $S_{P_1}^{*b}$ is used to compute a new parameter vector by minimizing:

$$\arg\min \{E(w^{*b}) : w^{*b} \in W,\ W \subseteq \mathbb{R}^Q\} \tag{20.39}$$

[...] equation (20.38), in the same manner as $\hat{\theta}(\hat{w})$ but with resample $S_{P_3}^{*b}$ replacing $S_{P_3}$ and $\hat{w}^{*b}$ replacing $\hat{w}$. This yields a sequence of bootstrap statistics, $\hat{\theta}^{*1}, \ldots, \hat{\theta}^{*B}$.

Step 4: Estimation of the standard deviation
The statistical accuracy of the performance statistic can then be evaluated by looking at the variability of the statistic between the different bootstrap test sets. Estimate the standard deviation, $\hat{\sigma}$, of $\hat{\theta}$ as approximated by bootstrap:

$$\hat{\sigma}_B = \left\{ \frac{1}{B - 1} \sum_{b=1}^{B} \left[ \hat{\theta}^{*b}(\hat{w}^{*b}) - \hat{\theta}^{*}(\cdot) \right]^2 \right\}^{1/2} \tag{20.41}$$

where

$$\hat{\theta}^{*}(\cdot) = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{*b}(\hat{w}^{*b}). \tag{20.42}$$
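The standard deviation estimate of equations (20.41) and (20.42) amounts to the sample standard deviation of the B bootstrap statistics. A minimal sketch, in which the list of statistics is hypothetical and the function name is an assumption of this example:

```python
import math

def bootstrap_std(stats):
    """Standard deviation of B bootstrap statistics with divisor B - 1."""
    B = len(stats)
    if B < 2:
        raise ValueError("at least two bootstrap replications are needed")
    mean = sum(stats) / B                                  # average of the statistics, cf. (20.42)
    return math.sqrt(sum((s - mean) ** 2 for s in stats) / (B - 1))  # cf. (20.41)

# Hypothetical performance statistics from B = 5 bootstrap test sets.
example_stats = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma_hat = bootstrap_std(example_stats)
```

In an actual application each entry of the list would be the performance statistic computed on one bootstrap test set with the parameters estimated from the matching bootstrap training set.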
Bäck, T., Fogel, D.B. and Michalewicz, Z. (eds) (1997). Handbook of Evolutionary Computation. New York and Oxford: Oxford University Press.
Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
Bishop, C.M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6): 2350–2383.
Bridle, J.S. (1994). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In: Fogelman Soulié, F. and Hérault, J. (eds), Neurocomputing. Algorithms, Architectures and Applications, pp. 227–236. Berlin, Heidelberg and New York: Springer.
Carpenter, G.A. (1989). Neural network models for pattern recognition and associative memory. Neural Networks, 2(4): 243–257.
Carpenter, G.A., Grossberg, S. and Reynolds, J.H. (1991). ARTMAP: supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks, 4(5): 565–588.
Cichocki, A. and Unbehauen, R. (1993). Neural Networks for Optimization and Signal Processing. Chichester: Wiley.
Corne, S., Murray, T., Openshaw, S., See, L. and Turton, I. (1999). Using computational intelligence techniques to model subglacial water systems. Journal of Geographical Systems, 1(1): 37–60.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control Signals and Systems, 2: 303–314.
Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. New York: Chapman and Hall.
Finnoff, W. (1991). Complexity measures for classes of neural networks with variable weight bounds. Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS'94, Volume 4), pp. 1880–1882. Piscataway, NJ: IEEE Press.
Finnoff, W., Hergert, F. and Zimmerman, H.G. (1993). Improving model selection by nonconvergent methods. Neural Networks, 6(6): 771–783.
Fischer, M.M. (1998). Computational neural networks: A new paradigm for spatial analysis. Environment and Planning A, 30(10): 1873–1892.
Fischer, M.M. (2000). Methodological challenges in neural spatial interaction modelling: the issue of model selection. In: Reggiani, A. (ed.), Spatial Economic Science: New Frontiers in Theory and Methodology, pp. 89–101. Berlin, Heidelberg and New York: Springer.
Fischer, M.M. (2002). Learning in neural spatial interaction models: A statistical perspective. Journal of Geographical Systems, 4(3): 287–299.
Fischer, M.M. (2005). Spatial analysis. In: Longley, P., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems. Principles, Techniques, Management and Applications. Second Edition, Abridged, (CD-ROM). Hoboken, New Jersey: Wiley.
Fischer, M.M. (2006a). Neural networks. A general framework for non-linear function approximation. Transactions in GIS, 10(4): 521–533.
Fischer, M.M. (2006b). Spatial Analysis and Geocomputation. Selected Essays. Berlin, Heidelberg and New York: Springer.
Fischer, M.M. and Getis, A. (eds) (1997). Recent Developments in Spatial Analysis. Spatial Statistics, Behavioural Modelling, and Computational Intelligence. Berlin, Heidelberg and New York: Springer.
Fischer, M.M. and Gopal, S. (1994). Artificial neural networks: A new approach to modelling interregional telecommunication flows. Journal of Regional Science, 34(4): 503–527.
Fischer, M.M. and Leung, Y. (1998). A genetic-algorithm based evolutionary computational neural network for modelling spatial interaction data. The Annals of Regional Science, 32(3): 437–458.
Fischer, M.M. and Leung, Y. (eds) (2001). GeoComputational Modelling: Techniques and Applications. Berlin, Heidelberg and New York: Springer.
Fischer, M.M. and Reismann, M. (2002a). Evaluating neural spatial interaction modelling by bootstrapping. Networks and Spatial Economics, 2(3): 255–268.
Fischer, M.M. and Reismann, M. (2002b). A methodology for neural spatial interaction modeling. Geographical Analysis, 34(2): 207–228.
Fischer, M.M. and Staufer, P. (1999). Optimization in an error backpropagation neural network environment
with a performance test on a spectral pattern classification problem. Geographical Analysis, 31(2): 89–108.
Fischer, M.M., Hlaváčková-Schindler, K. and Reismann, M. (1999). A global search procedure for parameter estimation in neural spatial interaction modelling. Papers in Regional Science, 78(2): 119–134.
Fischer, M.M., Reismann, M. and Hlaváčková-Schindler, K. (2003). Neural network modelling of constrained spatial interaction flows: Design, estimation and performance issues. Journal of Regional Science, 43(1): 35–61.
Fischer, M.M., Gopal, S., Staufer, P. and Steinnocher, K. (1997). Evaluation of neural pattern classifiers for a remote sensing application. Geographical Systems, 4(2): 195–226.
Fogel, D.B. (1995). Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. Piscataway, NJ: IEEE Press.
Fogel, D.B. and Robinson, C.J. (eds) (1996). Computational Intelligence. Piscataway: IEEE Press and Wiley-Interscience.
Foody, G.M. and Boyd, D.S. (1999). Fuzzy mapping of tropical land cover along an environmental gradient from remotely sensed data with an artificial neural network. Journal of Geographical Systems, 1(1): 23–35.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3): 183–192.
Gahegan, M. (2000). On the application of inductive machine learning tools to geographical analysis. Geographical Analysis, 32(1): 113–133.
Gahegan, M., German, G. and West, G. (1999). Improving neural network performance on the classification of complex geographic datasets. Journal of Geographical Systems, 1(1): 3–22.
Goldberg, D.E. (1989). Genetic Algorithms. Reading, MA: Addison-Wesley.
Gopal, S. and Fischer, M.M. (1996). Learning in single hidden-layer feedforward network models. Geographical Analysis, 28(1): 38–55.
Gopal, S. and Fischer, M.M. (1997). Fuzzy ARTMAP: a neural classifier for multispectral image classification. In: Fischer, M.M. and Getis, A. (eds), Recent Developments in Spatial Analysis, pp. 306–335. Berlin, Heidelberg and New York: Springer.
Grossberg, S. (1988). Nonlinear neural networks. Principles, mechanisms and architectures. Neural Networks, 1(1): 17–61.
Hassoun, M.H. (1995). Fundamentals of Artificial Neural Networks. Cambridge, MA and London, England: MIT Press.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Berlin, Heidelberg and New York: Springer.
Haykin, S. (1994). Neural Networks. A Comprehensive Foundation. New York: Macmillan College Publishing Company.
Hertz, J., Krogh, A. and Palmer, R.G. (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley.
Hinton, G.E. (1987). Learning translation invariant recognition in massively parallel networks. In: Bakker, J.W. de, Nijman, A.J. and Treleaven, P.C. (eds), Proceedings PARLE Conference on Parallel Architectures and Languages Europe, pp. 1–13. Berlin, Heidelberg and New York: Springer.
Hlaváčková-Schindler, K. and Fischer, M.M. (2000). An incremental algorithm for parallel training of the size and the weights in a feedforward neural network. Neural Processing Letters, 11(2): 131–138.
Hornik, K., Stinchcombe, M. and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5): 359–368.
Huang, G.-B. and Siew, C.-K. (2006). Real-time learning capability of neural networks. IEEE Transactions on Neural Networks, 17(4): 863–878.
Janson, D.J. and Frenzel, J.F. (1993). Training product unit neural networks with genetic algorithms. IEEE Expert, 8(5): 26–33.
Kohonen, T. (1988). Self-Organization and Associative Memory. Berlin, Heidelberg and New York: Springer.
Kuan, C.-M. and White, H. (1991). Artificial neural networks: An econometric perspective. Econometric Reviews, 13(1): 1–91.
Kůrková, V. (1992). Kolmogorov's theorem and multilayer neural networks. Neural Networks, 5(3): 501–506.
Leung, K.-S., Ji, H.-B. and Leung, Y. (1997). Adaptive weighted outer-product learning associative memory. IEEE Transactions on Systems, Man, and Cybernetics Part B, 27(3): 533–543.
Leung, Y. (1997). Feedforward neural network models for spatial data classification and rule learning. In: Fischer, M.M. and Getis, A. (eds), Recent Developments in Spatial Analysis, pp. 289–305. Berlin, Heidelberg and New York: Springer.
Leung, Y. (2001). Neural and evolutionary computation methods for spatial classification and knowledge acquisition. In: Fischer, M.M. and Leung, Y. (eds), GeoComputational Modelling. Techniques and Applications, pp. 71–108. Berlin, Heidelberg and New York: Springer.
Leung, Y., Chen, K.-Z. and Gao, X.-B. (2003). A high-performance feedback neural network for solving convex nonlinear programming problems. IEEE Transactions on Neural Networks, 14(6): 1469–1477.
Leung, Y., Dong, T.-X. and Xu, Z.-B. (1998). The optimal encodings for biased association in linear associative memory. Neural Networks, 11(5): 877–884.
Leung, Y., Gao, X.-B. and Chen, K.-Z. (2004). A dual neural network for solving entropy-maximising models. Environment and Planning A, 36(5): 897–919.
Leung, Y., Chen, K.-Z., Jiao, Y.-C., Gao, X.-B. and Leung, K.S. (2001). A new gradient-based neural network for solving linear and quadratic programming problems. IEEE Transactions on Neural Networks, 12(5): 1074–1083.
Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, AC-22: 551–575.
McCulloch, W.S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5: 115–133.
Mineter, M.J. and Dowers, S. (1999). Parallel processing for geographical applications: A layered approach. Journal of Geographical Systems, 1(1): 61–74.
Moody, J.E. (1992). The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In: Moody, J.E., Hanson, S.J. and Lippmann, R.P. (eds), Advances in Neural Information Processing Systems 4, pp. 683–690. San Mateo, CA: Morgan Kaufmann.
Murata, N., Yoshizawa, S. and Amari, S. (1993). Learning curves, model selection and complexity of neural networks. In: Hanson, S.J., Cowan, J.D. and Giles, C.L. (eds), Advances in Neural Information Processing Systems 5, pp. 607–614. San Mateo, CA: Morgan Kaufmann.
Nocedal, J. and Wright, S.J. (1999). Numerical Optimization. Berlin, Heidelberg and New York: Springer.
Openshaw, S. (1993). Modelling spatial interaction using a neural net. In: Fischer, M.M. and Nijkamp, P. (eds), GIS, Spatial Modelling, and Policy, pp. 147–164. Berlin, Heidelberg and New York: Springer.
Openshaw, S. (1994). Neuroclassification of spatial data. In: Hewitson, B.C. and Crane, R.G. (eds), Neural Nets: Applications in Geography, pp. 53–70. Boston: Kluwer Academic Publishers.
Openshaw, S. and Abrahart, R.J. (eds) (2000). GeoComputation. London and New York: Taylor & Francis.
Openshaw, S. and Openshaw, C. (1997). Artificial Intelligence in Geography. Chichester: Wiley.
Plutowski, M., Sakata, S. and White, H. (1994). Cross-validation estimates IMSE. In: Cowan, J.D., Tesauro, G. and Alspector, J. (eds), Advances in Neural Information Processing Systems 6, pp. 391–398. San Francisco: Morgan Kaufmann.
Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78(9): 91–106.
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P. (1992). Numerical Recipes in C. The Art of Scientific Computing. 2nd edn. Cambridge: Cambridge University Press.
Ripley, B.D. (1994). Neural networks and related methods for classification (with discussion). Journal of the Royal Statistical Society B, 56(3): 409–456.
Ripley, B.D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Rosenblatt, F. (1962). Principles of Neurodynamics. Washington DC: Spartan Books.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning internal representations by error propagation. In: Rumelhart, D.E., McClelland, J.L. and the PDP Research Group (eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pp. 318–362. Cambridge, MA: MIT Press.
Schwefel, H.-P. (1994). Evolution and Optimum Seeking. New York: Wiley.
396 THE SAGE HANDBOOK OF SPATIAL ANALYSIS
Specht, D.F. (1991). A general regression neural Widrow, B. and Hoff, M.E. Jr. (1960). Adaptive
network. IEEE Transactions on Neural Networks, switching circuits. IRE Western Electric Show and
2(6): 568576. Convention Record, Part 4: 96104.
Stepniewski, W. and Keane, J. (1997). Pruning Wilkinson, G.G. (1997). Neurocomputing for
backpropagation neural networks using modern earth observation recent developments and
stochastic optimization techniques. Neural Comput- future challenges. In: Fischer, M.M. and Getis, A.
ing and Applications, 5(2): 7698. (eds), Recent Developments in Spatial Analysis,
pp. 289305. Berlin, Heidelberg and New York:
Tibshirani, R. (1996a). A comparison of some error Springer.
estimates for neural network models. Neural
Computation, 8(1): 152163. Wilkinson, G.G. (2001). Spatial pattern recognition in
remote sensing by neural networks. In: Fischer, M.M.
Tibshirani, R. (1996b). Regression shrinkage and and Leung, Y. (eds), GeoComputational Modelling.
selection via the lasso. Journal of the Royal Statistical Techniques and Applications, pp. 145164. Berlin,
Society B, 58: 267288. Heidelberg and New York: Springer.
Wedge, D., Ingram, D., McLean, D., Mingham, C. Wilkinson, G.G., Fierens, F. and Kanellopoulos, I.
and Bandar, Z. (2006). On global-local arti- (1995). Integration of neural and statistical
cial neural networks for function approximation. approaches in spatial data classication. Geograph-
IEEE Transactions on Neural Networks, 17(4): ical Systems, 2(1): 120.
942952.
Yao, X. (1996). A review of evolutionary artical
Weigend, A.S., Rumelhart, D.E. and Huberman, B.A. neural networks. International Journal of Intelligent
(1991). Generalization by weight elimination with Systems, 8(4): 539567.
application to forecasting. In: Lippman, R., Moody, J.
and Touretzky, D. (eds), Advances in Neural Yao, X. (2001). Evolving computational neural networks
Information Processing Systems 3, pp. 875882. San through evolutionary computation. In: Fischer, M.M.
Mateo, CA: Morgan Kaufmann. and Leung, Y. (eds), GeoComputational Modelling.
Techniques and Applications, pp. 3570. Berlin,
Weng, J. and Hwang, W.-S. (2006). From neural Heidelberg and New York: Springer.
networks to the brain: Autonomous mental devel-
opment. IEEE Computational Intelligence Magazine, Yao, X., Fischer, M.M. and Brown, G. (2001).
1(3): 1531. Neural network ensembles and their application
to trafc ow prediction in telecommunication
White, H. (1989). Learning in articial neural networks: networks. In: Proceedings of the 2001 IEEE-INNS-
a statistical perspective. Neural Computation, 1(4): ENNS International Joint Conference on Neural
425464. Networks, pp. 693698. Piscataway, NJ: IEEE Press.
White, H. (1992). Articial Neural Networks. Approx- Zapranis, A. and Refenes, A.-P. (1998). Principles of
imation and Learning Theory. Oxford, UK and Neural Model Identication, Selection and Adequacy.
Cambridge, USA: Blackwell. With Applications to Financial Econometrics. London:
Springer.
White, H. and Racine, J. (2001). Statistical infer-
ence, the bootstrap, and neural-network modeling Zhang, G., Patuwo, B.E. and Hu, M.Y. (1998).
with applications to foreign exchange rates. Forecasting with articial neural networks: The state
IEEE Transactions on Neural Networks, 12(4): of the art. International Journal of Forecasting,
657673. 14(1): 3562.
21
Geocomputation
Harvey J. Miller
Figure 21.1 Evidence of Moore's Law: the number of transistors on Intel IC chips, 1972 to 1997 (based on Kurtzweil, 1999). [Plot omitted: vertical axis, number of transistors (0 to 8 000 000); horizontal axis, year.]
our tools and methods given this growth in computing power (Openshaw, 2000).

Data collection and storage

Paralleling the astonishing increases in computational power is an equally
stunning collapse in the cost of collecting and storing digital data. The
computerization of many government and business transactions, as well as the
increasing capabilities for direct digital data capture through devices such
as bar code scanners and environmental sensors, has greatly reduced the cost
of data collection. At the same time, database and data warehousing
techniques have become more powerful and affordable (Chen et al., 1996). The
hardware costs of storing data are also a minute fraction of the costs a
decade or two ago. These trends are shifting science from a data-poor to a
data-rich environment.

21.2.2. Nature and complexity

Just because we are drowning in CPU cycles and data does not mean we should
apply them to understanding non-computer-related phenomena. Computers may not
be appropriate tools for gaining new scientific understanding about reality
apart from the computational process itself. Perhaps computers should just be
used to manage our data and documents, run our personal digital assistants
and cell phones, and coordinate transportation and logistics. In other words,
just because computers are great for solving engineering problems does not
mean that they are useful for discovering knowledge about the Earth. However,
computational science is also predicated on a belief that computation can
mimic natural processes. Nature may behave much like a computer; in fact,
perhaps the universe is a computer (Kelley, 2002).

This may sound far-fetched. But until recently, science has worked under a
similar but equally pervasive metaphor, namely, "universe as machine." This
assumed that the universe behaved much like an engine, with continuous and
well-behaved processes with effects that are proportional to causes. Most
importantly, this implied that the whole is equal to the sum of the parts,
and we can understand the whole by studying its parts independently. The
tools for this exploration were algebra and calculus: these are tools that
examine quantities (magnitudes) in continuous mathematical space (Flake,
1998).

There is a fundamental reason why computation may be a better description of
nature than mechanics: frugality. Natural processes have a remarkable ability
to extract much from a minimal investment of resources: consider, for
example, the surface area generated by the leaves of a tree or in the
interface between the lungs and the circulatory system. Similarly, a great
deal of biological complexity results from a code that consists of only four
symbols, namely, DNA. Similarly, computing tries to obtain the most with the
least investment of computational resources and, similar to biological
growth, simple computational rules can result in complex behavior that is not
predictable.

The property of complexity from simplicity in both nature and computation
means that the whole is greater than the sum of the parts: phenomena cannot
be understood entirely by independent analysis of their components. This
implies a middle path to scientific knowledge: instead of looking at the
individual or aggregate, we see how the aggregate emerges from the
interactions of individuals (Flake, 1998).

There are some powerful mechanisms in both nature and computation that
facilitate complexity from simplicity. These principles are parallelism,
iteration, and adaptation (Flake, 1998). Complex systems are often highly
parallel collections of relatively
simple units: consider, for example, an ant colony (ants), a brain (neurons),
an ecosystem (animals), or a city (people). Parallel systems are more
efficient and robust than sequential systems since they can specialize,
explore a wider range of solutions simultaneously, and survive the failure of
many components. Iteration over time allows feedback from the environment to
determine the success or failure of different units and their strategies.
Iteration also supports the closely related concept of recursion or
self-reference. Finally, adaptation is a consequence of parallelism and
iteration within an environment with scarce resources and therefore
competition. Many of the techniques used in computational science incorporate
some or all of these principles.

21.3. GEOCOMPUTATION

21.3.1. Motivation

Similar to CS, a factor motivating geocomputation is the increasing ability
to capture, store, and process digital geographic data. In particular, it is
increasingly possible to capture geo-data at high levels of spatial and
temporal resolution as well as manipulate very detailed representations of
geography using geographic information systems (GIS) and related
technologies. Geo-spatial data capture technologies include intelligent
transportation systems, hyperspectral and laser-based remote sensing systems,
environmental monitoring devices, and location-aware technologies (LATs) that
can report their geo-location densely with respect to time. GIS allow
analysis of geographic relationships and morphology at levels of detail
hardly imaginable even a short time in the past (Miller and Wentz, 2003). It
is possible that there are surprising and useful patterns in these data and
representations that are not being discovered by the analytic and statistical
methods in traditional spatial analysis.

Another motivating force for geocomputation is the increasing recognition of
the complexity of the spatio-temporal systems of concern in geography and the
Earth sciences (Fischer and Leung, 2001). For example, the dynamic evolution
of an urban system emerges from the individual agents of change, their
interactions and the co-evolution of the context in which these interactions
occur. This suggests not only that these systems are more complicated than
previously supposed, but also that we cannot engineer their growth; rather,
we can only influence or shape their evolution. We have seen this time and
time again when a relatively modest change in infrastructure or policy (e.g.,
a new highway interchange, a change in zoning regulations) leads to wildly
disproportionate outcomes (e.g., traffic congestion, urban sprawl). This is
not defeatist; rather, it suggests humility and the need for sophisticated,
nuanced approaches to understanding and directing these systems to efficient,
equitable and sustainable outcomes.

In addition to increasing recognition of the complexity of geographic
phenomena, it is also likely that the intrinsic complexity of these systems
is increasing. As the world continues to become more crowded, mobile and
connected, small local actions can have large-scale outcomes. Saturated road
networks mean that an accident in one location results in traffic jams across
town. Airline networks distribute diseases across continents and around the
world within hours. The Internet spreads innovative ideas and wild rumors
throughout the globe nearly at the speed of light. Interconnected financial
networks mean that decisions made in a conference room can have huge economic
consequences for large
analytical solutions. While this may sound like a drawback (and it is), this
is a necessary trade-off. Traditional modeling methods rely on simplistic
representations of space and behavior in order to facilitate precise
analytical solutions. GC determines numerical approximations of solutions for
systems with more complex representations of space and behavior. The argument
is that it is better to have an approximate solution to a richly represented
system than an exact solution to a sterile representation. Numerical
approximations are necessary consequences of richer, more accurate
representations of geographic phenomena.

Much of the digital geographic data available to researchers no longer meets
many of the assumptions of inferential statistics, including the more relaxed
assumptions of spatial analysis. Geographic data are increasingly no longer
carefully structured, limited samples from a much larger population. Rather,
digital geographic data are often entire monitored populations (in the
statistical sense), collected using ill-structured and noisy methods.
Computational techniques that do not require strict assumptions are better
suited for these rich but sloppy data (Atkinson and Martin, 2000).

GIS provides a source of data and a toolkit environment for GC. GC is
distinct since it emphasizes dynamic processes over static form and user
interaction over passive receipt of information. GC is about matching
technology with environment, process with data model, geometry with
application, analysis with local context, and philosophy of science with
practice (Longley, 1998). We can also make a distinction between GIS and GC
that is similar to Fotheringham's (2000) distinction between computer-based
spatial analysis and GC. In many respects, the computer is nothing more than
a convenient vehicle for GIS. For example, the overlay operation pre-dates
much of the development of computer-based GIS (see McHarg, 1969). We have
also been constructing, storing and using maps for 5000 years. In other
words, we can do GIS even without computers, although it would be very slow
and tedious. GC is about what we could not do before the development of
powerful computers.

In sum, GC uses the traditional techniques of spatial analysis (statistics,
mathematical modeling) and GIS as parts of a more flexible and expansive tool
kit. GC is concerned with the use of computational techniques and
technologies within a scientific framework. This involves GIS as the data and
information manager, computational methods as the tools, and high-performance
computing as the driver (Fischer and Abrahart, 2000; Fischer and Leung, 2001;
Openshaw, 2000).

21.3.4. A theory of geocomputation?

Couclelis (1998a, b) provides a more skeptical view of GC. She argues that GC
is in fact a loosely connected grab-bag of techniques rather than a focused
scientific endeavor. She challenges the GC community to develop a rigorous
computational theory of spatiotemporal processes that justifies the prefix
"geo."

Couclelis points out that computational science is based on the theory of
computation, a highly developed and rigorous theory of what can (and cannot)
be computed and how things that can be computed should be. This involves
questions such as determining which processes in the world can be described
in the precise manner required by computation, and what is the appropriate
language for describing specific processes. These are much deeper questions
than what available computational technique is best for a particular data set
or problem. (For an excellent introduction
in length. Therefore, we must conclude that the length of this naturally
occurring object is meaningless when stated independently of the scale of
measurement.

We can estimate the fractal dimension of an entity by comparing the growth in
its apparent length or size with the change in the scale of the measurement.
Essentially, this is an attempt to estimate the following power law (Peitgen
et al., 2004):

$y \propto x^{d}$

where y is the size of the object, x is the measurement scale, and d is an
empirical parameter related to the dimension of the object. In practice,
estimating this relationship is complex: there are several definitions and
measures of the fractal dimension, not all of which agree (see Lam and De
Cola, 1993; Moon, 1992; Peitgen et al., 2004). Common fractal dimensions
include the similarity, capacity, and Hausdorff–Besicovitch dimensions (Batty
and Longley, 1994; Goodchild and Mark, 1987; Williams, 1997). Methods for
calculating these dimensions include box-counting, compass, area-perimeter,
and variogram methods (Burrough, 1993; Peitgen et al., 2004).

Measuring the fractal dimension of geographic phenomena allows determination
of their scale-invariance (self-similarity at different scales) as well as
other fractal properties such as space-filling and irregularity. The
increasing availability of digital geographic data as well as GIS tools for
handling these data can support these analyses, allowing more detailed
examination of the relationships between spatial process and geographic form
(Batty and Longley, 1994; Longley, 2000). Applications include spatial
population distributions (Appleby, 1996), transportation network morphology
(Benguigui and Daoud, 1991), urban morphology (Batty and Longley, 1987, 1994;
De Keersmaecker et al., 2003; Shen, 2002), land cover patterns (De Cola,
1989), landscape analysis (Burrough, 1993; Clarke and Schweizer, 1991) and
riparian networks (Phillips, 1993a). Wentz (2000) uses a fractal dimension
measure as a component of a general, trivariate shape measure.

Simulating fractal growth

In addition to fractal analysis of geographic patterns, it is also possible
to simulate fractal growth using rule sets and iterated systems. Simulating
fractal growth from finite systems such as rule sets and iterated systems
captures a key property of fractal growth in the real world: the ability to
generate highly complex entities using very simple processes. Physical,
biological and human systems evolve from some baseline apparently without
encoding complex information such as systems of simultaneous equations,
constrained optimization problems, or partial differential equations to
govern their growth. Rather, real world phenomena may emerge through simple
growth mechanisms applied recursively. Many methods for simulating fractal
growth use the powerful technique of recursion to generate complex structures
with minuscule base information.

Two well-known recursive methods for generating fractals are iterated
function systems (IFS) and L-systems. The IFS algorithm starts with a seed
object and maps a point on that object back onto itself through some randomly
chosen affine transformation. This recursive process iterates and the object
approaches a fractal object consisting of the union of smaller copies of the
seed object (see Batty and Longley, 1994; Barnsley, 1988; Flake, 1998).
L-systems simulate biological growth through a rule-based system that
generates progressively complex strings through recursively applying the
production rules to the axioms and the strings generated through these
applications. This results in structures
with fractal properties that can be visualized using systems such as turtle
graphics (Flake, 1998; Peitgen et al., 2004).

Other methods that simulate fractal growth include tessellation methods such
as cellular automata (White and Engelen, 1993; also see below) and
diffusion-limited aggregation (Batty, 1991; Fotheringham et al., 1989); these
methods have been applied to simulating urban dynamics. Brownian motion
methods have been applied to simulate natural objects with fractal properties
such as riparian networks, geological time series and terrain (Goodchild and
Klinkenberg, 1993).

Fractal analysis and fractal simulation appear to be powerful methods that
can reveal or mimic the structure and processes underlying many natural and
human systems. The critical question remains whether explicit linkages can be
identified between fractal processes and the natural and behavioral
mechanisms identified from the domain sciences. It is important to note that
some fractal algorithms are heuristics that imply unrealistic growth
processes. To this end, correspondence between fractal processes and central
place theory (Arlinghaus, 1985; Arlinghaus and Arlinghaus, 1989) and von
Thünian theories of urban structure (Cavailhès et al., 2004) have been
established.

21.4.2. Dynamical systems and chaotic behavior

A dynamical system is a system that experiences some change or motion. Many
(if not most) natural and human-made systems are dynamic. The traditional way
to study dynamical systems is through differential equations and difference
equations. Differential equations are continuous-time equations where one or
more of the variables are rates of change expressed as derivatives.
Difference equations capture discrete-time dynamics, with rates of change
expressed in terms of differences in the values of variables at different
points in time.

For many years it was assumed that dynamical systems exhibited one of three
types of behavior with respect to time (Flake, 1998): (1) fixed point
(static); (2) periodic (orbit); and (3) quasi-periodic (orbits that never
quite repeat themselves). However, it was also known that certain types of
dynamical systems exhibited behaviors that were intractable analytically. In
particular, non-linear dynamical systems were known to be notoriously
difficult. Since the rise of the digital computer, it has become easier to
study non-linear dynamical systems using numerical simulation. This has led
to the discovery that these systems are not just intractable: they show very
complex behavior now referred to as chaos.

Chaos is not randomness: completely deterministic systems can exhibit chaotic
behavior. Yet this behavior is seemingly random with respect to prediction:
forecasts about these systems over the long run are poor, even though the
mechanisms of the system are known. In particular, chaotic systems are highly
sensitive to initial conditions: small differences in the starting points can
lead to huge differences in their trajectories later in time. Chaotic
behavior seems to be inherent in many types of non-linear dynamical systems,
even those with very simple structures: population dynamics, predator–prey
dynamics, weather, and the stock market are all examples of real world
processes that can be difficult to predict even if we know the underlying
mechanics (Flake, 1998).

Chaotic behavior and strange attractors

Two well-known non-linear systems that generate chaotic behavior are the
Lorenz attractor and generalized Lotka–Volterra systems. The Lorenz attractor
consists of
three linked differential equations that model convection flow in weather
systems. Generalized Lotka–Volterra systems model predator–prey relationships
through n linked differential equations, where n is the number of species.
This system displays a wide range of dynamic behavior under different
parameterizations, including chaotic behavior (Flake, 1998).

The Lorenz attractor and generalized Lotka–Volterra systems capture many of
the characteristics of chaotic dynamical systems. Both are non-linear and
incorporate feedback: for example, in Lotka–Volterra systems the number of
predators affects the number of prey through culling the latter, while in
turn the number of prey affects the predators that can be supported. Both
systems are very simple, but generate very complex behavior: behavior that
often cannot be distinguished from randomness. However, the trajectories of
these systems contain order, at least in a global sense. Finally, these
systems are hypersensitive to initial conditions, with the consequence that
while short-term behavior can be predicted, long-term predictions are
meaningless (Williams, 1997).

An attractor is the bounded region within the phase space towards which
dynamic systems evolve: examples include the fixed point, periodic, and
quasi-periodic behaviors mentioned above. Chaotic systems are characterized
by strange attractors. The system evolves within a finite space, but with an
infinite period: visiting every location within the region but never the same
location twice. Consequently, the calculated dimension of chaotic
trajectories will often be fractional: contained within a finite area, but
space-filling. These trajectories are often infinitely self-similar:
increasing the resolution of the calculations and subsequent trajectory plots
will reveal the same structure repeatedly. Thus, there is a deep linkage
between fractals and chaos: both exhibit the computational principle of
complexity from simplicity (Flake, 1998; Peitgen et al., 2004; Williams,
1997).

Spatial chaos

The non-linear dynamical systems we have discussed thus far exhibit temporal
chaos, that is, chaotic behavior in the dynamic evolution of aggregate system
parameters. A reasonable question is whether temporal chaos can lead to
spatial chaos or complex spatial patterns that exhibit a high degree of
sensitivity to conditions at particular locations. Theoretically, it turns
out that spatial chaos can emerge from temporal chaos under very broad
conditions: unless the system is perfectly isotropic with respect to space,
spatial chaos will emerge and increase over time (Phillips, 1993b, 1999a).

Given the broad conditions under which spatial chaos can emerge, it is not
surprising that spatial chaos has been detected in physical and human
geographic models and data. These include physical systems such as
geomorphologic, hydrological, and ecological systems (see Phillips, 1999a,
b), retail dynamics (Wilson, 2006), economic systems (White, 1990), urban
systems (Wong and Fotheringham, 1990), and spatial choice processes (Nijkamp
and Reggiani, 1990).

There are three general approaches to detecting spatial chaos (Phillips,
1993b). One method is to test for sensitivity to initial conditions by
analyzing the Lyapunov exponents: these describe the average rate of
convergence or divergence of two neighboring trajectories in phase space
(Williams, 1997). A second method is numerical simulation: simulate and plot
the behavior of the system in phase space, and analyze the plot using
graphical techniques. A third approach is to examine an empirical temporal or
spatial series for signatures of chaos, with the latter series derived by
generating a spatial gradient by choosing some transect across space (see
Phillips, 1993b).
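
The first approach above, testing for sensitivity to initial conditions, can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: it uses the logistic map x(t+1) = r x(t) (1 - x(t)), a standard textbook difference equation that is not one of the systems discussed in this chapter, and it estimates the Lyapunov exponent as the long-run average of ln|f'(x)| along a trajectory. The function names, starting value, and iteration counts are hypothetical choices made for the sketch.

```python
import math


def logistic_map(x, r):
    """One step of the logistic difference equation (illustrative system)."""
    return r * x * (1.0 - x)


def lyapunov_exponent(r, x0=0.2, n_transient=1000, n_iter=20000):
    """Average log-rate at which nearby trajectories diverge (+) or converge (-)."""
    x = x0
    # Discard the transient so the average is taken on the attractor.
    for _ in range(n_transient):
        x = logistic_map(x, r)
    total = 0.0
    for _ in range(n_iter):
        slope = abs(r * (1.0 - 2.0 * x))        # |f'(x)| for the logistic map
        total += math.log(max(slope, 1e-300))   # guard against log(0)
        x = logistic_map(x, r)
    return total / n_iter


def separation(r, x0=0.2, delta=1e-9, steps=40):
    """Distance between two trajectories that start `delta` apart."""
    a, b = x0, x0 + delta
    for _ in range(steps):
        a, b = logistic_map(a, r), logistic_map(b, r)
    return abs(a - b)
```

A negative estimate (for example at r = 2.5, where the map settles on a fixed point) indicates that neighboring trajectories converge; a positive estimate (as is commonly reported for r = 3.9) indicates divergence, the signature of sensitivity to initial conditions. Calling `separation` with the same two values of r shows the same contrast directly.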
are inherent in the choice of cell size as well as the neighborhood
definition. Ménard and Marceau (2005) perform a sensitivity analysis of scale
and the resulting spatial patterns and dynamics in a CA model of land-cover
change. They discover substantial, non-linear relationships between these
scale issues and the simulation results.

21.4.4. Agent-based modeling

An agent is some independent unit that tries to fulfill a set of goals in a
complex, dynamic environment. These goals can be end goals or ultimate states
that the agents try to achieve, or they can be some type of reinforcement or
reward that the agent attempts to maximize. The environment can be very
general, and often includes other agents. An agent is autonomous if its
actions are independent, i.e., it makes decisions based on its sensory inputs
and goals. An agent is adaptive if its behavior can improve over time through
some learning process (Maes, 1995). Agents interact by exchanging physical or
virtual (informational) resources. These interactions are typically very
simple: they can be described by a small set of rules. From the pattern and
intensity of these interactions emerges complex behavior that is not
completely predictable or controllable: it materializes from the interactions
of these rules (Flake, 1998).

The agent perspective is very general: many systems can be viewed as
collections of autonomous, adaptive, and interacting agents. In agent-based
modeling (ABM), we are concerned with simulated agents (software
representations) as opposed to embodied agents (such as humans). Multi-agent
systems (MAS) are ABMs that contain a distribution of simulated and
interacting agents (Boman and Holm, 2004). ABM and MAS are bottom-up,
individual-based approaches to simulating physical and human phenomena. The
objective is to simulate the dynamics of complex systems through the
behaviors and interactions of their individual agents. Agents can represent
people, households, animals, firms, organizations, regions, countries, and so
on, depending on the scale of the analysis and the elemental units
hypothesized for that scale.

Similar to CA, ABM in many respects exemplifies the geocomputational
approach. ABM is motivated by the view that many geographic phenomena are
emergent: simple processes generate complex structure and patterns. In
addition, the increasing availability of high-resolution data and GIS tools
for handling these data facilitate ABM in geographic research (Benenson and
Torrens, 2004).

Generative geographic science

ABM is a critical tool in a distinct, generative approach to science that
focuses on the following question: How could the decentralized local
interactions of autonomous agents generate a given pattern? The analyst
attempts to answer this question by situating an initial population of
autonomous agents in a relevant spatial environment and allowing them to
interact according to simple rules, thereby generating the macroscopic
regularity from the bottom up. If the analyst can reproduce the macroscopic
pattern, then the microspecification is a candidate explanation (Epstein,
1999).

ABM is well-suited as a central tool in generative science due to the
following realistic characteristics (Epstein, 1999): (1) heterogeneity:
agents represent individual entities with unique characteristics that can
change over time, as opposed to the static, aggregate representative agents
in traditional social and other sciences; (2) autonomy: there is no central
control over individuals in agent-based models, except for feedback
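
The generative approach described above, in which an initial population of agents is placed in a spatial environment and left to interact through simple local rules, can be sketched as follows. This is a minimal illustration under stated assumptions, not a model taken from this chapter: it is written in the spirit of Schelling-style relocation models, with two agent types on a one-dimensional ring, and every name and parameter value (`radius`, `threshold`, the population sizes, the seed) is a hypothetical choice.

```python
import random


def like_fraction(cells, i, radius):
    """Share of occupied neighbors within `radius` that match the agent at i."""
    n = len(cells)
    same = other = 0
    for d in range(1, radius + 1):
        for j in ((i - d) % n, (i + d) % n):
            if cells[j] is None:
                continue
            if cells[j] == cells[i]:
                same += 1
            else:
                other += 1
    total = same + other
    return 1.0 if total == 0 else same / total  # isolated agents count as content


def sweep(cells, radius, threshold, rng):
    """Each discontented agent moves to a randomly chosen vacant cell."""
    moved = 0
    order = [i for i, c in enumerate(cells) if c is not None]
    rng.shuffle(order)
    for i in order:
        if like_fraction(cells, i, radius) < threshold:
            vacancies = [j for j, c in enumerate(cells) if c is None]
            j = rng.choice(vacancies)
            cells[j], cells[i] = cells[i], None
            moved += 1
    return moved


def mean_like_fraction(cells, radius):
    """Macroscopic summary: average like-neighbor share over all agents."""
    occupied = [i for i, c in enumerate(cells) if c is not None]
    return sum(like_fraction(cells, i, radius) for i in occupied) / len(occupied)


def run(n_cells=200, n_each=80, radius=2, threshold=0.5, sweeps=50, seed=1):
    """Return the final ring plus the summary measure before and after."""
    rng = random.Random(seed)
    cells = ["A"] * n_each + ["B"] * n_each + [None] * (n_cells - 2 * n_each)
    rng.shuffle(cells)
    before = mean_like_fraction(cells, radius)
    for _ in range(sweeps):
        if sweep(cells, radius, threshold, rng) == 0:
            break  # no agent wants to move: a stable pattern has been reached
    return cells, before, mean_like_fraction(cells, radius)
```

There is no central control in this sketch: each agent consults only its own neighborhood. Comparing the two summary values returned by `run()` is one way to check whether a macroscopic pattern (clustering of like agents) has been generated from the microspecification.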
class of computational techniques that can Also, integrating ancillary information into
exploit noisy data, as well as solve difficult remote sensing to aid in classification also
optimization problems. increases the complexity of the problem
ANNs are an analog to biological neu- (Fischer and Abrahart, 2000). ANNs have
ral networks such as the brain. Biologi- considerable promise as pattern classifiers
cal neurons adjust their firing frequencies that can effectively handle the vast and noisy
over time to other neurons in response to the firing frequencies from their input neurons. Some of these neurons are connected to external sensors (such as eyes). Through a learning process, the biological neural networks adjust firing frequencies until an appropriate response is achieved (e.g., ideas, behavior). An ANN replicates (on a very limited scale) the behavior and connectivity among biological neurons in a brain. ANNs adapt their structure based on subtle regularities in the input data. They are robust with respect to error and can find patterns in noisy data in a short amount of time. ANNs offer these advantages over brittle statistical methods that require strict, well-behaved and known error distributions (Fischer and Abrahart, 2000).
ANN application modes
ANNs are very flexible and can be applied in many different modes, including pattern classification, clustering, function approximation, forecasting, and optimization (Fischer and Abrahart, 2000).
Pattern classification involves assigning input patterns into one of several prespecified categories. Supervised classification is one of the central problems in remote sensing: each pixel must be classified into one of several known land cover classifications based on its spectral signature and perhaps other spectral signatures in its neighborhood. However, traditional methods for supervised classification in remote sensing are failing relative to the vast amount of information available in emerging remote sensing technologies that have high spatial and spectral resolution.
information in remotely sensed imagery and imagery combined with ancillary data (see Foody, 1995; Gong et al., 1996; Hepner et al., 1990).
In contrast to pattern classification, we often have the case where we do not have any pre-specified categories for the data. Instead, we wish to find natural groupings or clusters of the data based on inherent similarities and dissimilarities. Cluster analysis refers to attempts to classify a set of objects into classes or clusters such that objects within a cluster are similar while objects between clusters are dissimilar. Unsupervised ANNs such as Kohonen Maps are a type of neural clustering where weighted connectivity after training reflects proximity in the information space of the input data (see Flexer, 1999). ANNs have been used to cluster river flow data into different event types (Fischer and Abrahart, 2000).
ANNs can also be viewed as a type of universal function approximation technique. Assume a large stream of paired inputs and outputs generated from some unknown noisy function. We can view ANNs as an attempt to approximate the unknown function with an approximate function determined by the pattern of weights in the ANN (Fischer and Abrahart, 2000). Applications of ANNs as function approximations include spatial interaction (Fischer and Gopal, 1994; Gopal and Fischer, 1996; Mozolin et al., 2000; Nijkamp et al., 1996) and spatial interpolation (Rizzo and Dougherty, 1994).
The problem of function approximation is very similar to the problem of forecasting events over space and time. Formally, the problem is: given a set of n samples of
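The function-approximation view of ANNs described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the "unknown noisy function" is a sine curve with Gaussian noise, the network has a single hidden layer of tanh units, and the weights are adjusted by plain stochastic gradient descent. The function name `train_mlp` and its parameters are hypothetical.

```python
import math
import random


def train_mlp(samples, hidden=6, epochs=1000, rate=0.05, seed=0):
    """Fit y ~ f(x) with one tanh hidden layer (illustrative sketch only)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias

    def predict(x):
        # Forward pass: returns the output and the hidden activations.
        h = [math.tanh(w1[k] * x + b1[k]) for k in range(hidden)]
        return sum(w2[k] * h[k] for k in range(hidden)) + b2, h

    def mse():
        return sum((predict(x)[0] - y) ** 2 for x, y in samples) / len(samples)

    before = mse()
    for _ in range(epochs):
        for x, y in samples:
            out, h = predict(x)
            err = out - y
            for k in range(hidden):
                # Error propagated back through the tanh unit (uses old w2[k]).
                grad_h = err * w2[k] * (1.0 - h[k] ** 2)
                w2[k] -= rate * err * h[k]
                w1[k] -= rate * grad_h * x
                b1[k] -= rate * grad_h
            b2 -= rate * err
    return predict, before, mse()


# Paired inputs and outputs from an assumed "unknown" noisy function.
noise = random.Random(1)
xs = [-3 + 6 * i / 39 for i in range(40)]
samples = [(x, math.sin(x) + noise.gauss(0, 0.1)) for x in xs]

predict, mse_before, mse_after = train_mlp(samples)
print("MSE before training:", mse_before)
print("MSE after training: ", mse_after)
```

After training, the approximating function is held entirely in the weight lists `w1`, `b1`, `w2` and the bias `b2`, which is the sense in which the pattern of weights determines the approximation.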
cell phones, and home appliances) and often Internet-enabled. The continuation of this trend is the nanoclients that are extremely small and specialized. Nanoclients include wearable computers, smart dust, and wireless geo-sensor networks. These extremely thin clients combined with very fat high-performance servers can revolutionize geocomputation. Not only do nanoclients allow for ambient geographic data collection, but the environment itself can become a type of computer. Space becomes a metaphor for itself, landscapes or maps become models of themselves, and geographic objects become context-aware and know their own positions and relationships to other geographic objects (Clarke, 2003).
The continuing increase in computing power, capabilities for collecting and storing geo-spatial data, and the merging of computation with the geographic environment will require entirely new modes of thinking about computation in general and geocomputation in particular. While there will always be limits to computing (at least as we now understand it), the phenomena and problems that can be analyzed and understood through geocomputational methods are limited as much by our creativity and imagination.
REFERENCES
Appleby, S. (1996). Multifractal characterization of the distribution pattern of the human population. Geographical Analysis, 28: 147–160.
Arlinghaus, S.L. (1985). Fractals take a central place. Geografiska Annaler, 67B: 83–88.
Arlinghaus, S.L. and Arlinghaus, W.C. (1989). The fractal theory of central place geometry: A Diophantine analysis of fractal generators for arbitrary Löschian numbers. Geographical Analysis, 21: 103–121.
Armstrong, M.P., Cowles, M.K. and Wang, S. (2005). Using a computational grid for geographic information analysis: A reconnaissance. Professional Geographer, 57: 365–375.
Atkinson, P. and Martin, D. (2000). Introduction. In: Atkinson, P. and Martin, D. (eds), GIS and Geocomputation, pp. 1–7. London: Taylor and Francis.
Axtell, R., Axelrod, R., Epstein, J.M. and Cohen, M.D. (1996). Aligning simulation models: A case study and results. Computational and Mathematical Organization Theory, 1: 123–141.
Balmer, M., Nagel, K. and Raney, B. (2004). Large-scale multi-agent simulations for transportation applications. Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, 8: 205–221.
Barnsley, M. (1988). Fractals Everywhere. London: Academic Press.
Batty, M. (1991). Cities as fractals: Simulating growth and form. In: Crilly, T., Earnshaw, R.A. and Jones, H. (eds), Fractals and Chaos, pp. 41–69. Berlin: Springer-Verlag.
Batty, M. (2000). Geocomputation using cellular automata. In: Openshaw, S. and Abrahart, R.J. (eds), GeoComputation, pp. 95–126. London: Taylor and Francis.
Batty, M., Desyllas, J. and Duxbury, E. (2003). The discrete dynamics of small-scale spatial events: Agent-based models of mobility in carnivals and street parades. International Journal of Geographical Information Science, 17: 673–697.
Batty, M. and Longley, P. (1987). Fractal dimensions of urban shape. Area, 19: 215–221.
Batty, M. and Longley, P. (1994). Fractal Cities. London: Academic Press.
Beguin, H. and Thisse, J.-F. (1979). An axiomatic approach to geographical space. Geographical Analysis, 11: 325–341.
Benenson, I. and Torrens, P. (2004). Geosimulation: Automata-based Modeling of Urban Phenomena. Chichester, UK: John Wiley.
Benguigui, L. and Daoud, M. (1991). Is the suburban railway a fractal? Geographical Analysis, 23: 362–368.
Boman, M. and Holm, E. (2004). Multi-agent systems, time geography and microsimulations. In: Olsson, M.-O. and Sjöstedt, G. (eds), Systems Approaches and their Applications, pp. 95–118. Dordrecht: Kluwer Academic.
Burrough, P.A. (1993). Fractals and geostatistical methods in landscape studies. In: Lam, N.S.-N. and De Cola, L. (eds), Fractals in Geography, pp. 187121. Englewood Cliffs, NJ: Prentice-Hall.
Cavailhès, J., Frankhauser, P., Peeters, D. and Thomas, I. (2004). Where Alonso meets Sierpinski: An urban economic model of a fractal metropolitan area. Environment and Planning A, 36: 1471–1498.
Chen, M.S., Han, J. and Yu, P.S. (1996). Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8: 866–883.
Clarke, K.C. (1993). One thousand Mount Everests? In: Lam, N.-S. and De Cola, L. (eds), Fractals in Geography, pp. 265–281. Englewood Cliffs, NJ: Prentice-Hall.
Clarke, K.C. (2003). Geocomputation's future at the extremes: High performance computing and nanoclients. Parallel Computing, 29: 1281–1295.
Clarke, K.C., Brass, J.A. and Riggan, P.J. (1994). A cellular automaton model of wildfire propagation and extinction. Photogrammetric Engineering and Remote Sensing, 60: 1355–1367.
Clarke, K.C. and Gaydos, L.J. (1998). Loose-coupling a cellular automaton model and GIS: Long-term urban growth prediction for San Francisco and Washington/Baltimore. International Journal of Geographical Information Science, 12: 699–714.
Clarke, K.C., Hoppen, S. and Gaydos, L. (1997). A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area. Environment and Planning B: Planning and Design, 24: 247–261.
Clarke, K.C. and Schweizer, D.M. (1991). Measuring the fractal dimension of natural surfaces using a robust fractal estimator. Cartography and Geographic Information Systems, 18: 37–47.
Couclelis, H. (1985). Cellular worlds: A framework for modeling micro-macro dynamics. Environment and Planning A, 17: 585–596.
Couclelis, H. (1988). Of mice and men: What rodent populations can teach us about complex spatial dynamics. Environment and Planning A, 20: 99–109.
Couclelis, H. (1998a). Geocomputation in context. In: Longley, P.A., Brooks, S.M., McDonnell, R. and Macmillan, B. (eds), Geocomputation: A Primer, pp. 17–29. New York: Wiley.
Couclelis, H. (1998b). Geocomputation and space. Environment and Planning B: Planning and Design, 25: 41–47.
De Cola, L. (1989). Fractal analysis of a classified Landsat scene. Photogrammetric Engineering and Remote Sensing, 55: 601–610.
De Cola, L. (1991). Fractal analysis of multiscale spatial autocorrelation among point data. Environment and Planning A, 23: 545–556.
de Almeida, C.M., Batty, M., Monteiro, A.M.V., Câmara, G., Soares-Filho, B.S., Cerqueira, G.C. and Pennachin, C.L. (2003). Stochastic cellular automata modeling of urban land use dynamics. Computers, Environment and Urban Systems, 27: 481–509.
De Keersmaecker, M.-L., Frankhauser, P. and Thomas, I. (2003). Using fractal dimensions for characterizing intra-urban diversity: The example of Brussels. Geographical Analysis, 35: 310–328.
Dougherty, M.S., Kirby, H.R. and Boyle, R.D. (1994). Using neural networks to recognise, predict and model traffic. In: Bielle, M., Ambrosino, G. and Boero, M. (eds), Artificial Intelligence Applications to Traffic Engineering, pp. 233–250. Utrecht, The Netherlands: VSP.
Epstein, J.M. (1999). Agent-based computational models and generative social science. Complexity, 4(5): 41–60.
Epstein, J.M. and Axtell, R. (1996). Growing Artificial Societies: Social Science from the Bottom Up. Cambridge, MA: MIT Press.
Esser, I. and Schreckenberg, M. (1997). Microscopic simulation of urban traffic based on cellular automata. International Journal of Modern Physics C, 8: 1025–1036.
Fischer, M.M. and Abrahart, R.J. (2000). Neurocomputing: Tools for geographers. In: Openshaw, S. and Abrahart, R.J. (eds), GeoComputation, pp. 187127. London: Taylor and Francis.
Fischer, M.M. and Gopal, S. (1994). Artificial neural networks: A new approach to modeling interregional telecommunication flows. Journal of Regional Science, 34: 503–527.
Fischer, M.M. and Leung, Y. (2001). Geocomputational modeling techniques and applications: Prologue. In: Fischer, M.M. and Leung, Y. (eds), GeoComputational Modeling: Techniques and Applications, pp. 1–12. Berlin: Springer.
Flake, G.W. (1998). The Computational Beauty of Nature: Computer Explorations of Fractals, Chaos, Complex Systems and Adaptation. Cambridge, MA: MIT Press.
Flexer, A. (1999). On the use of self-organizing maps for clustering and visualization. In: Zytkow, J.M. and Rauch, J. (eds), Principles of Data Mining and Knowledge Discovery, Lecture Notes in Artificial Intelligence 1704, 80–88.
Foody, G.M. (1995). Land cover classification by an artificial neural network with ancillary information. International Journal of Geographical Information Systems, 9: 527–542.
Fotheringham, A.S. (2000). GeoComputation analysis and modern spatial data. In: Openshaw, S. and Abrahart, R.J. (eds), GeoComputation, pp. 33–48. London: Taylor and Francis.
Fotheringham, A.S., Batty, M. and Longley, P. (1989). Diffusion-limited aggregation and the fractal nature of urban growth. Papers of the Regional Science Association, 67: 55–69.
Gimblett, H.R., Richards, M.T. and Itami, R.M. (2002). Simulating wildland recreation use and conflicting spatial interactions using rule-driven intelligent agents. In: Gimblett, H.R. (ed.), Integrating Geographic Information Systems and Agent-based Modeling Techniques for Simulating Social and Ecological Processes, pp. 211–243. Oxford, UK: Oxford University Press.
Gong, P., Pu, R. and Chen, J. (1996). Mapping ecological land systems and classification uncertainties from digital elevation and forest-cover data using neural networks. Photogrammetric Engineering and Remote Sensing, 62: 1249–1260.
Goodchild, M. and Klinkenberg, B. (1993). Statistics of channel networks on fractional Brownian surfaces. In: Lam, N.S.-N. and De Cola, L. (eds), Fractals in Geography, pp. 122–141. Englewood Cliffs, NJ: Prentice-Hall.
Goodchild, M. and Mark, D. (1987). The fractal nature of geographic phenomena. Annals of the Association of American Geographers, 77: 265–278.
Gopal, S. and Fischer, M.M. (1996). Learning in single hidden-layer feedforward network models: Backpropagation in a spatial interaction modeling context. Geographical Analysis, 28: 38–55.
Gopal, S. and Scuderi, L. (1995). Application of artificial neural networks in climatology: A case study of sunspot prediction and solar climate trends. Geographical Analysis, 27: 42–59.
Hägerstrand, T. (1970). What about people in regional science? Papers of the Regional Science Association, 24: 7–21.
Hare, M. and Deadman, P.J. (2004). Further towards a taxonomy of agent-based simulation models in environmental management. Mathematics and Computers in Simulation, 64: 25–40.
Healey, R., Dowers, S., Gittings, B. and Mineter, M. (eds) (1998). Parallel Processing Algorithms for GIS. London: Taylor and Francis.
Hepner, G.F., Logan, T., Ritter, N. and Bryant, N. (1990). Artificial neural network classification using a minimal training set: comparison to conventional supervised classification. Photogrammetric Engineering and Remote Sensing, 56: 469–473.
Hill, T., O'Conner, M. and Remus, W. (1996). Neural network models for time series forecasts. Management Science, 42: 1082–1092.
Illingworth, V. and Pyle, I. (1997). Dictionary of Computing. New York: Oxford University Press.
Kelley, K. (2002). God is the machine. Wired, 10.12. Available at www.wired.com.
Kurtzweil, R. (1999). The Age of Spiritual Machines: When Computers Exceed Human Intelligence. New York: Penguin.
Lam, N.-S. and De Cola, L. (1993). Fractal measurement. In: Lam, N.S.-N. and De Cola, L. (eds), Fractals in Geography, pp. 23–55. Englewood Cliffs, NJ: Prentice-Hall.
Lam, S.-N. and Liu, K. (1996). Use of space-filling curves in generating a national rural sampling frame for HIV/AIDS research. Professional Geographer, 48: 321–332.
Li, X. and Yeh, A.G.-O. (2000). Modelling sustainable urban development by the integration of constrained cellular automata and GIS. International Journal of Geographical Information Science, 14: 131–152.
Longley, P. (1998). Foundations. In: Longley, P.A., Brooks, S.M., McDonnell, R. and MacMillan, B. (eds), Geocomputation: A Primer, pp. 3–15. New York: John Wiley.
Longley, P. (2000). Fractal analysis of digital spatial data. In: Openshaw, S. and Abrahart, R.J. (eds), GeoComputation, pp. 293–312. London: Taylor and Francis.
Maes, P. (1995). Modeling adaptive autonomous agents. In: Langton, C. (ed.), Artificial Life: An Overview, pp. 135–162. Cambridge, MA: MIT Press.
Mandlebroit, B.B. (1967). How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science, 155: 636–638.
Mandlebroit, B.B. (1983). The Fractal Geometry of Nature. New York: W.H. Freeman.
McHarg, I.L. (1969). Design with Nature, 1st edn. Garden City, NY: Natural History Press.
Ménard, A. and Marceau, D.J. (2005). Exploration of spatial scale sensitivity in geographic cellular automata. Environment and Planning B: Planning and Design, 32: 693–714.
Miller, H.J. and Wentz, E.A. (2003). Representation and spatial analysis in geographic information systems. Annals of the Association of American Geographers, 93: 574–594.
Mineter, M.J. and Dowers, S. (1999). Parallel processing for geographic applications: A layered approach. Journal of Geographical Systems, 1: 61–74.
Moon, F.C. (1992). Chaotic and Fractal Dynamics: An Introduction for Applied Scientists and Engineers. New York: John Wiley.
Mozolin, M., Thill, J.-C. and Usery, E.L. (2000). Trip distribution forecasting with multilayer perceptron neural networks: A critical evaluation. Transportation Research B, 34: 53–73.
Nijkamp, P. and Reggiani, A. (1990). Logit models and chaotic behaviour: A new perspective. Environment and Planning A, 22: 1455–1467.
Nijkamp, P., Reggiani, A. and Tritapepe, T. (1996). Modelling inter-urban transport flows in Italy: A comparison between neural network analysis and logit analysis. Transportation Research C, 4C: 323–338.
Openshaw, S. (2000). Geocomputation. In: Openshaw, S. and Abrahart, R.J. (eds), GeoComputation, pp. 1–31, 293–312. London: Taylor and Francis.
O'Sullivan, D. (2001). Exploring spatial process dynamics using irregular cellular automaton models. Geographical Analysis, 33: 1–18.
Parker, D.C., Manson, S.M., Janssen, M.A., Hoffmann, M.J. and Deadman, P. (2003). Multi-agent systems for the simulation of land-use and land-cover change: A review. Annals of the Association of American Geographers, 93: 314–337.
Peitgen, H.-O., Jürgen, H. and Saupe, D. (2004). Chaos and Fractals: New Frontiers of Science, 2nd edn. New York: Springer.
Peterson, C. and Söderberg, B. (1993). Artificial neural networks. In: Reeves, C.R. (ed.), Modern Heuristic Techniques for Combinatorial Problems, pp. 197–242. New York: John Wiley.
Phillips, J.D. (1993a). Interpreting the fractal dimension of rivers. In: Lam, N.S.-N. and De Cola, L. (eds), Fractals in Geography, pp. 142–157. Englewood Cliffs, NJ: Prentice-Hall.
Phillips, J.D. (1993b). Spatial-domain chaos in landscapes. Geographical Analysis, 25: 101–117.
Phillips, J.D. (1999a). Earth Surface Systems: Complexity, Order and Scale. Oxford, UK: Blackwell.
Phillips, J.D. (1999b). Spatial analysis in physical geography and the challenge of deterministic uncertainty. Geographical Analysis, 31: 359–372.
Phipps, M. and Langlois, A. (1997). Spatial dynamics, cellular automata and parallel processing computers. Environment and Planning B: Planning and Design, 24: 193–204.
Rizzo, D.M. and Dougherty, D.E. (1994). Characterization of aquifer properties using artificial neural networks: Neural kriging. Water Resources Research, 30: 483–498.
Shen, G. (2002). Fractal dimension and fractal growth of urbanized areas. International Journal of Geographical Information Science, 16: 419–437.
Shi, W. and Pang, M.Y.C. (2000). Development of Voronoi-based cellular automata: An integrated dynamic model for geographical information systems. International Journal of Geographical Information Science, 14: 455–474.
Sipser, M. (1997). Introduction to the Theory of Computation. Boston, MA: PWS Publishing.
Smith, J. and Eli, R.N. (1995). Neural-network models of rainfall-runoff processes. Journal of Water Resources Planning and Management, 121: 499–509.
Takeyama, M. and Couclelis, H. (1997). Map dynamics: Integrated cellular automata and GIS through geo-algebra. International Journal of Geographical Science, 11: 73–91.
Tesfatsion, L. and Judd, K.L. (2006). Handbook of Computational Economics, Volume 2: Agent-Based Computational Economics. Amsterdam: North-Holland.
Tobler, W. (1979). Cellular geography. In: Gale, S. and Olsson, G. (eds), Philosophy in Geography, pp. 379–386. Dordrecht: D. Reidel.
Turton, I. (2000). Parallel processing in geography. In: Openshaw, S. and Abrahart, R.J. (eds), GeoComputation, pp. 49–66. London: Taylor and Francis.
Wentz, E.A. (2000). A shape definition for geographic applications based on edge, elongation and perforation. Geographical Analysis, 32: 95–112.
White, R.W. (1990). Transient chaotic behaviour in a hierarchical economic system. Environment and Planning A, 22: 1309–1321.
White, R. and Engelen, G. (1993). Cellular automata and fractal urban form: a cellular modelling approach to the evolution of urban land-use patterns. Environment and Planning A, 25: 1175–1199.
White, R. and Engelen, G. (1997). Cellular automata as the basis of integrated dynamic regional modeling. Environment and Planning B: Planning and Design, 24: 235–246.
White, R. and Engelen, G. (2000). High-resolution integrated modeling of the spatial dynamics of urban and regional systems. Computers, Environment and Urban Systems, 24: 383–400.
Williams, G.P. (1997). Chaos Theory Tamed. Washington, DC: Joseph Henry Press.
Wilson, A.G. (2006). Ecological and urban systems models: Some explorations of similarities in the context of complexity theory. Environment and Planning A, 28: 633–646.
Wolfram, S. (1984). Universality and complexity in cellular automata. Physica D, 10: 1–35.
Wong, D.W.S. and Fotheringham, A.S. (1990). Urban systems as examples of bounded chaos: exploring the relationship between fractal dimension, rank-size and rural-to-urban migration. Geografiska Annaler, 72B: 89–99.
Wu, F. (2002). Calibration of stochastic cellular automata: The application to rural–urban land conversions. International Journal of Geographical Information Science, 16: 795–818.
Xie, Y. (1996). A generalized model for cellular urban dynamics. Geographical Analysis, 28: 350–373.
22
Applied Retail Location Models Using Spatial Interaction Tools
Morton E. O'Kelly
22.1. RETAIL LOCATIONAL ANALYSIS¹
22.1.1. Spatial retail location
The demand by consumers for retail goods and services is a function of the attributes of the commodity, household income, and other factors such as home ownership status. For example, a home improvement store is likely to target a market with a housing stock that has lots of possibilities for repair, upgrades, and remodels. Both home-owning and renting populations might yield adequate density of demand, but the effective demand for goods and services by homeowners is much more likely to be attractive to this particular service. An electrical supplier would have a demand for services also, but for that business it could be some combination of over-the-counter sales (light fixtures) and more substantial electrical equipment sold to contractors and builders. A business with a traditional central market place location (in an older mixed use inner city neighborhood for example) might conceivably want to branch out its locations to catch the growth in the suburbs and even the outlying communities in the hinterland of that main market. In fact there are so many different ways to imagine the dynamics of retail site location that there is a real need for a general purpose simulation tool that might enable the estimation of the merit of various growth proposals (Baker, 2000; Munroe, 2001). In all these cases, it is important to have an accurate
estimate of the spatial distribution of effective demand as arising out of a combination of preferences, and disposable income.
Central place theory has long held that there is a hierarchy of goods, from frequently demanded inexpensive items to high-end expensive goods. There is both a higher spatial frequency of demand for (and provision of) the so-called lower-order goods, and a corresponding scarcity on the landscape of higher-order goods. Thus for every Mercedes or Lexus dealership in the city there might be numerous Ford and Toyota dealerships. The higher the order of goods provided, one assumes that there is a wider market scope required to provide sufficient demand to cover the operating costs of the business (the so-called threshold). Similarly, the higher-order goods, because of their relative scarcity on the landscape, require longer trip lengths; the break even calculation for the retailer is whether the spatial extent of the market required to cover costs is matched by a corresponding willingness of consumers to travel to the center for the goods (see the classic study by Berry (1967)).
Inexpensive low order goods are sometimes sold in combinations with higher priced items from superstores that do not necessarily have a small range: they can in fact be attractive over a large distance, provided the assortment and price point allows the large agglomerated retailer to undercut the smaller more widely dispersed providers of retail services. This formula is used by Wal-Mart or other big box retailers; they have a large assortment of goods, and price points that are competitive, and locations that in themselves act as a magnet for spatial interaction (Munroe, 2001). Customers travel to stores and therefore the spatial interaction of the purchasers must be recognized as an important behavioral factor. The retail and trade area service location problem requires knowledge of what customers want, where they are located, and that they have income that covers the price and market segment of the goods. In common with many levels of retail operation, many of the most successful chains study a massive amount of geo-demographic profile data that enables a rich portrait of customers and consumer behavior to inform the merchandize and market planning of their operations.
22.1.2. Consumer demand and behavior³
Measuring the total income and pool of expenditure is accomplished by combining a count of households by geo-demographic cluster (e.g., Claritas PRIZM, MapInfo PSYTE, ESRI Tapestry, AGS Mosaic etc.), the index value for each group (m), the penetration rate and some index of average per household expenditure.⁴ To calculate the potential pool of expenditure for the zone i and commodity c a formula such as the following might be used:

$$O_{ic} = \sum_{\text{all groups } m} N_{im}\, y_{mc}$$

where $O_{ic}$ is the demand in zone i for commodity c, $N_{im}$ is the number of group m households in zone i, and $y_{mc}$ is an expenditure rate per household cluster m on commodity c.
From this aggregate, demand shares allocated to a particular store have two components: on the one hand the share is smaller for the more distant competitive stores (holding other factors constant), but additionally it is felt that the demand for a store increases with the accessibility of that origin zone to any shopping destination. Zonal accessibility, and hence aggregate demand is a function of where the stores are located, and so unlike conventional
location models, we should not treat demand as an exogenous factor (O'Kelly, 1999). While it would be naive to say "build it and they will come", it is certainly reasonable to think that the provision of retail services can induce demand for that service that would otherwise be allocated to other discretionary uses. Some insight and market-based intelligence is needed to capture the correct demand parameters and sensitivity to locational access. The basic accessibility of each zone can be predefined, and the demand in the immediate area of a new potential store opening can increase, as a result of improved accessibility. One practical estimation approach that can be effective is to have a variety of alternative sources of judgment (like the so-called Delphi method, and a variant of the judgmental methods advocated for several years by Seldin (1995)) with perhaps one figure coming from an estimate of per capita expenditure and saturation, one coming from pro forma estimates of expected sales per square foot and yet another estimate coming from an experienced local commercial real estate professional. The best model is likely to use some aspect of these data as controls on the judgmental estimate. In other words, no analyst will simply apply a sales per square foot figure to an arbitrary new built store and say that the expected sales are a product of the coefficient and the store size. Much more likely is an analysis that takes the current sales situation of the competition into account and then projects how much of these existing sales can be captured by the new proposed location. Even more significant is recent research that has shown that whatever the general relationship between the variables, the strong likelihood of spatial variability in such a relationship ought to be taken into account (Fotheringham et al., 2002; Rust and Donthu, 1995). Thus, if a cross-sectional regression analysis provided evidence of a coefficient of say $30 weekly sales per square foot, then a spatially varying parameter might trend significantly with location given the socio-economic patchwork of the city, by analogy with a similar argument in the context of house prices (Fotheringham et al., 2002).
One way to make operational estimates is through spatial interaction models. These models are the topic of this chapter, which covers a variety of models largely inspired by several years' experience in both applied and theoretical exploration of retail sales and interaction.
22.1.3. The role for models
Among the most basic general questions for spatial interaction modelers are the following: Where do the customers come from? What are the spatial interaction patterns governing the distribution of distance and attraction parameters? What is the probability that a customer at i patronizes a store at j? Conditional upon the location of i what is the probability of being a customer of destination j?
Example
A grocery store has an upper income target consumer. Their research shows that these are very likely to be loyal customers of the produce and fresh foods departments (which in turn are highly profitable assuming that stock can be turned over rapidly to avoid waste/spoilage). In seeking new store locations, where are there sufficient pockets of un-met demand among this target population? In the analysis of existing own-brand stores, which may or may not be currently well-located vis-à-vis the standard customer profile, is there any need to modify their type of store to meet consumer needs?
These kinds of questions can be answered with spatial models. Before we get into the
details of how to formulate and apply such a model, it may be very helpful to get a preview of some of the uses to which a model might be put. One common usage is in impact analysis. With a fitted model, purporting to describe the allocation of consumers to demand centers, we can estimate impact on remaining stores if a branch is to be closed, or indeed if we open a new one. Both of these changes have impacts across the system of stores, but of course the first law of geography (Tobler, 1970), which holds that things are more highly interrelated when they are in close proximity, leads us to expect that the impacts are greatest on the centers and competitors closest to the site of the change.
Other uses for fitted models in locational analysis include assessment of the desirability of overhauling various stores or facilities. The applied retail analyst is often asked to estimate the impact of a change on the expected sales of the store: thus having a model which has as its independent variables some measures which can be adjusted to reflect the new attraction of the store can be useful to estimate the change in the retail trade area, expected sales, and so on. By estimating a well-fitted model to these data, we replace the specifics of the data instance with a model that has effects; these are systematic influences on the trends in the levels of spatial interaction, and are likely to include roles for distance and retail attraction (typical basic variables in SI models; see Guy (1991)). In more elaborate settings these models can also include many other independent variables (see especially the Multiplicative Competitive Interaction (MCI) models of Nakanishi and Cooper (1974)). Once these models are fitted, the analyst can then dial in various changes in the driver variables, and assuming that the model is reasonably robust to changes in these data, the impact of the changes on the expected sales and interaction levels can be determined.
Models can also be used as a tool in assessment of complex strategic questions. For example, a chain that is considering opening a new branch in a growing suburb might be faced with the question of whether to keep an existing older store in a nearby location. The question is then one of strategy: do stores A and B together make a better combined profitable solution than the option of closing B, presumably giving A an even greater new opening sales level, but possibly exposing the chain to the risk that a competitor might take the abandoned site? Not only does the decision hinge on the aggregate sales of the various combinations of open stores but it also must answer questions about the probable impact of competitors. Retailers engage in strategic behavior, and open or close locations as part of a system of decisions; such analyses often include issues of pre-emption and blocking competition, and beating competitors to the punch in new areas of expansion (Ghosh and Craig, 1983).
Models are also useful in assessing ongoing measures of store performance and may be used in this way as an early warning of emerging shifts in the market. Assuming that the chain can collect data throughout its system on the performance of each store, and some appropriately calculated variables to describe the store's site and situation, the analyst can embark on the kind of analog assessment made popular in the early days of quantitative analysis. This method, in its modern guise, uses the store's sales (as a dependent variable) and a selection of measurements of the trade area characteristics, and develops a multiple regression model to assess the expected (or predicted) sales vs. the actual observed levels. Fundamental to this operation is a meaningful definition of trade area: it makes no sense to include measurement of the attributes of areas far away from a store, if indeed it is known that few if any shoppers
come from that area. So, in other words, the measurement of the trade area of the
store becomes the first and most important operation. There is no hard and fast
consensus on how to define a trade area, and much more will be said on this matter
later. For now, suffice it to say that the trade area could be objectively defined as
an area within 5 min drive time of the store. That leads to a computation of the
demand that exists within that area, and that could be one of the independent
variables. (Clearly if we use more sophisticated definitions of trade areas, the
trade area demand calculation would have to be re-computed.)

Independent variables collected for all the stores are saved as the columns of a
table. GIS is especially helpful for calculating features of a trade area and giving
a quantitative description of it. The dependent variable is the actual sales
performance: there is often a challenge obtaining these data (i.e., weekly sales) in
academic research; but it is important to know how these data would be used in an
applied case study.

Basing store closing decisions on this kind of result places a lot of faith in the
fitted model (and hence the importance of regression diagnostics, measures of
goodness of fit, and significance levels on the estimated coefficients). What looks
like an under-performer may not actually be an instant argument for store closure:
for instance, a store is projected to draw $400,000 per week, but actual sales come
in at $350,000 (i.e., $50,000 below the regression line). While all might agree that
it could be doing better (i.e., it is performing below potential) there may be good
reasons that the store has not yet reached its full potential. It might be under
attack from particularly aggressive competitors, be poorly managed, or it might be
built over-sized in anticipation of further population growth in the area. The store
may suffer from a depressed regional economy, and so a chain may consider shutting it
down or attempting to reinvigorate the system by investing a lot of money into the
regional advertising campaign. It all boils down to choices, and these choices are
best informed by analytic models.

The ease of obtaining a good fit to the model will clearly vary across sectors.
Department store sales volumes are notoriously difficult to predict, in that their
aggregate sales volume is a combination of the various heterogeneous departments, and
the extent of competition for specific categories in these stores could very well
vary in an unsystematic way across locations. On the other hand, goodness-of-fit for
convenience related stores such as grocery chains is likely to be quite acceptable,
in that there are a few predictable variables that are very highly correlated with
the aggregate performance of the store. For example, the store's size, its population
base, and the immediate competitive environment undoubtedly account for the bulk of
the store-to-store variation in sales levels. Thus, it is expected that the
coefficients of store size, population, and competition will be significant, and that
the resulting fitted model will have a strong R-square.5 Refinements to the model to
include regional dummy variables and other more precise measures of target market
demand (through surrogates such as parking studies, or traffic flow) are likely to
help to improve the model.

Some sectors lend themselves readily to analysis by multivariate regression models
(grocery stores) but others require a different approach. If a shoe store, book
store, or branch of a chain of clothing stores is typically located in shopping
centers, then the analyst might use the center as a surrogate for the size of the
market in which the individual store is located (see also Prendergast et al., 1998).
Similarly if a chain of this type is planning to enter a new regional market, it
could very well limit its attention to
the shopping centers. This type of work is useful because it is frequently necessary
to manage thousands of locations across many areas/regions.

It is hard to get information on gross sales (what is also called turnover in the
British literature) in academic case studies, though practitioners and consultants
can of course gain access to their clients' data as part of their confidentiality
agreement. Many of the ideas in this chapter have been framed as a result of real
world experience. In practice, one has access to lots of data; in theory one might
have to learn these techniques in a data vacuum, recognizing that the proprietary
data would become available to a consultant doing these analyses for a private sector
client. This perhaps accounts for the lack of precision in the published literature:
a lot of literature in retailing location modeling is quite imprecise mathematically
and the details are often not published in a way that makes verification and
validation easy.

22.1.4. Consumer choice

The probabilistic assignment of consumers to retail destinations can be formulated as
a production constrained spatial interaction model:

$$P_{ij} = A_i O_i W_j \exp(-\beta C_{ij}).$$

Such models calculate the probability that a user at a specific origin location will
select one from a number of available alternative attractive destinations. If these
destinations are shopping centers, for example, the attraction of those centers can
be represented by a measure of their total retail square feet of selling area. Once a
calibrated production constrained spatial interaction model has been formulated for a
specific set of destinations, the estimated table of such flows provides an idea of
the likely inflow to each of the unconstrained destination trip ends:

$$D_j = \sum_i P_{ij} = \sum_i A_i O_i W_j \exp(-\beta C_{ij}).$$

The production constrained model leaves the amount and type of flow arriving at each
center or store open to calculation. With such calculated inflows, the analyst has
access to a predictive model for the likely composition and size of any center's
capture area. Think of a column of the spatial interaction matrix that leads to a
specific destination as a listing of the contributions to that particular
destination. Of all the flows that arrive at the destination, we may estimate the
percentage that comes from each one of the surrounding regional sources. From all of
those, the core or primary contributors may be determined by sorting the origins from
largest to smallest and cumulating their contributions until arriving at a subset
that contributes a very significant fraction of the total business of the store of
interest. This is none other than Applebaum's (1966) concept of the primary trade
area: the region from which a particular store draws a high percentage (say 75%) of
its business.

22.2. ANALYSIS WITH RETAIL TRADE AREA MODELS

22.2.1. Spatial interaction6

Spatial interaction models in general assume that interaction is determined by the
attraction of the alternative facilities and by the distance separating the consumer
from those alternatives. Huff (1962, 1963, 1964) and Lakshmanan and Hansen (1965) are
credited with developing specialized retail variants of the spatial interaction
based allocation model. From an operational perspective, Huff introduced a practical
approach to defining the attraction of a center as the amount of floor space, rather
than the population of the surrounding area as was commonly used in previous models.
This opened up the interpretation of attractiveness and allowed it not only to be
determined by a number of variables (e.g., number of functions, parking capacity,
etc.) but also to be treated as an independent variable that could be estimated in
its own right. Another major operational consideration was that Huff fitted the
exponent for distance in trip-making behavior (the influence that distance has on a
consumer's store choice) to particular circumstances. Finally, he introduced a
balancing term that constrained the sum of individual or zonal travel or sales to fit
within an overall travel or sales limit.

With respect to the attractiveness or drawing power of a facility, Huff's use of
retail floor space has been widely adopted and adapted to include other important
characteristics. Most important, though, this model demystified the idea of drawing
power or attraction and allowed its direct estimation by focusing on the weight
associated with it. Nakanishi and Cooper (1974) were particularly effective at
utilizing Huff's probabilistic choice framework and operational perspective to
develop a linearization procedure for direct estimates of attractiveness. The MCI
model is one of the best tools available for the allocation of consumer demand to
facilities. The main advantage of this model is that it can incorporate a variety of
attributes of the facilities under consideration by the consumer, yet it is easy to
estimate. In cases where more data on the influence of various store attributes are
available, the MCI model is apt to provide a more accurate estimation of market share
than the original Huff model.

With spatial interaction models, then, facilities no longer have a well-defined
geographic market area. Instead each store's market area is a probabilistic surface
that shows the probability of a customer from each small geographic area patronizing
that facility. The exact nature of this probability surface depends on the parameters
of the spatial interaction model. Incorporating spatial interaction models into a
location-allocation model represents the state of the art in modeling retail site
selection.

22.2.2. Primary trade area

Imagine a store attracting customers from surrounding census tracts or city blocks.
Such data have long been analyzed by proponents of the applied school of retail trade
area analysis (Applebaum, 1966). As a starting point, examine the distribution of the
customers of a particular store, with regard to their origins. If the store has a
weekly volume of V, then the customer distribution is used to spread that demand
around the originating areas, in proportion to their draw of customers. That
spatially distributed demand in turn can be compared to the potential pot of money
that exists in those zones available to be spent somewhere, in order to compute a
measure of store penetration of the market. From the data, the top 75% (say) of the
sales area may be devised, followed by the next 20% and the rest (all these are
hypothetical numbers). Unless some spatial constraints are added, it is important to
note that the top contributing area to a store need not be compact (it may have, for
example, disconnected outliers). Analytically, the primary trade area, P, is defined
such that $\sum_{i \in P} P_{i|j} = 0.75$ and the secondary trade area, S, is defined
such that $\sum_{i \in S} P_{i|j} = 0.20$. The remaining or tertiary trade area
captures the remainder of the customers, often sparsely dispersed over a very wide
area. For most practical purposes in the convenience sector, tertiary areas are
irrelevant to routine operations. On the other hand, significant shopping centers
drawing from a large region may well have to treat the marginal sales to the edge of
their tertiary area as significant icing on the sales forecast, and these may in fact
be the key to understanding top-performing locations.

Retail executives are especially interested in market share, strength versus direct
competitors, and in the yield of customers from a pool of potential sales dollars. It
seems that the only thing worse than a store that has a small sales level is one with
a large volume but under-performing its projected potential! These analyses are
directed to the question: how well are our stores capturing the market? Are we
leaving potential sales untapped? Or are our competitors out-maneuvering us?
Penetration of the market area hinges on an assessment of how much demand is
available there, and how much our particular branch is capturing.

22.2.3. Characterization of the demography of the trade area

The attributes and weights of demand from the particular types of respondents in the
trade area can then be recovered. Say, for example, that the numbers of households in
the various tracts that have particular levels of household income are given. Many
useful statistics can be computed from these data. Among these are the expected
values of customer characteristics over the primary, secondary, and tertiary trade
areas respectively. For example, if we have a defined area that encloses the primary
trade area, the total volume of expenditure in that area is X, and the total volume
attracted to the store of interest from within that same area is Z, the ratio of X to
Z is very useful information about penetration of the market. These analyses provide
the tools to diagnose practical issues in the trade area's effectiveness, for
example, by indicating untapped sales potential, the need for more intense marketing,
or special circumstances arising from unique factors (ethnicity, mobility, etc.).

22.2.4. Connecting retail location models and competing destinations

Retail locational analysis is frequently carried out with the aid of spatial
interaction modeling. Many features of the trade area are derived from calculations
based on either actual customer origins (from a survey) or from a model of such a
distribution that has been fitted from observations. In either case assume that the
probability that a customer in area i shops in store j is given by $P_{ij}$. This
joint probability can be further manipulated to give $P_{i|j}$ and $P_{j|i}$;
respectively these are:

$$P_{i|j} = P_{ij} / \sum_i P_{ij}$$

is the conditional probability that a customer who shops in j originates from i, and:

$$P_{j|i} = P_{ij} / \sum_j P_{ij}$$

is the conditional probability that a customer from origin i shops in zone j.

It is this latter probability that is highly useful as it allows a prediction from a
given zone i, of how much traffic or business might be expected to arrive at a
destination in zone j, and this of course can be applied either to pre-existing
stores (to check model fit and validity) as well as the use of the
model to forecast the likely patronage of a new or proposed location at j. In that
these probabilities are analytically derived from data that are exogenously available
(travel times, demand expenditure parameters, and so on) they are quite easily
manipulated to give "what if" forecasts for cases where there are expected changes in
the data or the parameters. This kind of sensitivity analysis can provide a useful
cross check on the validity of the model: for example, a sensitivity analysis should
predict changes that make sense. Further, extreme values of the parameters often
provide consistency checks in that the model collapses to other easily recognized
forms in these special circumstances: thus a model with a distance decay parameter
collapses to an all-or-nothing nearest center allocation model in the case that the
beta parameter is driven to the extreme value. In this case the trade area should
take on characteristics such as those seen in the Voronoi diagram or Thiessen
polygons.

In macro spatial analysis (e.g., at the scale of interregional interactions) the
peripheral areas have, by definition, lower access to the dense cluster of the urban
core. So, for a resident of the periphery the number of competitive alternatives in
short range is comparatively small, and according to the theory of competing
destinations (Fotheringham, 1983), the demand is therefore spread over few
alternatives (hence is not divided up so thinly). It would be expected therefore that
interaction levels over short distances are enhanced (and comparably the interaction
over the longer distances is spread thinly, and hence the slope of the flow vs.
distance curve is steeper than it would be expected to be, absent a spatial structure
effect). At macro scales, then, the large beta for peripheral zones results from
mis-specification, and does not correctly imply that there are larger distance decay
impacts for peripheral residents; in fact, once the mis-specification is corrected,
the expectation might be that peripheral residents might show a willingness to travel
to distant alternatives at a rate that exceeds that of the comparatively well served
central residents.

This notion of a process at one density regime being adapted for other situations was
nicely foreshadowed in Berry's (1967) classic work on commercial centers when the
expected sales territory size was contrasted in low density rural Iowa with the more
commercially dense built up areas of Chicago. Thus there is some interest in whether
this theory might be adapted to a more dense urban retail scenario. In the retail
scenario the central or core resident has lots of alternatives within short range,
and these can provide opportunity for multipurpose trips and shopping on a scale that
combines multiple activities. As Eaton and Lipsey have shown, such retail
agglomerations then gain more from their collocation than they lose from the presence
of intensified competition. Thus the theory of competing destinations developed at a
primarily interurban scale might be refined for the case of flows within an urban
area, and indeed the opportunity to make multi-purpose trips to clusters of shops in
a city might lead to an expected agglomeration effect: what we might coin the
"cooperative destinations" effect arising from spillovers in retail demand (see early
theory of Eaton and Lipsey, 1982).

22.3. CALCULATIONS

22.3.1. Data issues

An interesting aspect of retail trade area analysis is that the most commonly
collected data (choice-based samples) are not especially well suited to direct
manipulation in calibration (see a series of papers on choice based samples by
O'Kelly (1999) and Ding and O'Kelly (2008)). Choice based data
from frequent shopper cards at the point of sale or from check based data can tell us
the distribution of actual demand around a current store. Clearly the interest in
these data from a predictive point of view is to be able to use them to devise some
origin based parameters such that the trade area attributes that determine the
store's success or failure can be studied and translated into parameters that can
predict how a proposed new location (assuming that represented stores provide a
decent analog for the new operation) might be expected to perform. One could expect
to take data about existing operations, and develop a list of those parameters of the
trade area that are expected to correlate heavily with good retail performance. The
interaction model is simply an improved way to gather data and summarize standardized
aspects of these trade areas to provide data about the branches. In applications,
these data can then be entered into regression or other models to determine the
different aspects of the trade areas that are especially highly correlated with
successful operations.

An important step in managing a retail trade area data set is to understand the scope
and reach of the center to the areas surrounding the store. In fundamental economic
geography we learn the concept of the range of the good: this is the maximum distance
a customer would be willing to travel to reach the store. This maximum radius or
reach has relevance for the concept of spatial interaction and trade areas as there
is clearly no necessity to include demand from a place that is so far from the store
as to be unable to reach that store's trade area. Distance impedance and maximum
travel radius are critical to the accurate specification of gravity models. In the
case of a maximum travel radius, one has to be sure to set up a spare or dummy
destination to allow for demand that has no feasible option within range to be parked
there pending either some additional site, or some relaxation of the maximum range.

Very large energy costs cause a contraction in people's willingness to travel long
distances or make excess discretionary trips; instead one would expect two
countervailing forces: making a smaller number of multipurpose trips to major
agglomerations would serve to support the development of a small number of heavily
clustered mega malls; on the other hand the smaller willingness to travel might cause
a stronger tendency to use the closer alternatives and activate the incentive to
build a series of small decentralized regional centers. This trade-off between
agglomeration and convenience is an interesting empirical question.

22.3.2. Determination of market effectiveness and penetration

The idea in retail interaction modeling is to use a probabilistic estimate of the
demand originating in each sub-area, and its likelihood of being spent at a
particular store of interest. It is convenient, though perhaps increasingly less
realistic, to assume that the pool of available money is all allocated to bricks and
mortar stores, and that the demand is a simple function of the population, its
income, and expenditure habits. With that assumption it is possible to take readily
available census expenditure data and predict how much would be available for
particular product categories in each micro-demographic area. Such micro marketing
data have been used with great precision by the package goods industry, car industry,
banks, and retailers in general. These applications represent one of the most
powerful uses of the gravity model. Some industry specific intelligence is needed
with regard to the reasonable range of potential destinations from the point of view
of an origin. This is because it is necessary to be able to make an all-inclusive
list of the
probabilistic choice sets that exist or that might provide opportunities for the
shoppers to make choices. To adapt this base case to the more realistic case of
non-spatial alternatives (in competition with conventional alternatives), we need to
be able to estimate leakage from an origin area to electronic, catalog, and on-line
purchases. From the retailer's point of view at a specific location, it is necessary
to be able to circumscribe the potential originating zones from which the trip makers
might be attracted. For a convenience-oriented store like a supermarket, one can
imagine a reasonably compact service area. For department stores, or retailers
co-located with attractions that can draw from farther places (think of Mall of
America as a destination), it is perhaps a little more difficult to know the universe
of the attraction, and hence difficult to make computations of the share of the
attraction provided for by local or further away origins.

do with such data alone is to talk about the residents in a particular subarea and
their probability of being a customer. For those who are customers (and for those who
are not) we need some additional way to measure reasons as to why or why not. To get
at these added questions we either need prior theoretical expectations, or to employ
a survey to ask residents in a residential area about their reasons for shopping or
not shopping at our chain. As surveys tend to be very expensive, a controlled
theoretical choice experiment is perhaps a worthwhile future framework for such
destination choice problems (see Eagle, 1984).

From these two sources of data detailed intelligence about the trade areas of the
various branches can be accumulated and the results used to characterize the stores;
if there are added data from the retailer about which stores are under- or
over-performing, we could do some correlation analysis, or perhaps data envelopment
analysis (Donthu and Yoo, 1998) which allows a gauge of performance vis-a-vis peer
benchmarks.
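
As a rough illustration of the allocation idea that opens Section 22.3.2, the Python
sketch below spreads each zone's demand over the stores within a maximum travel cost,
parks demand that has no feasible store on a dummy destination, and removes a share
for non-store purchases before summing expected sales by store. The function names,
the single leakage fraction, the 20-minute cutoff, and all numbers are hypothetical
assumptions chosen for illustration; they are not taken from the chapter, and the
share computed at the end is only one possible penetration measure.

```python
import math


def allocation_probabilities(attraction, cost, beta, max_cost=None):
    """Return probs[i][j], the probability that zone i's demand goes to store j.

    Each probability is proportional to W_j * exp(-beta * C_ij). Stores farther
    than max_cost are excluded. One extra column is appended for a dummy
    destination that receives all demand from zones with no store in range.
    """
    probs = []
    for row in cost:
        weights = [
            w * math.exp(-beta * c) if (max_cost is None or c <= max_cost) else 0.0
            for w, c in zip(attraction, row)
        ]
        total = sum(weights)
        if total > 0:
            probs.append([w / total for w in weights] + [0.0])
        else:
            probs.append([0.0] * len(weights) + [1.0])
    return probs


def expected_sales(probs, demand, leakage=0.0):
    """Expected spending at each destination after removing non-store leakage."""
    n_dest = len(probs[0])
    return [
        sum(p_row[j] * d * (1.0 - leakage) for p_row, d in zip(probs, demand))
        for j in range(n_dest)
    ]


# Hypothetical inputs: two stores, three origin zones.
attraction = [50_000.0, 20_000.0]                  # W_j, selling area in square feet
cost = [[5.0, 10.0], [12.0, 4.0], [30.0, 28.0]]    # C_ij, drive time in minutes
demand = [100_000.0, 80_000.0, 40_000.0]           # weekly spending available per zone

probs = allocation_probabilities(attraction, cost, beta=0.2, max_cost=20.0)
sales = expected_sales(probs, demand, leakage=0.1)  # last entry is the dummy destination
penetration = [s / sum(demand) for s in sales]      # share of all available demand
```

The third zone lies beyond the cutoff for both stores, so its demand sits on the dummy
destination until a new site is added or the maximum range is relaxed.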
where $I_{i,j,k}$ is the impact of new store k on existing store sales in zone i,
$P_{ik}$ is the new allocation to center k from zone i, and $P_{ij}$ is the
allocation to center j from zone i.

The types of scenario that can be handled using the methodology are as follows:

• analyze the trade areas of current stores (run with just fixed locations)
• pick sites from candidates (run with fixed and potential locations)
• re-consider current sites (make currently fixed sites flexible or optional)
• examine specific proposed sites (lock in particular new sites)
• analyze specific closings (lock out particular site and see what happens)

factors to scale up or down the sales for specific months.

A simple time series model, with a set of monthly or seasonal dummy variables, can be
used to make an empirically fitted set of correction factors. Another way that trade
area models need to be corrected is for the excess in demand that often accompanies a
new store opening as the novelty of that location is added to the mix of existing
stores and, at least initially, there may be large incentives or advertising efforts
made to attract customers. Clearly, it would be advisable to temper these initial
sales figures with some kind of decay or dilution effect that would bring the store's
sales into alignment at moderate levels (see Kaufmann et al., 2000). Rules of thumb
abound in this area, and equilibrium sales after opening may settle down to say 60%
of the initial week sales.
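
One simple way to temper opening-week figures is sketched below in Python. It assumes
an exponential decay toward the 60% rule-of-thumb level mentioned above; the
exponential form, the function name, and the decay rate are illustrative assumptions
only, not a method given in the chapter or in Kaufmann et al. (2000).

```python
import math


def tempered_sales(initial_week_sales, week, equilibrium_share=0.6, decay_rate=0.3):
    """Sales in a given week (0 = opening week), decaying toward equilibrium.

    equilibrium_share is the long-run fraction of opening-week sales;
    decay_rate (per week) is a hypothetical speed of adjustment.
    """
    floor = equilibrium_share * initial_week_sales
    return floor + (initial_week_sales - floor) * math.exp(-decay_rate * week)


# Hypothetical opening week of $100,000, tracked every four weeks for a quarter.
path = [tempered_sales(100_000.0, week) for week in range(0, 13, 4)]
```

In practice the decay rate would be fitted to observed post-opening sales of
comparable stores rather than assumed.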
research publications.7 This diffusion of the innovation of retail trade area
analysis from specialized journals such as Environment and Planning A, into many
applied sectors has been a major success for analysts. These models serve as a
critical underpinning of the site selection analysis that goes into many large format
stores in almost every urbanized area in the U.S. and Europe. The reason that such
models are widely used is that they are essential to the rapid pro-forma evaluation
of numerous site proposals. The models provide the kinds of rapid computations that
would ordinarily have taken a great deal of manual computation; and certainly when a
chain is screening as many as 10 sites for every actual chosen location, the need for
rapid analysis is obvious. For example the early studies by Applebaum (1966) directly
predate the computation of trade area penetration models that may now be made using
spatial interaction models.

One of the goals of this chapter is to provide the analytical background to the
models that are now a commercial fact of life for retail analysis. The idea that a
model of retail attraction could be deployed as a model for retail site location is
an extension over the simple, earliest work in central place theory, where consumers
were assumed to patronize closest centers (see also Ghosh, 1986). In turn the central
place approach defined a region in close proximity to the store from which it would
be reasonable to expect that the demand would be assigned to that particular store.
Following a large amount of study of consumer behavior indicating dispersal of
choices over many alternatives beyond just the most convenient (Clark, 1968; Hanson,
1980; O'Kelly, 1981), market researchers and others devised more precise means of
estimating likely consumer behavior. The deterministic all-or-nothing allocation of
demand to the nearest or most convenient branch is no longer a necessary or indeed
acceptable simplifying hypothesis about spatial behavior. Instead, we now expect that
consumer behavior may be examined with the same tools that econometricians have
devised for the analysis of discrete choice. Databases in turn provide a wealth of
data. Geographers have derived a representation of consumer behavior with a model
that locates services; this involves a breakthrough in the use of spatial interaction
models. The key idea was to replace the nearest center assignment of customers in
central place theory with a more realistic gravitationally based estimate of likely
destination choice (O'Kelly, 1987). Thus, the customer might have a certain
probability of visiting a large center that is a bit further away than a small center
close to the consumer. In gauging these trade-offs, the model makes a carefully
calibrated estimate of the impact of size and distance on the consumer's willingness
to travel to particular destinations. Once this calibrated model is available, the
analyst can propose specific new site locations and gauge the expected level of
consumer patronage at those sites. Estimating so-called turnover or retail sales
volume is a critical first step in the analysis of any commercial property deal, as
the sales level helps to support the go/no go decision on rental, lease, re-model, or
closing.

Location-allocation models generally involve the simultaneous selection of locations
and the assignment of demand to those locations in order to optimize some specified
objective or goal (usually to maximize market share or profit; see, for example,
Craig et al., 1984). These models have several advantages. They can determine the
optimal (or near optimal) location of several stores simultaneously by systematically
analyzing the system-wide interactions among all stores in the market area. They are
capable of utilizing a wide range of objectives that could be used in siting stores.
In addition, the models
are flexible in that they can incorporate the behavior of retailers, consumers and/or
the retailing environment. Finally, heuristics are available for these models which
provide good (optimal or near optimal) solutions and yet are easy to implement. The
use of location-allocation models typically involves empirical research to determine
the important store attributes for the population within the market area and a
mathematical model to determine the optimal locations for retail outlets based on the
pattern of market demand, store chains and existing competing outlets.8

Even though it is recognized that many consumers engage in multi-purpose, multi-stop
shopping, models of multi-purpose shopping behavior have not been thoroughly
integrated into facility location analysis, though early efforts by O'Kelly (1981,
1983a, b) have been recently reconsidered as the basis for new location models
(Leszczyc et al., 2004). So the assumption of single-purpose trips is made in order
to devise practical (usable) store-location models. Nevertheless, the fact that our
analysis is primarily designed around shopping center destinations ensures that the
attraction of a destination for a specific store is partly determined by the
attraction of the cluster of stores as a whole.

There are several types of retail location models in the literature. Some
representative examples include models which combine location-allocation with spatial
interaction (for example, the MULTILOC model by Achabal et al., 1982); models which
can deal with multiple objectives (for example, Min, 1987); models that consider the
uncertainty inherent in the retailing environment (such as the scenario planning
model by Ghosh and McLafferty, 1982); and models which involve the decision maker in
the decision-making process (for example, the STORELOC model by Durvasula et al.,
1992). No one model is capable of handling all the important aspects of retail site
selection which must be addressed in order to provide the decision maker with the
best set of locations for any particular market area in which the stores will be
located.

Some aspects of these models are developed in more detail in the following section.

22.4.2. Retail location models and spatial interaction

MULTILOC (Achabal et al., 1982) was one of the first location-allocation models to
simultaneously locate more than one store. The model optimizes the location of stores
using the knowledge that consumers will choose among the alternatives according to a
probabilistic interaction model (the MCI model). Such models maximize total profit
for a retail chain (or a single store) after subtracting the fixed costs of
establishing a store at the determined location (i.e., location-specific fixed
costs). It was later given a more mathematical treatment in O'Kelly (1987).

The major problem facing the manager of site selection is the large number of options
from which to choose, although the conceptual bases for this model are very simple. A
set of potential locations is defined and from this set P facilities are to be
chosen. The so-called "N choose P" problem clearly involves a large number of
combinatorial options. Not all of these choices need to be examined, however, in
order for the model to make a reasonable estimate of the ideal subset of P
facilities. Two major strategies are available. First, if the model can be posed as
an optimization task, computer programs can use mathematical techniques such as mixed
integer programming (MIP) or Lagrangian relaxation to select optimal locations
(O'Kelly, 1987). Second, and in many ways more robustly, the modeler can set up
the problem and employ heuristics in order to make a quick and reliable estimate of the
core portion of the preferred site selections.

An example may help to make this concept clear. Suppose a clothing retailer is considering
siting stores in some of the many available shopping centers in a large metropolitan
region such as Atlanta. It is unlikely that the retailer would want to place a store in
every available shopping center. Budget constraints would limit this option, and simple
common sense would indicate that the market could not bear the saturation coverage of too
many stores. The question of the optimal number of stores will be addressed presently; for
now, assume that the retailer has a limited number of sites that are under consideration.
The retailer therefore seeks to prioritize a subset of all the available centers that
might be expected to perform well given their products and customer profile. This latter
point is a key one. In order to prioritize the store locations, the retailer needs to use
an accurate model of the underlying demand for the service. Thus many geo-demographic case
studies use profiles of existing customers to create a measure that reflects the
attraction of the store for particular populations. This is in essence a computerized
version of the classic idea by Applebaum (1966) of using analogs to project the trade area
success of a proposed new store location. If the chain already has a set of stores in a
wide variety of different spatial contexts, cross-sectional comparison of the performance
of those stores can be used to produce a regression type model for store sales levels.
Once these models are estimated, the retailer can then seek new locations where the mix of
factors leans heavily towards those variables that have proven to be successful predictors
in other locations. The operational version of this idea is to test each of the locational
scenarios by projecting the probable trade area of each store, existing or proposed, in
the context of the surrounding demographics and competition. These models have become
very sophisticated because of the availability of detailed micro demographic profiles of
spatial areas that may be assigned to each potential location.

As the model explores the number of locations, the analyst can keep track of the
performance of those proposed sites. For example, a set of five stores distributed
throughout the metropolitan region might very well succeed in capturing the selected
demographic submarkets that are sought and desired by this retailer. In contrast, some
other combination of five stores could easily be eliminated from consideration because the
sites do not deliver the expected mix and density of demand to make this package feasible.
A great deal depends on a reasonable and accurate projection of the impact of each new
store and its performance both against existing competitors and any stores that the chain
might already have located in the district.

22.4.3. Combinatoric issues

A key to the efficient implementation of interaction based location models is a data
structure that enables the computerized evaluation of sites to be made relatively quickly.
The following notes provide a guide to the collection and organization of data in such a
way as to make such computations feasible for quite a large study program. Assume that
there are M origin zones. The N locations from which the model will select sites are
organized as the columns of the interaction table, with an extra column that will be used
to store any user demand that is under-served by the solution program. This modification
is essential when dealing with site selection models. To see this, imagine that a retailer
is planning to site three new outlets in a very large metropolitan area. If the maximum
distance a customer would be willing to
travel to the store is set at, say, 10 miles (equivalent to the concept of the range in
CPT), then in a large city it is quite clear that some consumers will be too far from any
of the chosen sites to be able to use this retailer's service. It is important that the
model provide a means to calculate such unserved customers, and we propose to do this by
placing those unserved consumers in a separate dummy destination category as a holding bin
for the under-provided origin zones. In the absence of competition, the goal would be to
minimize unserved demand. In the presence of competitive alternatives, the goal would be
to capture as much unserved demand as possible for the client's chain.

With the exception of the concept of an additional destination, the basic calculation
process is identical to that of a production constrained spatial interaction model. The
device used to operationalize a particular choice of actively considered facilities is to
simply keep a list of certain columns from the interaction matrix to which consumers might
be allocated during that particular iteration. As the model proceeds from one locational
pattern to another, the set of active columns is simply switched on and off to provide an
indication of the currently available destination choices. To make these calculations
efficiently, the computer is provided with lists pointing to various types of columns in
the matrix. For example, any sites which are required to be provided in all cases may be
indicated by placing their column numbers in a vector of open facilities. Such a vector
might be denoted by the letter R for required centers. A second set of pointers might be
used to indicate that in a particular analysis some potential facility locations are to be
ignored completely. These, for example, might be sites which we wish to lock out of the
current set of optionally available sites. Yet another list could maintain a set of
pointers to the available remaining unexplored options that are freely available to
the model to be chosen or not as the analysis progresses.

Once again, an example may help to fix these ideas. Suppose that a city currently has a
total of 35 supermarkets from a number of major chain stores. One of these chains is
considering a variety of expansion programs in this city. Among the locational options
available to it are the acquisition of new sites, the acquisition of existing sites from
competitors, and the expansion of some or all of the current stores in its portfolio. In
this case it is reasonable to think that the existing stores in the market are in a sense
"locked in" and will occur in all of the comparison scenarios: 35 columns of the
interaction matrix are therefore locked in for the purposes of this initial run. Any
additional locations are simply tacked on as, say, the 36th, 37th or 38th columns of this
interaction matrix. Depending on how many candidate sites are available from which to pick
these three additional locations, one can imagine that the model is exploring a finite list
of potential new store packages. Common sense dictates that the store chain is unlikely to
want all of its new site picks in the same area, as it would make a great deal more sense
to spread the chosen sites across a variety of sectors of the city. If it so happened that
a pool of presently underserved demand could be found, the model would place a facility in
that area. More likely, the model would be making a complex set of trade-offs, trying to
eke out a market share from among and between the existing set of competitive centers, and
indeed avoiding cannibalizing the existing store already owned by the chain. In this regard
the strategy is essentially similar to the well known "gap in the map" rubric for locating
new services. The bulk of the program then would spend time computing the benefits of
specific chosen alternatives in, for example, the north, east, and south suburbs. For those
with the obvious question of "how is this done", it would be realistic to state that the
efficiently managing the introduction of new candidate locations.

The vertex substitution method also needs to include the capability of a maximum service
radius for the facilities, and for this radius to be flexible/variable between centers:
this is essential if some notion of center hierarchy is to be accommodated. It should be
clarified that the vertex substitution method yields a locally optimal solution, in the
sense that there may be a better solution that was not reached during the course of the
exploration; this possibility can be reduced by trying the method with various starting
values. Research experience has shown, however, that the good locations stand out very
well and the possibility that the vertex substitution method completely misses the best
package of locations is remote. One idea that is suggested to prevent mistakes due to
local optimality is to produce a list not only of the best locations but also of other
close contenders discovered in the course of the algorithm's progress.

Research by Church has shown that the introduction of maximum service radii into a median
type of problem (which is what we have) disrupts one normal property of the model, making
it possible that the optimal locations occur at points other than the nodes of the
network. However, the actual problem that we are concerned with realistically limits the
feasible locations to the nodes of the network, as this is where the shopping centers are.
In other words, we ignore the theoretical possibility that the true optimal solution is at
an intermediate location along street segments, as in practice this kind of locational
solution would not be permissible.

What does experience tell us about the solution of location–allocation models? The basic
model is conceptually very simple and easy to understand. The idea is to systematically
explore alternative locational scenarios. The method takes as input the fixed locations,
the candidates, and the prohibited sites (if any). As output the model produces the
requested number of additional facility sites, and reports on the area characteristics of
both the current and the new sites. The candidates are either a comprehensive list of all
feasible shopping centers and stores, or are generated from a list of picks and potential
sites. The user may select the candidates as those sites which meet some criteria, and the
detail and realism of these selection criteria are really only constrained by the
imagination of the user. All kinds of filters can be used, including center size, or
selections can be based on attributes of the centers. Having selected the candidates, the
user would have to select the objective function: normally this is driven on the basis of
aggregate market share, or demand, or minimizing competitors' share. This can potentially
be extended to include acquisition, lease, closing and opening financial decisions.

Vertex substitution has the great advantage that, as a general purpose optimization
strategy (i.e., a heuristic), it is robust to changes of objective function, in a way, for
example, that would not be true of a specialized exact optimization code. In other words,
the weakness of an exact method is that it typically has to exploit some aspect of the
problem structure, and any change in that structure would likely undermine the
mathematical formulation. Heuristics (and there are many of these available for
combinatorial problems) can frequently be set up to explore a solution space effectively
and can be chosen to evaluate the user's choice of objective (and indeed multiple
objectives) to achieve the desired goals. Indeed, the final great advantage of an
exploratory heuristic is that by careful book-keeping many runner-up or close alternative
solutions can be kept and compared.
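
The allocation step and the swap-based search described in this section can be sketched in
a few lines of Python. This is only an illustrative sketch under stated assumptions, not
the chapter's own program: the function names (captured, vertex_substitution), the
power-law distance decay with exponent BETA = 2, and the choice of "demand captured by the
client's chain" as the objective are all hypothetical. Zones that lie outside the service
radius of every open site are placed in the dummy "unserved" holding bin, the required
columns are always open, excluded columns are never tried, and every package evaluated is
recorded so that runner-up solutions can be compared afterwards.

```python
BETA = 2.0  # assumed distance-decay exponent (hypothetical value)


def captured(demand, dist, attract, radius, open_sites, own_sites):
    """Return (demand won by own_sites, demand left in the dummy 'unserved' column)."""
    won = unserved = 0.0
    for i, d_i in enumerate(demand):
        # A site competes for zone i only if the zone lies within that site's radius.
        weights = {j: attract[j] / max(dist[i][j], 1e-9) ** BETA
                   for j in open_sites if dist[i][j] <= radius[j]}
        total = sum(weights.values())
        if total == 0.0:
            unserved += d_i  # no reachable site: demand goes to the holding bin
        else:
            own = sum(w for j, w in weights.items() if j in own_sites)
            won += d_i * own / total  # production-constrained share of zone demand
    return won, unserved


def vertex_substitution(demand, dist, attract, radius,
                        required, own, candidates, p, excluded=(), start=None):
    """Pick p extra columns by repeated one-for-one swaps (a local search)."""
    pool = [j for j in candidates if j not in excluded and j not in required]
    chosen = list(start) if start is not None else pool[:p]
    evaluated = {}  # book-keeping: package of new sites -> captured demand

    def score(package):
        key = tuple(sorted(package))
        if key not in evaluated:
            open_sites = set(required) | set(package)  # locked-in plus trial columns
            own_sites = set(own) | set(package)        # the chain's existing and new stores
            evaluated[key] = captured(demand, dist, attract, radius,
                                      open_sites, own_sites)[0]
        return evaluated[key]

    best = score(chosen)
    improved = True
    while improved:  # stop when no single swap improves the objective
        improved = False
        for pos in range(len(chosen)):
            for j in pool:
                if j in chosen:
                    continue
                trial = chosen[:pos] + [j] + chosen[pos + 1:]
                value = score(trial)
                if value > best + 1e-12:
                    chosen, best, improved = trial, value, True
    runners_up = sorted(evaluated.items(), key=lambda kv: kv[1], reverse=True)
    return sorted(chosen), best, runners_up


if __name__ == "__main__":
    # Three origin zones; column 0 is a locked-in competitor, columns 1-3 are candidates.
    demand = [100.0, 50.0, 80.0]
    dist = [[1.0, 2.0, 8.0, 20.0],
            [5.0, 1.0, 2.0, 20.0],
            [9.0, 6.0, 1.0, 20.0]]
    attract = [1.0, 1.0, 2.0, 1.0]
    radius = [10.0, 10.0, 10.0, 10.0]
    print(vertex_substitution(demand, dist, attract, radius,
                              required=[0], own=[], candidates=[1, 2, 3], p=1))
```

Because the result is only locally optimal, the start argument can be used to rerun the
search from several different starting packages, and the returned runners_up list gives the
close contenders found along the way.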
which ones to close. Predicting retention of customers from old stores to re-aligned new
branches is also difficult, though the managers of such operations may have good insight
into the likely levels of customer loyalty.

An interesting question is to determine the diversion of sales that results from a
store/chain closure. Such questions are frequently presented in practice to retailers, as
they have the option to purchase competitors' sites. Which of these sites would make good
acquisitions (if there is the option to cherry-pick the best of the available stores)?
Which would blend well and open under the new label if the acquiring chain gets the whole
suite?

If two chains merge, and there are regulatory concerns that the two chains have to divest
some of their branches, or wish to streamline their combined operations, one would have to
analyze the closure of branches one by one to determine the package that makes the most
sense from the point of view of the combined operations.

22.6. SUMMARY AND CONCLUSIONS

The great strength of the gravity model is its simplicity and its allocation of demand to
centers in proportion to their attraction and inversely proportional to distance. It can
incorporate center specific attraction and center specific maximum trade area radii. The
strength of the SI-based location model is that it provides assistance with all of the
following tasks: measuring saturation, impact of changes on current trade areas, assessment
of the advantages of certain locations for particular formats, and an estimation of the
forecast of sales. In addition, the allocation models allow a profile of the demographics
of a trade area.

What would take a large amount of extra research effort, but which in my opinion would be
well worth while, would be the inclusion of the interaction based model in a multiobjective
and multiattribute decision framework. The difficulty would be to elicit from the decision
maker a set of trade-off parameters that define the relative scales for the attributes of
the alternative locational packages.

The mechanism reviewed in this chapter, which allocates the sales from the origin zones to
the destinations, is called the allocation model. It is driven by a gravity based spatial
interaction model, and given careful data and careful assessment of the foundation
assumptions this is a robust model for trade area delimitation.

ACKNOWLEDGMENTS

Parts of this chapter are based on materials developed over many years in my Retail
Location Seminar, where comments from Debbie Bryan, Tony Grubesic, and Tim Matisziw are
gratefully acknowledged (see specific footnotes). In addition, a great deal of the common
sense application flavor of this paper derives from conversations with Jim Stone (GeoVue),
Tony Lea (Environics Analytics), and Steve Wheelock. I thank these individuals while taking
full responsibility for the product here. Some material was originally prepared as a
discussion/research memo on location models for Geonomics. See 22.1.2, 22.4.5, 22.4.6, and
examples in 5. Other material is derived from "Retail Location Models and Spatial
Interaction" by M.E. O'Kelly and D. Bryan, A Review of Modeling in Retail Location.
Unpublished working paper.

NOTES

1 Introduction is based on Geography 845 Lecture Jan 2, 2001.
2 A major sector using the results from spatial modeling capability is that of businesses with multi-store/branch locations. Home Depot for example has made extensive use of reports from what used to be Thompson Associates, and is now a unit of MapInfo in Ann Arbor, MI. Other well known users include McDonald's and Blockbuster.
3 Based on applications as discussed with Jim Stone and Tony Lea.
4 Some aspects of these following paragraphs have benefited from discussion with Jim Stone.
5 The target level of goodness-of-fit in convenience store forecasting models is for high r-square values (about 0.8).
6 Section 22.2.1 is based on "Retail Location Models and Spatial Interaction" by M.E. O'Kelly and D. Bryan, A Review of Modeling in Retail Location. Unpublished working paper.
7 GeoVue has a gravity based software package. ESRI Business Analyst software has a Huff trade area model.
8 This material derived from "Retail Location Models and Spatial Interaction" by M.E. O'Kelly and D. Bryan, A Review of Modeling in Retail Location. Unpublished working paper.
9 Material in section 22.4.5 was originally discussed in an explanatory memo from this author to Jim Stone at Geonomics (now GeoVue). Jim's critique was helpful in framing the discussion.
10 This section benefited from discussion with Jim Stone and Steve Wheelock.

REFERENCES

(Although not all these papers are cited directly, these are however all influential papers in my analysis; they are retained as a general bibliographic resource.)

Achabal, D., Gorr, W. and Mahajan, V. (1982). MULTILOC, A multiple store location decision model. Journal of Retailing, 58(2): 5–25.
Applebaum, W. (1966). Methods for determining store trade areas, market penetration, and potential sales. Journal of Marketing Research, 3: 127–141.
Baker, R.G.V. (2000). Towards a dynamic aggregate shopping model and its application to retail trading hour and market area analysis. Papers in Regional Science, 79(4): 413–434.
Balakrishnan, P.V., Desai, A. and Storbeck, J.E. (1994). Efficiency evaluation of retail outlet networks. Environment and Planning B, 21(4): 477–488.
Beaumont, J.R. (1980). Spatial interaction models and the location–allocation problem. Journal of Regional Science, 20(1): 37–50.
Beaumont, J.R. (1981). Location–allocation models in a plane, a review of some models. Socio-Economic Planning Sciences, 15(5): 217–229.
Berry, B.J.L. (1967). The Geography of Market Centers and Retail Distribution. Englewood Cliffs, NJ: Prentice Hall.
Birkin, M., Clarke, G. and Clarke, M.P. (2002). Retail Geography and Intelligent Network Planning. New York: Wiley.
Black, W. (1984). Choice-set definition in patronage modeling. Journal of Retailing, 60(2): 63–85.
Boots, B. and South, R. (1997). Modeling retail trade areas using higher-order, multiplicatively weighted Voronoi diagrams. Journal of Retailing, 73(4): 519–536.
Borgers, A. and Timmermans, H. (1986). A model of pedestrian route choice and demand for retail facilities within inner-city shopping areas. Geographical Analysis, 18(2): 115–128.
Borgers, A. and Timmermans, H. (1991). A decision support and expert system for retail planning. Computers Environment and Urban Systems, 15(3): 179–188.
Brown, S. (1989). Retail location theory, the legacy of Harold Hotelling. Journal of Retailing, 65(4): 450–470.
Brown, S. (1992). The wheel of retail gravitation. Environment and Planning A, 24(10): 1409–1429.
Brown, S. (1994). Retail location at the micro-scale, inventory and prospect. Service Industries Journal, 14(4): 542–576.
Buckner, R.W. (1998). Site Selection, New Advances in Methods and Technology, 2nd Edn. New York: Lebhar-Friedman Books.
Clark, W.A.V. (1968). Consumer travel patterns and the concept of range. Annals, Association of American Geographers, 58: 386–396.
Congdon, P. (2000). A Bayesian approach to prediction using the gravity model, with an application to patient flow modeling. Geographical Analysis, 32(3): 205–224.
Craig, C.S., Ghosh, A. and McLafferty, S. (1984). Models of the retail location process, a review. Journal of Retailing, 60(1): 5–36.
Current, J.R. and Storbeck, J.E. (1994). A multiobjective approach to design franchise outlet networks. Journal of the Operational Research Society, 45(1): 71–81.
Ding, G. and O'Kelly, M.E. (2008). Choice-based estimation of Alonso's Theory of Movement, Methods and Experiments. Environment and Planning A, 40(5): 1076–1089.
Donthu, N. and Yoo, B. (1998). Retail productivity assessment using data envelopment analysis. Journal of Retailing, 74: 89–105.
Drezner, T. (1994). Optimal continuous location of a retail facility, facility attractiveness, and market share – an interactive model. Journal of Retailing, 70(1): 49–64.
Drezner, T. and Drezner, Z. (2002). Validating the gravity-based competitive location model using inferred attractiveness. Annals of Operations Research, 111(1–4): 227–237.
Drezner, T., Drezner, Z. and Salhi, S. (2002). Solving the multiple competitive facilities location problem. European Journal of Operational Research, 142(1): 138–151.
Durvasula, S., Sharma, S. and Andrews, J.C. (1992). Storeloc – a retail store location model-based on managerial judgments. Journal of Retailing, 68(4): 420–444.
Eagle, T.C. (1984). Parameter stability in disaggregate retail choice models – experimental-evidence. Journal of Retailing, 60(1): 101–123.
Eaton, B.C. and Lipsey, R.G. (1982). An economic theory of central places. Economic Journal, 92: 56–72.
Erlenkotter, D. and Leonardi, G. (1985). Facility location with spatially-interactive behavior. Sistemi Urbani, 1: 29–41.
Fotheringham, A.S. (1983). A new set of spatial interaction models, the theory of competing destinations. Environment and Planning A, 15: 15–36.
Fotheringham, A.S. (1986). Modelling hierarchical destination choice. Environment and Planning A, 18: 401–418.
Fotheringham, A.S. and O'Kelly, M.E. (1989). Spatial Interaction Models, Formulations and Applications. Studies in Operational Regional Science. Dordrecht, Netherlands: Kluwer.
Fotheringham, A.S. and Knudsen, D.C. (1986). Modeling discontinuous change in retailing systems – extensions of the Harris–Wilson framework with results from a simulated urban retailing system. Geographical Analysis, 18(4): 295–312.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2002). Geographically Weighted Regression, The Analysis of Spatially Varying Relationships. Chichester: Wiley.
Ghosh, A. (1986). The value of a mall and other insights from a revised central place model. Journal of Retailing, 62(1): 79–97.
Ghosh, A. and Craig, C.S. (1983). Formulating retail location strategy in a changing environment. Journal of Marketing, 47(3): 56–68.
Ghosh, A. and McLafferty, S. (1982). Locating stores in uncertain environments, a scenario planning approach. Journal of Retailing, 58(Winter): 5–22.
Ghosh, A. and McLafferty, S.L. (1987). Location Strategies for Retail and Service Firms. Lexington, MA: Lexington Books.
Ghosh, A. and Craig, C.S. (1991). FRANSYS, a franchise distribution system location model. Journal of Retailing, 67(4): 466–495.
Ghosh, A. and Tibrewala, V. (1992). Optimal timing and location in competitive markets. Geographical Analysis, 24(4): 317–334.
Golledge, R. and Spector, A. (1978). Comprehending the urban environment, theory and practice. Geographical Analysis, 10(4): 403–426.
Goodchild, M.F. (1984). ILACS, A location-allocation model for retail site selection. Journal of Retailing, 60(1): 84–100.
Goodchild, M.F. (1991). Geographic information systems. Journal of Retailing, 67(1): 3–15.
Guy, C.M. (1991). Spatial interaction modeling in retail planning practice – the need for robust statistical-methods. Environment and Planning B, 18(2): 191–203.
Hallsworth, A.G. (1994). Decentralization of retailing in Britain – the breaking of the 3rd wave. Professional Geographer, 46(3): 296–307.
Hanson, S. (1980). Spatial diversification and multipurpose travel, implications for choice theory. Geographical Analysis, 12: 245–257.
Hodgson, M.J. (1978). Towards more realistic allocation in location–allocation models, an interaction approach. Environment and Planning A, 10: 1273–1285.
Hodgson, M.J. (1981). A location–allocation model maximizing consumers' welfare. Regional Studies, 15(6): 493–506.
Hodgson, M.J. (1986). An hierarchical location–allocation model with allocations based on facility size. Annals of Operations Research, 6: 273–289.
Houston, F.S. and Stanton, J. (1984). Evaluating retail trade areas for convenience stores. Journal of Retailing, 60(1): 124–136.
Hubbard, R. (1978). A review of selected factors conditioning consumer travel behavior. Journal of Consumer Research, 5: 1–21.
Huff, D.L. (1962). Determination of Intra-Urban Retail Trade Areas. Real Estate Research Program. University of California at Los Angeles.
Huff, D.L. (1963). A probabilistic analysis of shopping center trade areas. Land Economics, 39: 81–90.
Huff, D.L. (1964). Defining and estimating a trade area. Journal of Marketing, 28: 34–38.
Jain, A.K. and Mahajan, V. (1979). Evaluating the competitive environment in retailing using multiplicative competitive interaction models. In Sheth, J. (ed.), Research in Marketing, pp. 217–235. Greenwich, Conn: JAI Press.
Kantorovich, Y.G. (1992). Equilibrium-Models of Spatial Interaction with Locational-Capacity Constraints. Environment and Planning A, 24(8): 1077–1095.
Kaufmann, P.J., Donthu, N.B. and Brooks, C.M. (2000). Multi-unit retail site selection processes, incorporating opening delays and unidentified competition. Journal of Retailing, 76(1): 113–127.
Kitamura, R. and Kermanshah, M. (1985). Sequential model of interdependent activity and destination choices. Transportation Research Record, 987: 81–89.
Kohsaka, H. (1989). A spatial search-location model of retail centers. Geographical Analysis, 21(4): 338–349.
Kohsaka, H. (1992). Three-dimensional representation and estimation of retail store demand by bicubic splines. Journal of Retailing, 68(2): 221–241.
Kohsaka, H. (1993). A monitoring and locational decision support system for retail activity. Environment and Planning A, 25(2): 197–211.
Krider, R.E. and Weinberg, C.B. (1997). Spatial competition and bounded rationality, Retailing at the edge of chaos. Geographical Analysis, 29(1): 16–34.
Kumar, V. and Karande, K. (2000). The effect of retail store environment on retailer performance. Journal of Business Research, 49(2): 167–181.
Lakshmanan, T.R. and Hansen, W.A. (1965). A retail market potential model. Journal of the American Institute of Planners, 31: 134–143.
Langston, P., Clarke, G.P. and Clarke, D.B. (1997). Retail saturation, retail location, and retail competition, An analysis of British grocery retailing. Environment and Planning A, 29(1): 77–104.
Leonardi, G. (1980). A unifying framework for public facility location problems. WP-80-79, IIASA, Laxenburg, Austria.
Leonardi, G. (1983). The use of random-utility theory in building location–allocation models. In: Thisse, J.-F. and Zoller, H. (eds), Locational Analysis of Public Facilities, pp. 357–383. Amsterdam: North Holland.
Leszczyc, P. and Timmermans, H.J.P. (1996). An unconditional competing risk hazard model of consumer store-choice dynamics. Environment and Planning A, 28(2): 357–368.
Leszczyc, P., Sinha, A. and Timmermans, H. (2000). Consumer store choice dynamics. An analysis of the competitive market structure for grocery stores. Journal of Retailing, 76(3): 323–345.
Leszczyc, P., Sinha, A. and Sahgal, A. (2004). The effect of multi-purpose shopping on pricing and location strategy for grocery stores. Journal of Retailing, 80(2): 85–99.
Longley, P. and Clarke, G. (1996). GIS for Business and Service Planning. New York: Wiley.
McLafferty, S.L. and Ghosh, A. (1986). Multipurpose shopping and the location of retail firms. Geographical Analysis, 18(3): 215–226.
Mercer, A. (1993). Developments in implementable retailing research. European Journal of Operational Research, 68(1): 1–8.
Miller, H. and O'Kelly, M.E. (1991). Properties and estimation of a production-constrained Alonso model. Environment and Planning A, 23: 127–138.
Miller, H.J. (1993). Consumer Search and Retail Analysis. Journal of Retailing, 69(2): 160–192.
Min, H. (1987). A multiobjective retail service location model for fastfood restaurants. OMEGA, 15(5): 429–441.
Munroe, S. (2001). Retail structural dynamics and the forces behind big-box retailing. Annals of Regional Science, 35(3): 357–373.
Nakanishi, M. and Cooper, L.G. (1974). Parameter estimation for a multiplicative competitive interaction model – least squares approach. Journal of Marketing Research, 11: 303–311.
O'Kelly, M.E. (1981). A model of the demand for retail facilities incorporating multistop multipurpose trips. Geographical Analysis, 13(2): 134–148.
O'Kelly, M.E. (1983a). Multipurpose shopping trips and the size of retail facilities. Annals of the Association of American Geographers, 73(2): 231–239.
O'Kelly, M.E. (1983b). Impacts of multistop multipurpose trips on retail distributions. Urban Geography, 4(2): 173–190.
O'Kelly, M.E. and Storbeck, J.E. (1984). Hierarchical location models with probabilistic allocation. Regional Studies, 18(2): 121–129.
O'Kelly, M.E. (1987). Spatial interaction based location–allocation models. In: Ghosh, A. and Rushton, G. (eds), Spatial Analysis and Location–Allocation Models, pp. 302–326. New York: Van Nostrand Reinhold.
O'Kelly, M.E. and Miller, H.J. (1989). A synthesis of some market area delimitation tools. Growth and Change, 20: 14–33.
O'Kelly, M.E. (1999). Trade-area models and choice-based samples, methods. Environment and Planning A, 31(4): 613–627.
O'Kelly, M.E. (2001). Retail market share and saturation. Journal of Retailing and Consumer Services, 8(1): 37–45.
Oppenheim, N. (1990). Discontinuous changes in equilibrium retail activity and travel structures. Papers of the Regional Science Association, 68: 43–56.
Pirkul, H., Narasimhan, S. and De, P. (1987). Firm expansion through franchising, a model and solution procedure. Decision Sciences, 18: 631–641.
Prendergast, G., Marr, N. and Jarratt, B. (1998). Retailers' views of shopping centres, a comparison of tenants and non-tenants. International Journal of Retail and Distribution Management, 26(4): 162–171.
Roy, J.R. and Thill, J.C. (2004). Spatial interaction modelling. Papers in Regional Science, 83(1): 339–361.
Rust, R.T. and Brown, J.A.N. (1986). Estimation and comparison of market area densities. Journal of Retailing, 62(4): 410–430.
Rust, R.T. and Donthu, N. (1995). Capturing geographically localized misspecification error in retail store choice models. Journal of Marketing Research, 32(1): 103–110.
Seldin, M. (1995). The information revolution and real estate analyses. Real Estate Issues, April 1995.
Thill, J.C. (2000). Network competition and branch differentiation with consumer heterogeneity. Annals of Regional Science, 34(3): 451–468.
Timmermans, H., Arentze, T. and Joh, C.-H. (2002). Analysing space–time behaviour, new approaches to old problems. Progress in Human Geography, 26(2): 175–190.
Timmermans, H., Vanderhagen, X. and Borgers, A. (1992). Transportation systems, retail environments and pedestrian trip chaining behavior – modeling issues and applications. Transportation Research Part B: Methodological, 26(1): 45–59.
Tobler, W.R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46: 234–240.
Wee, C.H. and Pearce, M.R. (1985). Patronage Behavior toward Shopping Areas – a Proposed Model Based on Huff's Model of Retail Gravitation. Advances in Consumer Research, 12: 592–597.
Weisbrod, G.E., Parcells, R.J. and Kern, C. (1984). A disaggregate model for predicting shopping area market attraction. Journal of Retailing, 60(1): 65–83.
Wilson, A.G. and Senior, M.L. (1974). Some relationships between entropy maximizing models, mathematical programming models and their duals. Journal of Regional Science, 14: 207–215.
Wilson, A.G., Coelho, J.D., Macgill, S.M. and Williams, H.C.W.L. (1981). Optimization in Locational and Transport Analysis. London: Wiley.
Zeller, R.E., Achabal, D.D. and Brown, L.A. (1980). Market penetration and locational conflict in franchise systems. Decision Sciences, 11: 58–80.
23
Spatial Analysis on a Network
Atsuyuki Okabe and Toshiaki Satoh
Figure 23.1 Sites of traffic accidents in Chiba, Japan (the width of each line segment
represents traffic volume).
and analyses via planar spatial methods are termed planar spatial analyses. Planar spatial methods are generally used for the analysis of network spatial phenomena because: (1) it is much easier to compute Euclidean distance on a plane than by the shortest-path distance on a network, and (2) it is believed that the shortest-path distance is approximated by Euclidean distance. The first reason remains true, although the difficulty is reduced these days because the use of Geographical Information Systems (GIS) makes it easy to calculate the shortest-path distance. The second reason might be true over a large region, but its validity is questionable across a small area or within a city. For example, Maki and Okabe (2005) demonstrated that, in Kokuryo, a Tokyo suburb, the difference between the shortest-path distance and Euclidean distance is significant if the Euclidean distance is less than 500 m (see Figure 23.3). Therefore, to analyze spatial phenomena in small areas such as the market areas of convenience stores in a city, planar spatial methods are inappropriate; instead, spatial methods that assume a network space using the shortest-path distance, termed network spatial methods, should be used.

The danger in applying planar spatial methods to network spatial phenomena is clearly demonstrated in Figure 23.4. Having assessed the distribution of points in Figure 23.4(a), nobody would consider that the points are randomly distributed. This view is true when points are distributed on a plane; however, this view is false when the points are distributed on the network indicated by the lines in Figure 23.4(b). In fact, the points in the figure are randomly generated on the network.

Figure 23.4 provides the following warning: analyzing network spatial phenomena using a planar spatial method is likely to lead to false conclusions. To avoid such errors, this chapter considers a class of network spatial methods. The chapter consists of seven sections including this introductory section. Section 23.2 describes a method, termed the uniform network transformation, that deals with a nonuniform distribution function on a network. Section 23.3 considers a class of network Voronoi diagrams, and section 23.4 discusses a class of network local and global K function methods. Section 23.5 describes a class of network kernel methods, and Section 23.6 outlines a GIS-based toolbox termed SANET, which is used for network spatial analysis. The chapter ends with Section 23.7, which considers network spatial methods that we have not discussed earlier.
[Figure 23.3 plot: vertical axis, Ratio (0.0 to 4.0); horizontal axis, Euclidean distance (m) (0 to 4000)]
Figure 23.3 Ratio of the shortest-path distance to its corresponding Euclidean distance for
the street network in Kokuryo, Tokyo (from Maki and Okabe, 2005).
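
The ratio plotted in Figure 23.3 can be illustrated on a toy network. The following is a minimal sketch, not the procedure of Maki and Okabe (2005): the node names, coordinates, and the helper names `dijkstra` and `detour_ratio` are hypothetical, and link lengths are assumed to equal the straight-line distance between their end nodes.

```python
import heapq
import math


def dijkstra(adj, source):
    """Shortest-path distances from `source` over a weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def detour_ratio(coords, links, a, b):
    """Shortest-path distance divided by Euclidean distance between a and b."""
    adj = {n: [] for n in coords}
    for u, v in links:
        w = math.dist(coords[u], coords[v])  # link length (assumed straight)
        adj[u].append((v, w))
        adj[v].append((u, w))
    return dijkstra(adj, a)[b] / math.dist(coords[a], coords[b])


# Hypothetical city block: a 100 m square with streets along its sides only.
coords = {"A": (0, 0), "B": (100, 0), "C": (100, 100), "D": (0, 100)}
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

ratio_diag = detour_ratio(coords, links, "A", "C")  # across the block
ratio_side = detour_ratio(coords, links, "A", "B")  # along one street
```

In this toy case the ratio is 1 along a street and about 1.41 across the block, which is the kind of short-distance discrepancy the figure reports for a real street network.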
Figure 23.4 Non-randomly distributed points on a plane (a), and randomly distributed
points on a network (b). (Note that the two distributions of points are the same.)
Figure 23.5 A nonuniform network (a) and the equivalent uniform network transformed by
the uniform network transformation (b).
Voronoi diagram), is used in many ways in spatial analysis (Figure 23.6). In particular, the ordinary planar Voronoi diagram is commonly used in retail marketing and facilities management as a first approximation of the service areas of stores or facilities.

This approximation, however, is problematic when service areas are small. Table 23.1 shows the average radii of circular market areas in Shinjuku Ward, Tokyo, with respect to store type. In all cases, the distance to the nearest store is less than five hundred meters. Recalling the difference between Euclidean distance and the shortest-path distance shown in Figure 23.3, the data in Table 23.1 suggest that the ordinary planar Voronoi diagram is not appropriate as a first approximation of the service areas.

Instead, a Voronoi diagram defined on a network with shortest-path distance, termed the network Voronoi diagram, should be used. To show this clearly, let $d(p, p_i)$ be the shortest-path distance between a point $p$ and a point $p_i$ on a network $L$, where $m$ generator points (e.g., stores) are located at $p_1, \ldots, p_m$. Let $V_i$ be a set of points on $L$ (a subnetwork) that satisfies

$$V_i = \{\, p \mid d(p, p_i) \le d(p, p_j),\ p \in L,\ j \ne i,\ j = 1, \ldots, m \,\}. \qquad (23.1)$$

The set of the resulting subnetworks, $V = \{V_1, \ldots, V_m\}$, is termed the (ordinary) network Voronoi diagram (Okabe et al., 2000); an example is provided in Figure 23.7. It is instructive to compare this network Voronoi diagram with its corresponding planar Voronoi diagram shown in Figure 23.6.

Table 23.1 Average radii of circular market areas in Shinjuku Ward, Tokyo (m)

Store type              Average radius (m)
Bakery                  320
Shoe store              255
Fruit shop              213
Book store              177
Chinese noodle shop     153
Convenience store       150
Beauty parlor           114
Clinic                  113

23.3.2. Directed network Voronoi diagrams

In a downtown area, streets are commonly one-way. Pizza delivery stores should consider this fact when dispatching delivery bikes. To take one-way regulations into account, consider a directed network $L$ and let $\vec{d}(p_i, p)$ be the directed shortest-path distance from $p_i$ (e.g., a pizza delivery store) to $p$ (e.g., a house). Let $V_i^{\mathrm{out}}$ be a set of points on $L$ (a subnetwork) that satisfies equation (23.1), where $d(p, p_i)$ is replaced with $\vec{d}(p_i, p)$. The set of the resulting subnetworks, $V^{\mathrm{out}} = \{V_1^{\mathrm{out}}, \ldots, V_m^{\mathrm{out}}\}$, is termed a directed network Voronoi diagram (Okabe et al., 2008); an example is shown in Figure 23.8, where one-way streets are indicated by arrows.

Note that the directed shortest-path distance is not symmetric, i.e., $\vec{d}(p_i, p) = \vec{d}(p, p_i)$ does not always hold. Suppose that $p_1, \ldots, p_m$ are parking lots, and a driver at $p$ wants to use the nearest parking lot among $p_1, \ldots, p_m$. The service area of the parking lot at $p_i$ is then defined by the set $V_i^{\mathrm{in}}$ of points on $L$ (a subnetwork) that satisfies equation (23.1), where $d(p, p_i)$ is replaced with $\vec{d}(p, p_i)$. The set of the resulting subnetworks, $V^{\mathrm{in}} = \{V_1^{\mathrm{in}}, \ldots, V_m^{\mathrm{in}}\}$, is also a directed network Voronoi diagram. To distinguish $V^{\mathrm{out}}$ and $V^{\mathrm{in}}$, the former is termed the outward directed network Voronoi diagram and the latter the inward directed Voronoi diagram (Okabe et al., 2008). Both are directed Voronoi diagrams that
Figure 23.6 The ordinary planar Voronoi diagram generated from parking lots in
Kyoto, Japan.
Figure 23.7 The ordinary network Voronoi diagram generated from parking lots in
Kyoto, Japan.
Figure 23.8 The outward directed Voronoi diagram generated from parking lots in
Kyoto, Japan.
Figure 23.9 The inward directed Voronoi diagram generated from parking lots in
Kyoto, Japan.
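
The difference between the outward and inward diagrams of Figures 23.8 and 23.9 can be sketched with a multi-source shortest-path search. This is a simplified illustration under stated assumptions, not the computational method of Okabe et al. (2008): generators and demand points are assumed to sit on nodes, only nodes are labelled (links are not split where two cells meet), and the function name `directed_voronoi` and the toy one-way ring are hypothetical.

```python
import heapq


def directed_voronoi(arcs, generators, inward=False):
    """Label every reachable node with its nearest generator.

    arcs: iterable of (tail, head, length) one-way links; a two-way street
          is given as two arcs.
    inward=False: distance measured from generator to node (outward diagram).
    inward=True:  distance measured from node to generator (inward diagram).
    """
    adj = {}
    for u, v, w in arcs:
        if inward:
            u, v = v, u  # search backwards along each one-way link
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, [])
    dist = {g: 0.0 for g in generators}
    owner = {g: g for g in generators}
    heap = [(0.0, g) for g in generators]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                owner[v] = owner[u]  # inherit the generator of the predecessor
                heapq.heappush(heap, (nd, v))
    return owner


# Hypothetical one-way ring A -> B -> C -> D -> A with generators at A and C.
ring = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0), ("D", "A", 1.0)]
outward = directed_voronoi(ring, ["A", "C"])
inward = directed_voronoi(ring, ["A", "C"], inward=True)
```

On this ring, node B is served by A in the outward diagram (a delivery leaving A reaches B first) but belongs to C in the inward diagram (a driver at B reaches C first), which mirrors the asymmetry of the directed shortest-path distance.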
Figure 23.10 The additively weighted network Voronoi diagram generated from
convenience stores in Kyoto, Japan (each circle indicates its weight).
23.3.4. Other network Voronoi diagrams

In addition to the above network Voronoi diagrams, the kth nearest network Voronoi diagram, the network Voronoi diagram for line segments, and the network Voronoi diagram for polygons have also been proposed in the literature. The reader should consult Furuta et al. (2005) and Okabe et al. (2008) for information on these diagrams.

23.4. LOCAL AND GLOBAL NETWORK K FUNCTION METHODS

23.4.1. Global network auto K function

One of the most commonly used techniques in statistical spatial analysis is the K function method. Originally, the K function method was developed for points on a plane, and was termed the planar K function method (Ripley, 1976, 1977). Okabe and Yamada (2001) extended the planar K function method to points on a network, yielding the network K function method. To state this method explicitly, consider a network $L$ on which points $p_1, \ldots, p_m$ are placed, and let $D_i(t)$ be a subnetwork of $L$ in which the shortest-path distance between any point on $D_i(t)$ and $p_i$ is less than or equal to $t$ (the heavy lines in Figure 23.12; in the planar case, $D_i(t)$ corresponds to the disk centered at $p_i$ with radius $t$ truncated by a bounded global space). Let $K_i(t)$ be the number of points of $p_1, \ldots, p_m$ that are included in $D_i(t)$. In these terms, a network K function is defined by

$$K(t) = \sum_{i=1}^{m} K_i(t). \qquad (23.2)$$
Figure 23.11 The multiplicatively weighted network Voronoi diagram generated from
convenience stores in Kyoto, Japan (each circle indicates its weight).
Note that, in contrast to the cross K function, which is defined below, the above function is referred to as the network auto K function (as with spatial autocorrelation); also note that constants (the density and number of points) are omitted here for simplicity.

To show an actual example, the distribution of street burglaries in Kyoto is depicted in Figure 23.12, where the triangle marks indicate the sites of incidents. For this distribution, the network auto K function is calculated, and the result is shown in Figure 23.13. The black line indicates the expected value, obtained under the null hypothesis that burglaries are uniformly and randomly distributed on the street network, and the gray line indicates the observed value. Because the observed curve is always above the expected curve in Figure 23.13, it is concluded that burglaries tend to cluster.

The difference between the network K function and the planar K function is distinct. Yamada and Thill (2004) applied both the planar K function method and the network K function method to the same traffic accident data and found that the planar K function method overestimates clustering tendency. The authors concluded that the network K function method should be used for the analysis of traffic accidents.
Figure 23.12 Street burglaries (the triangle marks), railway stations (the circles), and the
Voronoi sub-network (the heavy lines) of the station (the large circle) in Kyoto, Japan.
[Figure 23.13 plot: Expected and Observed curves; vertical axis 0 to 160, horizontal axis 0 to 8000]
Figure 23.13 The global network auto K function for street burglaries on the street
network in Kyoto, Japan (Figure 23.12).
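
The counting step behind the network auto K function can be sketched as follows. This is an illustrative sketch rather than the implementation of Okabe and Yamada (2001): points are assumed to sit on network nodes, constants are omitted as in the text, and by default a point is not counted in its own $D_i(t)$ (an assumption; pass `count_self=True` for the literal count). The helper names and the toy street are hypothetical.

```python
import heapq
import math


def shortest_paths(adj, source):
    """Dijkstra distances from `source` on a weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def undirected(links):
    """Adjacency lists for two-way links given as (u, v, length)."""
    adj = {}
    for u, v, w in links:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def network_auto_k(adj, points, t, count_self=False):
    """Sum over i of the number of points within shortest-path distance t of p_i."""
    total = 0
    for i, p in enumerate(points):
        dist = shortest_paths(adj, p)
        for j, q in enumerate(points):
            if (count_self or i != j) and dist.get(q, math.inf) <= t:
                total += 1
    return total


# Hypothetical street a - b - c - d with link lengths 1, 1 and 5.
street = undirected([("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 5.0)])
events = ["a", "b", "d"]
k_values = [network_auto_k(street, events, t) for t in (1.0, 6.0, 7.0)]
```

An expected curve such as the one in Figure 23.13 would additionally require the null model of uniformly random points on the network, which this sketch does not attempt.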
23.4.2. Global network cross K function

Another type of network K function method is the network cross K function method (Okabe and Yamada, 2001). Consider two sets of points, $P = \{p_1, \ldots, p_m\}$ and $Q = \{q_1, \ldots, q_k\}$, on a network $L$. Points of $P$ are stochastically distributed on $L$, but points of $Q$ are fixed (note that the configuration of the points is arbitrary). For instance, points of $P$ may be crime spots and points of $Q$ may be railway stations. The network cross K function is used for testing whether points $p_1, \ldots, p_m$ (crime spots) tend to cluster around (or apart from) $q_1, \ldots, q_k$ (railway stations) as a whole.

To state the network cross K function explicitly, let $D_{q_i}(t)$ be a subnetwork of $L$ in which the shortest-path distance between any point in $D_{q_i}(t)$ and $q_i$ is less than or equal to $t$, and let $K_{q_i}(t)$ be the number of points of $P$ that are included in $D_{q_i}(t)$. Then, the network cross K function, $K_{QP}(t)$, is defined by:

$$K_{QP}(t) = \sum_{i=1}^{k} K_{q_i}(t). \qquad (23.3)$$

(Note that a constant is omitted from the above equation for simplicity.) Because this function considers all points of $P$ across the entire network space $L$ (the global space), the function can be regarded as a global network cross K function. An actual example of the global network cross K function is shown in Figure 23.14, where points of $P$ are street burglaries and points of $Q$ are railway stations in Kyoto, Japan as shown in Figure 23.12. Because the observed curve is always above the expected curve in Figure 23.14, it is concluded that street burglaries tend to occur around railway stations.

23.4.3. Local network cross K function

The global network cross K function method deals with the average tendency of a point pattern around all fixed points $Q$; therefore, it cannot detect local tendencies. For example, the global network cross K function cannot detect the specific railway stations around which crime spots tend to cluster. To detect
[Figure 23.14 plot: Expected and Observed curves; vertical axis 0 to 160, horizontal axis 0 to 8000]
Figure 23.14 The global network cross K function for street burglaries in relation to
railway stations in Kyoto, Japan (Figure 23.12).
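
An observed cross K curve of the kind shown in Figure 23.14 can be sketched by collecting every station-to-event shortest-path distance once and then counting how many fall below each threshold. This is only an illustration under assumptions: events and stations sit on nodes, the constant is omitted as in equation (23.3), and the names `cross_k_curve`, `track`, and the toy network are hypothetical.

```python
import bisect
import heapq
import math


def shortest_paths(adj, source):
    """Dijkstra distances from `source` on a weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def cross_k_curve(adj, P, Q, thresholds):
    """K_QP(t) for each t: pairs (q, p) with shortest-path distance <= t."""
    pair_distances = []
    for q in Q:
        dist = shortest_paths(adj, q)
        pair_distances.extend(dist.get(p, math.inf) for p in P)
    pair_distances.sort()
    # bisect_right gives the number of sorted distances that are <= t
    return [bisect.bisect_right(pair_distances, t) for t in thresholds]


# Hypothetical line q1 - a - b - q2 with link lengths 1, 2 and 1.
track = {
    "q1": [("a", 1.0)],
    "a": [("q1", 1.0), ("b", 2.0)],
    "b": [("a", 2.0), ("q2", 1.0)],
    "q2": [("b", 1.0)],
}
curve = cross_k_curve(track, P=["a", "b"], Q=["q1", "q2"],
                      thresholds=[0.5, 1.0, 2.0, 3.0])
```

Sorting the pair distances once makes it cheap to evaluate the function at many values of $t$, which is what a plotted curve needs.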
23.4.5. Global network Voronoi cross K function

In sections 23.4.3 and 23.4.4, a local K function is obtained from a global K function. Conversely, a global function can also be obtained from a local function. For instance, let $K_{VQP}(t)$ be a function defined in terms of the local network Voronoi K functions as

$$K_{VQP}(t) = \sum_{i=1}^{k} K_{Vq_i}(t). \qquad (23.4)$$

This function deals with all points $P$ in the entire network $L$ (the global space). Therefore, this function can be regarded as a global network cross K function, which is referred to as the global network Voronoi cross K function. An example is illustrated in Figure 23.16. Comparison between the local network Voronoi cross K function in Figure 23.15 and the global network Voronoi cross K function in Figure 23.16 reveals local variation.
[Figure 23.15 plot: Expected and Observed curves; vertical axis 0 to 12, horizontal axis 0 to 800]
Figure 23.15 The local network Voronoi cross K function for street burglaries in the local
space (the heavy lines in Figure 23.12) in Kyoto, Japan.
[Figure 23.16 plot: Expected and Observed curves; vertical axis 0 to 160, horizontal axis 0 to 1500]
Figure 23.16 The global network Voronoi cross K function for street burglaries in relation
to railway stations in Kyoto, Japan (Figure 23.12).
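
Equation (23.4) can be sketched under one explicit assumption: that the local function $K_{Vq_i}(t)$ counts the points of $P$ lying within shortest-path distance $t$ of $q_i$ and inside the Voronoi subnetwork of $q_i$. On an undirected network this means each event is counted at most once, for its nearest station. The names below are hypothetical and points are assumed to sit on nodes.

```python
import heapq
import math


def nearest_generator_distance(adj, Q):
    """Distance from every node to its nearest point of Q (multi-source Dijkstra)."""
    dist = {q: 0.0 for q in Q}
    heap = [(0.0, q) for q in Q]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def voronoi_cross_k(adj, P, Q, t):
    """Number of points of P within distance t of their nearest point of Q."""
    nearest = nearest_generator_distance(adj, Q)
    return sum(1 for p in P if nearest.get(p, math.inf) <= t)


# Hypothetical line q1 - a - b - q2 with link lengths 1, 2 and 1.
track = {
    "q1": [("a", 1.0)],
    "a": [("q1", 1.0), ("b", 2.0)],
    "b": [("a", 2.0), ("q2", 1.0)],
    "q2": [("b", 1.0)],
}
kv_small = voronoi_cross_k(track, ["a", "b"], ["q1", "q2"], 0.5)
kv_one = voronoi_cross_k(track, ["a", "b"], ["q1", "q2"], 1.0)
kv_three = voronoi_cross_k(track, ["a", "b"], ["q1", "q2"], 3.0)
```

At $t = 3$ the ordinary cross K count on this toy network would be 4, because each event lies within reach of both stations, whereas the Voronoi version stays at 2. This is the kind of contrast Figure 23.17 draws between panels (a) and (b).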
Figure 23.17 Comparison between the global network ordinary cross K function (a) and the
global network Voronoi cross K function (b).
estimated density function, $f(x)$, on a plane is given by:

$$f(x) = \sum_{i=1}^{m} k(x - x_i). \qquad (23.5)$$

An example is presented in Figure 23.18, where one million points are uniformly and randomly generated and the kernel function is given by the bi-weight function (Silverman, 1986).

One might estimate the density function, $f_L(x)$, of points on $L$ from the intersection of $f(x)$ with $L$, i.e., $f_L(x) = f(x)$, $x \in L$. This method would be fine if the estimated density function could produce a uniform distribution function for the points that are uniformly and randomly distributed on the network. However, Figure 23.18 shows that $f_L(x)$ does not show a uniform distribution for such points (one million). Therefore, this method is inappropriate.

An alternative method is to use a network kernel function, $k_L(t) = k_L(x)$, $x \in L$, defined on $L$. An example is shown in Figure 23.19, where one million points are uniformly and randomly generated and the density function is estimated from those points using the one-dimensional bi-weight function.

This appears to be a natural extension of the planar kernel method, but Figure 23.19 shows that it is also inappropriate. As the points in Figure 23.19 are uniformly and randomly generated on $L$, the estimated density should be uniform; however, the density in Figure 23.19 is not uniform on $L$. One reason that the natural extension of the planar kernel method does
Figure 23.18 The density function for uniformly random points (one million) on the street
network in Kyoto estimated by the two-dimensional bi-weight kernel function.
Figure 23.19 The density function for uniformly random points (one million) on the street
network in Kyoto estimated by the one-dimensional bi-weight kernel function.
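
One way to see how a straightforward one-dimensional kernel can return a non-uniform density for uniform points is to look at a single link with a dead end. The sketch below is an illustration only, not the estimator of Okabe et al. (2009): it uses the standard bi-weight kernel $k(u) = \tfrac{15}{16}(1 - u^2)^2$ for $|u| \le 1$, an unnormalised sum of kernels, evenly spaced points standing in for a uniform pattern, and a hypothetical 10-unit link with bandwidth 1.

```python
def biweight(u):
    """Standard bi-weight (quartic) kernel on [-1, 1]; integrates to 1."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def density_at(x, points, bandwidth):
    """Sum of scaled kernels centred on the points, evaluated at x."""
    return sum(biweight((x - xi) / bandwidth) for xi in points) / bandwidth


# Hypothetical link of length 10 carrying 1000 evenly spaced points,
# i.e. a uniform pattern of 100 points per unit length.
points = [0.005 + 0.01 * j for j in range(1000)]

mid_link = density_at(5.0, points, bandwidth=1.0)  # interior of the link
dead_end = density_at(0.0, points, bandwidth=1.0)  # end of the link
```

In the interior the estimate recovers roughly 100 points per unit length, but at the dead end it drops to about half of that, because half of the kernel window falls outside the network.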
not work is that a plane is isotropic whereas a network is not, in the sense that directions are restricted and the network is bounded. Okabe et al. (2009) provide two kernel functions $K_i(t)$ that produce a uniform density function for uniformly and randomly distributed points.

Once a density function has been estimated, it is easy to find hot spots. Let $L(u)$ be a subnetwork of $L$ that satisfies $f_L(t) \ge u$, and let $L_\alpha$ be the subnetwork $L(u)$ that satisfies:

$$\int_{t \in L(u)} f_L(t)\,dt \Big/ \int_{t \in L} f_L(t)\,dt = \frac{\alpha}{100}. \qquad (23.6)$$

The subnetwork $L_\alpha$ is the area of hot spots; the probability of points occurring on the subnetwork $L_\alpha$ is high at the significance level $\alpha$.

Figure 23.21 shows hot spots of traffic accidents in Chiba, Japan, determined by using the above method and assuming that the given network is a uniform network, i.e., the probability of an accident occurring in a unit line segment is constant regardless of the location of the unit line segment. As noted in section 23.2, however, it is more likely that traffic accidents tend to occur in proportion to traffic volume, as shown in Figure 23.1 (a nonuniform network). To examine this tendency of accident hot spots, the uniform network transformation in section 23.2 is applied to this nonuniform network, and the network kernel method is applied to the resulting uniform network. Figure 23.22 shows hot spots of traffic accidents for the transformed network. These hot spots indicate the places where traffic accidents tend to occur more frequently than would be expected from the measured traffic volumes.
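
The thresholding idea behind equation (23.6) can be sketched on a discretised network. The sketch assumes that the equation fixes the share of the total estimated density contained in the hot-spot subnetwork, that the network has been cut into cells of equal length, and that one density value is available per cell; the function name `hot_spot_cells` and the sample values are hypothetical.

```python
def hot_spot_cells(density, cell_length, share):
    """Indices of the highest-density cells whose mass first reaches `share`
    (a fraction between 0 and 1) of the total estimated mass."""
    total = sum(f * cell_length for f in density)
    # Visit cells from the highest density downwards, i.e. lower the level u.
    order = sorted(range(len(density)), key=lambda i: density[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        if mass >= share * total:
            break
        chosen.append(i)
        mass += density[i] * cell_length
    return sorted(chosen)


# Hypothetical density estimates for five consecutive cells of unit length.
single_peak = hot_spot_cells([1.0, 1.0, 6.0, 1.0, 1.0], 1.0, share=0.5)
two_cells = hot_spot_cells([1.0, 3.0, 6.0, 1.0, 1.0], 1.0, share=0.75)
```

Lowering the density level until the enclosed mass reaches the target share plays the role of choosing $u$; with cells of finite length the target is met or slightly exceeded rather than matched exactly.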
Figure 23.20 The density function for uniformly random points (one million) on the street
network in Kyoto estimated by a one-dimensional modified bi-weight kernel function.
Figure 23.21 Hot spots of traffic accidents on the uniform road network in Chiba, Japan.
Figure 23.22 Hot spots of traffic accidents on the nonuniform road network in Chiba, Japan
that takes account of traffic volume.
K function, and the global Voronoi cross K function; and (3) a class of network kernel methods that includes a method for detecting hot spots.

In addition to the above network spatial methods, Okabe et al. (1995) formulated the network version of the (conditional) nearest neighbor distance method; Miller (1994, 1999), Okabe and Kitamura (1996), Okabe and Okunuki (2001), and Morita et al. (2001) formulated the network Huff model; Shiode and Okabe (2004a) formulated the network clumping method; Shiode and Okabe (2004b) formulated the network cell count method; and Okabe et al. (2006b) proposed the network spatial interpolation method. There are many other planar spatial methods that have not yet been extended to network spatial methods. Hopefully, the readers of this chapter will extend these methods and enrich the field of network spatial analysis.

REFERENCES

Anselin, L., Cohen, J., Cook, D., Gorr, W. and Tita, G. (2000). Spatial analyses of crime. In: David Duffee (ed.), Criminal Justice 2000: Volume 4. Measurement and Analysis of Crime and Justice, pp. 213–262. Washington, DC: National Institute of Justice.
Bashore, T., Tzilkowski, W. and Bellis, E. (1985). Analysis of deer–vehicle collision sites in Pennsylvania. Journal of Wildlife Management, 49(3): 769–774.
Bowers, K. and Hirschfield, A. (1999). Exploring links between crime and disadvantage in north-west England: an analysis using geographical information systems. International Journal of Geographical Information Science, 13: 159–184.
Clevenger, A.P., Chruszcz, B. and Gunson, K.E. (2003). Spatial patterns and factors influencing small vertebrate fauna road-kill aggregations. Biological Conservation, 109: 15–26.
Freund, J.E. (1998). Mathematical Statistics (6th edn). Englewood Cliffs: Prentice-Hall.
Furuta, T., Suzuki, A. and Inakawa, K. (2005). The kth nearest network Voronoi diagram and its application to districting problem of ambulance systems. Discussion Paper No. 0501, Center for Management Studies, Nanzan University.
Jones, A.P., Langford, I.H. and Betham, G. (1996). The application of K-function analysis to the geographical distribution of road traffic accident outcomes in Norfolk, England. Social Science and Medicine, 42(6): 879–885.
Levine, N., Kim, K.E. and Nitz, L.H. (1995). Spatial analysis of Honolulu motor vehicle crashes: I. Spatial patterns. Accident Analysis and Prevention, 27(5): 663–674.
Miller, H.J. (1994). Market area delimitation within networks using geographic information systems. Geographical Systems, 1: 157–173.
Miller, H.J. (1999). Measuring space-time accessibility benefits within transportation networks. Geographical Analysis, 31(2): 187–212.
Morita, M., Okunuki, K. and Okabe, A. (2001). A market area analysis on a network using GIS – A case study of retail stores in Nisshin city. Papers and Proceedings of the Geographic Information Systems Association, 10: 45–50.
Nicholson, A.J. (1989). Accident clustering: Some simple measures. Traffic Engineering and Control, 30: 241–246.
O'Driscoll, R.L. (1998). Descriptions of spatial pattern in seabird distributions along line transects using neighbor K statistics. Marine Ecology Progress Series, 165: 81–94.
Okabe, A., Boots, B. and Satoh, T. (2006). A class of local and global K-functions and cross K-functions. The 2006 Annual Meeting of the AAG, March 7–11, 2006, Chicago, IL.
Okabe, A., Boots, B., Sugihara, K. and Chiu, S.N. (2000). Spatial Tessellations: Concepts and Applications of Voronoi Diagrams (2nd edn). Chichester: John Wiley.
Okabe, A. and Kitamura, M. (1996). A computational method for market area analysis on a network. Geographical Analysis, 28: 330–349.
Okabe, A. and Okunuki, K. (2001). A computational method for estimating the demand of retail stores on a street network using GIS. Transactions in GIS, 5(3): 209–220.
Okabe, A., Okunuki, K. and Shiode, S. (2004). SANET: A toolbox for spatial analysis on a network, Version 2.0 040102. Center for Spatial Information Science, University of Tokyo, Tokyo.
Okabe, A., Okunuki, K. and Shiode, S. (2006a). SANET: a toolbox for spatial analysis on a network. Geographical Analysis, 38(1): 57–66.
Okabe, A., Okunuki, K. and Shiode, S. (2006b). The SANET toolbox: new methods for network spatial analysis. Transactions in GIS, 10: 535–550.
Okabe, A. and Satoh, T. (2006). Uniform network transformation for points pattern analysis on a non-uniform network. Journal of Geographical Systems, 8(1): 25–37.
Okabe, A., Satoh, T., Furuta, T., Suzuki, A. and Okano, A. (2008). Generalized network Voronoi diagrams: Concepts, computational methods, and applications. International Journal of Geographical Information Science, 1–30.
Okabe, A., Satoh, T. and Sugihara, K. (2009). A kernel density estimation method for networks, its computational method, and a GIS-based tool. International Journal of Geographical Information Science (to appear).
Okabe, A. and Yamada, I. (2001). The K-function method on a network and its computational implementation. Geographical Analysis, 33: 271–290.
Okabe, A., Yomono, H. and Kitamura, M. (1995). Statistical analysis of the distribution of points on a network. Geographical Analysis, 27(2): 152–175.
Okano, K. and Okabe, A. (2004). Algorithms for computing weighted network Voronoi diagrams. Papers and Proceedings of the Geographic Information Systems Association, 13: 311–314.
Painter, K. (1994). The impact of lighting on crime, fear, and pedestrian street use. Security Journal, 5: 116–124.
Ratcliffe, J.H. (2002). Aoristic signatures and the spatio-temporal analysis of high volume crime patterns. Journal of Quantitative Criminology, 18: 23–43.
Ratcliffe, J.H. and McCullagh, M.J. (1999). Hotbeds of crime and the search for spatial accuracy. Journal of Geographical Systems, 1: 385–398.
Ripley, B.D. (1976). The second-order analysis of stationary point processes. Journal of Applied Probability, 13: 255–266.
Ripley, B.D. (1977). Modeling spatial patterns. Journal of the Royal Statistical Society, Series B, 39: 172–192.
Saeki, M. and MacDonald, D.W. (2004). The effects of traffic on the raccoon dog (Nyctereutes procyonoides viverrinus) and other mammals in Japan. Biological Conservation, 118: 559–571.
Shiode, S. and Okabe, A. (2004a). Network variable clumping method for analyzing point patterns on a network. The 2004 Annual Meeting of the AAG, Philadelphia, PA.
Shiode, S. and Okabe, A. (2004b). Cell count method on a network with SANET and its application. International Conference on Geoinformatics and Geographical Systems Modelling, Beijing, China.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Spooner, P.G., Lunt, I.D., Okabe, A. and Shiode, S. (2004). Spatial analysis of roadside Acacia populations on a road network using the network K-function. Landscape Ecology, 19(5): 491–499.
Yamada, I. and Thill, J.-C. (2004). Comparison of planar and network K-functions in traffic accident analysis. Journal of Transport Geography, 12: 149–158.
24
Challenges in Spatial Analysis
Michael F. Goodchild1
Many readers will have their own ideas, and the chapter on the future of spatial analysis that follows includes discussions of additional issues. Meanwhile, the four considered in this chapter are very much a personal list, and reflect the author's own interests and concerns at this point in the long history of spatial analytic methods.

24.2. COMPUTING AND NETWORKING TECHNOLOGY

In the early 1990s a substantial literature accumulated on the opportunities offered by GIS. In 1988 the U.S. National Science Foundation had established the National Center for Geographic Information and Analysis (NCGIA) at three sites: the University of California, Santa Barbara; the State University of New York at Buffalo; and the University of Maine. One of NCGIA's objectives was to advance the use of GIS across the sciences, as a platform for spatial analysis, so it was considered important to assess progress to date, and to identify and remove impediments to the greater use of spatial analysis. NCGIA organized a specialist meeting on the topic that eventually led to a book (Fotheringham and Rogerson, 1994), and several additional papers appeared (Anselin and Getis, 1992; Burrough, 1990; Ding and Fotheringham, 1992; Goodchild, 1987; Goodchild et al., 1992; Openshaw, 1990; and for a later perspective see Goodchild and Longley, 1999).

Underlying this spate of funding and writing was the simple premise that GIS provided an ideal means of implementing the known techniques of spatial analysis, as well as techniques that might be developed in the future. A single package, if sufficiently sophisticated, could offer easy and largely painless access to an abundance of robust, scientifically sound techniques for analyzing and visualizing spatial data. The results of each stage of analysis could be fed into further stages, and data could be managed within a single environment that recognized a range of data formats. Comparisons were frequently drawn with the statistical packages (e.g., Goodchild, 1987), which similarly offered easy access to a multitude of statistical techniques, along with the necessary housekeeping functions.

At the time, each GIS software product was organized into a single, monolithic package. In the 1980s such packages were typically installed on minicomputers such as the VAX or Prime, but in the late 1980s the transition to personal Unix workstations and later to the PC and Mac had opened the possibility of an entirely individualized toolbox installed on the researcher's desk. GIS was likened to a butler, an intelligent assistant working with the user to solve problems, knowing the foibles and preferences of the user, and taking on those tasks that the user found too complex, tedious, time-consuming, or inaccurate if performed by hand. Abler (1987) hailed GIS as geography's equivalent of the microscope or the telescope, a powerful tool that allowed researchers to gain insights that were simply impossible with the normal senses and intuition.

From this perspective, the power of GIS would be judged simply by the proportion of known techniques of spatial analysis that it supported, by the accuracy with which it implemented each method, and by the extent to which it prevented misuse and misinterpretation of results. There were many complaints about this time regarding the success of GIS against these objectives. Commercial software developers were seen as insufficiently interested in supporting advanced spatial analysis, being content instead to direct their efforts at satisfying the needs of their more wealthy corporate and agency customers, whose interests
tended to be more in data management and inventory. GIS designers failed to ground their products in sound theory, preferring intuitive terms and explanations over formal and mathematical ones. Because of this lack of formal grounding, each vendor tended to adopt its own terms, formats, and structures, leading to endless proliferation and an apparently insurmountable lack of interoperability.

It was in this context that the Web appeared on the scene, and the Internet emerged as the dominant, and indeed quickly the only, network for computer communication. Since 1993 and the release of Mosaic the impact of communications technology has been so profound as to change the entire landscape of GIS and spatial analysis. Sui and Goodchild (2001) have argued that the metaphor of the butler is no longer appropriate; instead, GIS technology now constitutes a medium through which people communicate what they know about the Earth's surface that is comparable to traditional media such as print, radio, and television. As such, its issues are dramatically different from those of earlier decades. Bandwidth, interoperability, and metadata have largely replaced computing speed, storage capacity, and the sophistication of desktop software as major concerns of GIS users. Even the most sophisticated of users no longer program, relying instead on the incredibly abundant resources of the Web, easy mechanisms for sharing code, and new forms of software architecture. The following three sections explore some of these issues, and their implications for spatial analysis.

24.2.1. Server GIS

In the client–server computing paradigm that underlies the Web, the user or client's hardware and software are comparatively simple or "thin", and most actual computation occurs remotely on a more powerful server. In the extreme, the user needs only a Web browser such as Microsoft Explorer or Netscape. Instead of installing a "thick" piece of software, such as a GIS package, the user obtains many if not all of its services from a remote server. For example, the task of finding the optimum route from an origin to a destination through a street network, the task performed by many Web sites such as mapquest.com, no longer requires the user to obtain a powerful GIS and the necessary database representing roads and streets, and to mount both on his or her desktop machine, since the same service can be obtained free from the server. The user need only specify the origin and destination to the server using a Web browser; the results are then sent back from the server and displayed locally using the same Web browser.

In principle all GIS functions and all types of spatial analysis could be organized in this way. Instead of installing and operating their own software, researchers could send data to sites where sophisticated forms of spatial analysis were performed. Researchers developing new forms of spatial analysis would find it far easier to offer their techniques as Web services than to engage in the time-consuming distribution of software, and users would benefit by not having to spend time obtaining, installing, and maintaining their own copies.

Server GIS is now common among public agencies interested in providing public access to their spatial data, along with simple capabilities for query and visual display. Many local governments provide access to their land-ownership and property taxation databases in this way, allowing users to query details of their own and other properties, using a map interface.

In practice, however, server GIS has had a limited impact to date, particularly for
repeated many times in such applications as Monte Carlo simulation.

However, the design of an appropriate scripting language is a very sophisticated task, requiring a high level of knowledge of the needs of the research community, across many disciplines and domains. Simple scripting languages merely allow the user to invoke any of the commands of the package, but more sophisticated languages imply a recognition of the fundamental elements from which complex spatial analyses are built. If the granularity of the scripting language is too coarse, researchers will find it too difficult to express the full range of applications; and if it is too detailed, the script will be unnecessarily long.

The work of Tomlin (1990) provided the first successful effort at a generic scripting language for GIS, albeit only for congruent layers of raster data. The language was adopted by several packages, and several extensions were made. Van Deursen (1995) analyzed the operations required to support dynamic modeling in a raster environment, including the implementation of finite-difference models, in what became the scripting language for PCRaster (pcraster.geo.uu.nl), a raster-based package heavily oriented towards environmental modeling. Takeyama and Couclelis (1997) described a sophisticated language for the manipulation of pairs of raster cells, providing support for the analysis of spatial interactions. More broadly, all of these approaches are strongly related to the languages developed in image processing, or image algebras.

To date, however, there have been no comparably ambitious efforts to devise languages for vector data, or for the broader framework that spans both discrete objects and continuous fields. Dynamic GIS that addresses both space and time also lacks comprehensive scripting languages. The effectiveness of future spatial analysis clearly depends on the community's ability to devise simple yet comprehensive languages that can be used to describe and share computational methods. In the past, mathematics provided an adequate language, and models were effectively shared using algebraic representation, through the pages of learned journals and books. But today's computational environments present a somewhat different problem, since the language of mathematics lies too far from actual implementation, and cannot readily be used to express the entire algorithmic basis of spatial analysis.

24.2.3. Interchangeable software components

Early computer software consisted of programs, integrated pieces of software that performed well-defined functions. Early GIS developed in this context, and by the early 1990s a fully featured GIS such as ESRI's ARC/INFO included millions of lines of code, all designed to be compiled and executed together to provide a single, integrated computing environment.

This approach to software was both redundant, in the sense that large amounts of code might never be executed by a given user, whose interests might focus only on a small number of functions; and costly, in the sense that it was difficult for programmers to pull pieces of code out of one package to be reused in another. Even today, the average user of a package such as Microsoft Word will likely never have invoked many of the functions in this very large and complex package.

Several attempts to break out of this mold were made in the 1980s and 1990s. One of the more successful was the concept of a subroutine library, a collection of standard routines that could be called by programs, avoiding the need for repetitive reprogramming. Subroutine libraries became common in areas such as statistics, since they
allowed comparatively sophisticated users to develop new programs quickly, relying on standard subroutines for many of the program's functions. The idea was difficult to implement for less sophisticated users, however, since it required each to possess a substantial knowledge of programming.

Contemporary software design takes a rather different approach, in which sections of reusable code, or components, can be freely combined during the execution of a program. Standards have been developed by vendors such as Microsoft that allow compliant components to be freely linked and executed. Ungerer and Goodchild (2002) describe one such application, in which ESRI's ArcGIS and Microsoft's Excel have been combined to solve a standard problem in areal interpolation (Goodchild et al., 1993). Functions that are native to the GIS, such as polygon overlay, are obtained from ArcGIS, while operations on tables, such as matrix multiplication, are obtained from Excel. The entire analysis is invoked through commands written in Visual Basic, a form of scripting language, though other general scripting languages such as Python might also be used. Both packages are compliant with the Microsoft COM standard, allowing the components that form the building blocks of each to be freely combined and executed.

Approaches such as these are breaking down the barriers that previously existed between different types of software (in this case, ArcGIS and Excel) and allowing much more flexible forms of analysis. They invite an entirely new approach to software design, in which fundamental components with widespread application are combined to meet the needs of specific applications. They also call for answers to a fundamental question: what are the basic building blocks of spatial analytic software, and to what extent are the operations invoked by each form of analysis common to more than one form? Perhaps they will lead eventually to a new approach to teaching in spatial analysis, in which these fundamental building blocks are the elements of a course, rather than the analytic methods themselves.

24.3. TIME AND DYNAMICS

Many authors have commented on the generally static nature of GIS, and the difficulty of representing time and dynamic phenomena. Most attribute this to the legacy of the paper map, which inevitably emphasizes those aspects of the Earth's surface that remain relatively static, over such dynamic phenomena as events, transactions, and flows. Several comprehensive reviews have appeared, and much progress has been made in building spatial databases that include time (Langran, 1993; Peuquet, 1999, 2001, 2002).

This same emphasis on the static is evident in the toolkit of spatial analysis, with its focus on cross-sectional data. In part this is due to the difficulty of creating and acquiring longitudinal data; to the administrative difficulties that statistical agencies face in funding and maintaining data-collection programs through time; to the changing nature of the Earth's surface, and the impact that this has on data-collection procedures and the definitions of reporting zones; and to the changing nature of human society, and its notoriously short attention span. Efforts such as the National Historical GIS project (www.nhgis.org) have attempted to overcome these difficulties, building systems that allow users to construct longitudinal series from the census, for example, but they remain comparatively few and far between.

While much progress has been made, the analysis of spatio-temporal data remains a comparatively underexplored area, and a source of substantial challenges for the
community. The next two subsections address two of these in greater detail.

24.3.1. Fundamental laws

Much of the nature of GIS and many of the architectural choices that have been made over the past several decades are ultimately attributable to the nature of the data themselves: the ways in which spatial data are special. Anselin (1989) has identified two general characteristics, and Goodchild (2003) has discussed several more.

Spatial dependence describes the widely observed tendency for the variance of spatial data to increase with distance. To paraphrase Tobler (1970), nearby things are more similar than distant things, a principle that has become known as the First Law of Geography (Sui, 2004). All of the methods used to represent geographic phenomena in GIS are to some extent reliant on the validity of this principle. For example, there would be no value in representing topography with isolines if elevation did not vary smoothly, and there would be no value in aggregating areas into contiguous regions if the latter could not be designed with relatively low within-region variance.

Anselin's second principle is spatial heterogeneity, the tendency for the Earth's surface to exhibit spatial non-stationarity. All of the various techniques developed over the past two decades for local spatial analysis are based on this principle, since they attempt to summarize what is true locally, rather than what is true globally. The Geographically Weighted Regression of Fotheringham et al. (2002) falls into this category, as do the LISA technique of Anselin (1995) and the local statistics of Getis and Ord (1992).

If such principles are generally true of spatial data, and are useful in guiding the development of computational systems, then one might reasonably ask whether similar principles exist for spatio-temporal data, and whether such principles might usefully inform the development of a more dynamic approach to GIS and spatial analysis. What is the spatio-temporal equivalent of Tobler's First Law, for example? Does spatial heterogeneity apply also in time? What relationships exist between the parameters of spatio-temporal and spatial dependence and heterogeneity? Are other general principles of spatio-temporal phenomena waiting to be discovered?

24.3.2. Dynamic form

Spatial dependence and spatial heterogeneity are both properties of how the Earth's surface looks, capturing aspects of its form. Studies of form have a long history in science, but have given way in the long term to a desire to understand process: to understand how systems work, and the effects of human intervention. In geomorphology, for example, many scientists of the 19th and early 20th centuries were content to describe landforms, devising elaborate systems of morphological classification, and only later did interest develop in understanding how landforms came to be, and the processes that left such characteristic footprints on the surface. Today, of course, such studies of form are largely discredited, as they are in many other disciplines.

Because of its essentially static legacy, much GIS analysis has focused on form, and has been criticized for doing so. It is comparatively difficult to tease insights into process from cross-sectional form, though it is perhaps sometimes possible to eliminate false hypotheses about process. GIS has been accused of being the last manifestation of the quantitative revolution that occurred in geography in the 1960s, when Bunge (1966) and others attempted to draw insights from
the similarity of forms found on the human and physical landscapes (see, for example, the critique of Taylor, 1990).

Very little is known, however, about the characteristic forms that may exist in spatio-temporal phenomena. Hägerstrand (1970) and others have examined the movements of individuals in space and time using three-dimensional displays, in which the two spatial dimensions form the horizontal plane and time forms the vertical axis. Much of this work focuses on similarities that may exist in the forms of such tracks, and the implications they may have for process. We know from the work of many researchers (e.g., Janelle and Goodchild, 1983) that different social conditions lead to dramatically distinct track forms, as for example in the differences between the daily tracks of single mothers, with their orientations to both workplace and daycare, and the tracks of workers in families in which only one of two adults works.

The development of greater support for time in GIS may lead to many other recognizable patterns in spatio-temporal data, and to a rebirth of interest in the study of spatio-temporal form. A new generation of analytic techniques is needed that extracts meaningful pattern from the mass of tracks displayed in the visualizations of Kwan and Lee (2004) and others, and links such patterns to hypotheses about process.

24.4. SPATIAL LITERACY

In the past few years a remarkable series of Web sites has brought the sophisticated functions of GIS and spatial analysis much closer to the general public. While effective use of GIS requires extensive training, and in many cases advanced work at the undergraduate level, technologies such as Google Earth have given every citizen with a computer and a high-speed Internet connection access to many of the data sets and computational functions of GIS, and in some cases have even exposed the more sophisticated functions of spatial analysis. For example, anyone requesting driving directions from one of these sites receives answers that result from the execution of a complex algorithm that was previously the reserve of operations researchers and specialists in spatial optimization.

The methods of cartography and related disciplines are complex, and it is no surprise therefore that sophisticated tools in naive hands can produce mistakes. A suitable example concerns the Greenwich meridian, and its position when displayed in Google Earth. Many users of this site have noted that the zero of longitude misses the Greenwich Observatory by approximately 100 m, and have posted comments, some of which conclude that a serious mistake has been made by Google, and by extension that the georegistration of imagery on the site is poor. In reality, the WGS84 (World Geodetic System of 1984) datum, now widely adopted around the world, does not place the Greenwich Observatory at exactly zero longitude, despite the international treaty that established it there in 1884, and the position shown in Google Earth appears to be correct to within a few meters.

Although their support for spatial analysis is extremely limited, these sites have clearly provided the general public with access to a rich resource, and thousands of people have been empowered to create their own applications. The recent publication Mapping Hacks (Erle et al., 2005) describes many fascinating examples, but contains not a single reference to the cartographic literature. At the same time students who have endured many hours of lectures and lab exercises to become competent in GIS may be frustrated to realize that a child of ten
can create a computationally complex fly-by using Google Earth in a few minutes.

It seems clear that, in part as a result of these developments, the demand for basic knowledge of the principles of spatial analysis, GIS, geography, cartography, and related fields (for basic spatial literacy) is perhaps two or more orders of magnitude out of alignment with the supply. Education in these topics cannot be confined to a few advanced undergraduates, and to campuses lucky enough to have faculty interest, if it is to be accessible to the numbers of people now exposed to and enthusiastically adopting these tools. In this respect, spatial analysis faces an unprecedented challenge, to make itself known to a much larger community than previously.

There are several ways in which such a challenge might be met, by concerted effort on the part of the spatial-analysis community. One is to bring spatial literacy into the general-education or core curriculum of institutions of higher education, making its material accessible and eligible for credit for the vast majority of undergraduates. Courses in other kinds of literacy are already available in this form; the argument needs to be made that familiarity with spatial analysis and GIS represents another, and arguably a more powerful, form of literacy that should be part of the education of every citizen. Another strategy would be to develop a larger and more visible set of courses in the informal education sector, making spatial literacy part of on-line and certificate programs, and exposing its contents through libraries, museums, and other institutions. A third is to work to introduce spatial literacy earlier in the educational hierarchy, in high school and even elementary school. Valiant efforts have been made in this direction in the past, but they remain minimal in comparison with the size of the primary and secondary sectors, and there is much confusion about where such material might fit in the already stove-piped curriculum.

24.5. BEYOND TRADITIONAL PRACTICE IN SCIENCE

When Harvey wrote his well-known and highly influential Explanation in Geography (Harvey, 1969) the dominant form of scientific practice centered on the individual investigator, whose methods followed a set of well-defined principles. For example, every experiment was to be reported in sufficient detail to allow its replication by another independent investigator. Every numerical result was to be reported with a level of precision that matched its accuracy. Every search of the literature was to be complete and comprehensive, so that the investigator could demonstrate knowledge of all previous and relevant work and prove the new work's originality. The principle of Occam's Razor (a willingness to adopt the simplest of several competing explanations) was universally accepted, as was the notion that all conclusions could be subject to empirical test and possible rejection. The goal of science was complete explanation, or in statistical terms an R² of 1. When sample data were analyzed, all numerical results were to be subject to tests of statistical significance, to prove that they were not likely to be simply artifacts of the particular sample chosen, but properties of the population from which the sample was presumed to be drawn. All terms were to be rigorously defined, and vague terms were to be replaced by ones that met the standard of objectivity: rigorous and shared definition, such that two investigators would always agree on the outcome when the definition was applied.

These standards cannot, of course, all be attained in every circumstance. They may
be more attainable in some disciplines than others, and certainly it is possible to imagine a physicist having no difficulty adhering to them, and being fiercely critical of any study that appeared to relax them. But researchers in the general domain of this book clearly encounter situations in which one or more of them is distinctly problematic. This is not to say that one should therefore reject them outright, and follow the lead of those who have looked for alternatives to scientific principles; rather, they constitute goals to which research should attempt always to aspire, while admitting that it may sometimes fall short. This section explores three of these issues in some detail, and then argues for a renewed approach to scientific methodology that better reflects the real conditions under which spatial analysts currently work.

24.5.1. Collaboration, replicability, and the black box

Before the widespread adoption of computing, it was customary for instructors in statistics courses to insist that each student be able to carry out a test by hand, before using any computational aids. Only then, it was argued, would the student fully understand the process involved, and be able to replicate it later. In this simple world it was possible to assume that every researcher knew every detail of every analysis, and that the published version of the research would include sufficient detail to allow others to repeat the experiment and replicate the results.

This principle has come under fire in recent decades, for a number of reasons. Computational aids have advanced to the point where it is not possible for any one individual to comprehend fully all of the algorithms involved. The author recalls passing a threshold, some time around 1990, when it was no longer possible to believe that every aspect of a computational analysis could be replicated by hand, given enough time. Operating systems were perhaps the first such area of computing; by 1990 they had advanced to the point where it was no longer possible to believe they were the work of one person, or that any one individual fully understood every aspect of their operation. Today these failures are commonplace. The documentation of our more sophisticated software, including GIS, is often not sufficient to detail every aspect of an analysis, and it may be impossible to discover exactly how a given system computes a standard property, such as slope, from a given input (Burrough and McDonnell (1998) detail some of the options, but many more can be hidden in the details of a given implementation). In effect the developers of software, many of them operating in for-profit commercial environments, have become authorities that must be trusted, and it is difficult to submit their products to rigorous and exhaustive test.

Moreover, researchers now find it increasingly effective to work in teams, each team member providing some specific expertise. Funding agencies often express a willingness to fund research that brings together teams from many disciplines, in the interests of greater collaboration and cross-fertilization of ideas. But such arrangements inevitably lead to situations in which no one individual knows everything about an analysis, and members of the team have little alternative but to trust each other, just as researchers often have little alternative but to trust software.

24.5.2. Keeping the stakeholders happy

Tools such as GIS invite researchers to become involved in the processes of policy formulation and decision making. The very
architecture of GIS, with its database of local details and its procedures representing general principles, invites engagement with the ultimate users of research, since it allows decision makers to investigate the effects of manipulating outcomes in local contexts, and gives them many useful tools for implementing the results of analysis. A new subdiscipline, public-participation GIS, has grown up to study these issues, and to improve the use of GIS and spatial analysis in public decision making.

Many of the arguments for the use of technology in support of decision making, that is, for spatial decision support systems (Densham, 1991), center on the benefits of these tools in settings that involve the potentially conflicting views of multiple stakeholders. Much has been written about spatial-analytic techniques that support multiple views, and address multiple criteria (Voogd, 1983; Eastman, 1999; Thill, 1999; Malczewski, 1999). GIS may allow stakeholders to express their own views as sets of weights to be given to relevant factors. Saaty's Analytic Hierarchy Process (Saaty, 1980) is a widely used technique for eliciting such weights from stakeholders, and for deriving consensus weights and measures of agreement. Stakeholders benefit from the visualization capabilities of these systems, which allow them to see the effects of decisions in readily understood ways. They gain the impression that decisions are made scientifically, with abundant use of mathematics and computation, and are led to believe that these approaches represent a more objective, more desirable approach to debate and conflict resolution.

It is all too easy in such circumstances to see stakeholder satisfaction as the primary goal of the exercise. If stakeholders leave the room believing that a rigorous, scientific process has been conducted then everyone can feel that a useful exercise has come to an acceptable conclusion. None of this guarantees, however, that the results presented to the stakeholders are in fact based on good science. It is easy, with a little thought, to manipulate the outcomes of such processes to achieve hidden objectives. For example, when stakeholders are presented with five alternatives and asked to choose one, it is easy to see how the outcome might be manipulated by presenting a set that includes the desired outcome, plus four obviously unacceptable red herrings. Experience suggests that stakeholders will find no difficulty in assigning relative measures of importance to factors, irrespective of whether the factors are or are not commensurate, and whether or not any definition of importance has been advanced and agreed.

24.5.3. Accuracy, uncertainty, and cost

All measurements are subject to error, and science has developed sophisticated techniques for measuring instrument accuracy, and for determining how accuracy impacts the results of analysis. The basic principles of error analysis have been adapted to the specific needs of geographic data by Heuvelink (1998) and others, and statistical models have been developed for most of the standard geographic data types.

Uncertainty is often defined as the degree to which data leave the user uncertain about the true nature of the real world. As such it presents a greater problem, because it derives not from errors in measurement, but from vagueness in definitions, lack of detail, and numerous other sources. When definitions are vague, there can be no objective definition of truth, but only the less satisfactory concept of consensus. A scientist steeped in traditional methodology would react by rejecting vague terms entirely, replacing them with terms that have rigorous definition, and are
therefore capable of supporting replicability. Subjective terms such as 'warm', 'cold', 'near', and 'far' would be replaced by well-defined scales of temperature measurement and distance.

Nevertheless, GIS, and to a lesser extent spatial analysis, clearly exist at the interface between the rigorous, scientific world of well-defined terms and replicable experiments, and the vague, intuitive world of human discourse. Many users of GIS appear happy to work with vaguely defined classes of vegetation or land use, and there has been much interest in building user interfaces to GIS that come closer to emulating human ways of reasoning and discovering. Naive geography has been defined as a field that studies the simplifications humans often impose on the world around them, and writers have speculated about the potential for systems that also simplify, that is, that think more like humans do.

In the past decade or so there has been much interest in the application of fuzzy sets, rough sets, and related ideas in spatial analysis. There seems to be some degree of intuitive appeal in the idea of assigning degrees of membership to a class, even when the class is not itself well defined. Methods have been devised for eliciting fuzzy membership values from professionals, from remotely sensed data, and from other sources, and for displaying these values in the form of maps. All of these methods stretch the norms of science, by arguing that it is possible to observe and measure useful properties despite a lack of agreement on the definitions of those properties. As such, they demand a re-examination of the basic tenets of scientific method.

Finally, spatial analysts find themselves today in a world overflowing with data. Satellite images, digital topographic maps, and a host of other sources provide an unprecedented opportunity for new and interesting research. Massive investments have been made over the past decade in data warehouses, spatial data centers, and geo-portals, with a view to facilitating the discovery and sharing of spatial data. Metadata standards have been devised that support search, by allowing researchers to hunt through catalogs looking for data that might meet their needs.

Yet almost certainly data discovered in this way will fail to meet the exact needs of the researcher. The data set will be too generalized, not sufficiently current, too inaccurate, or inadequate in another of a myriad of possible ways. In these circumstances it is inevitable that research objectives become modified to fit the properties of the available data, if the alternative is an exercise in field data collection that may be impossibly expensive. But the prevailing methodology of science says nothing about such compromises, maintaining instead that data must be exactly fit for purpose, and providing no basis on which users can find compromises between cost on the one hand, and accuracy or fitness for use on the other.

24.5.4. Summary

The previous three sections have presented examples of the ways in which spatial analysts increasingly find the traditional principles of scientific methodology inadequate as a guide to practice. While much of science is concerned with the nomothetic goal of discovering general principles that apply everywhere in space and time, spatial analysis is increasingly concerned also with the variations that exist in such principles from place to place, and in the ways in which such principles are placed in local context to solve problems and make decisions. As Laudan (1996) has argued, there is no longer an effective methodological distinction between science and
problem-solving, since the same principles apply to both. In summary, spatial analysts face an important challenge, to develop a new methodological understanding that is consistent both with the traditional tenets of the scientific method, and with the realities of current practice.

24.6. CONCLUSIONS

The four major sections of this chapter have argued that spatial analysis faces many challenges at this time, but it also faces unprecedented opportunity. More people than ever are aware of its potential, and the tools to implement it are more sophisticated and powerful than ever.

Discussions of the importance of spatial analysis often focus on one or two particularly compelling application domains, and it may well be that by making the case for spatial analysis in support of improved public health, for example, or better response to emergencies, it will be possible at the same time to promote the entire field. On the other hand, one might argue that identifying spatial analysis too clearly with one application domain tends to render the case for other applications more difficult. Essentially, it can be very difficult to promote a set of techniques that are applicable to almost everything; the case for spatial analysis is everywhere, and yet at the same time it is nowhere.

The argument for spatial literacy made in section 24.4 seems especially relevant in this context. Many skill areas are important across a vast array of human activities, including skill in language, in mathematics, and in logic. Spatial analysis should not be a highly specialized area of technique that is only accessible to experts, but should be part of every citizen's basic set of skills, and used every day in such basic activities as wayfinding and activity planning.

How the field responds to these challenges remains to be seen, of course. Undoubtedly new and better techniques will be discovered and published in the next few years, new code will be written, and new application areas will be described. But the challenges described in this chapter seem to go beyond such business-as-usual, and to require discussion across the entire community. Such community-wide debate has occurred very rarely in the past, yet is more feasible than ever with today's communications technologies.

ACKNOWLEDGMENTS

Support of the U.S. National Science Foundation through award BCS 0417131 is gratefully acknowledged. The author also benefited from an E.T.S. Walton Fellowship which allowed him to spend much of the 2005–6 academic year at the National Centre for Geocomputation, National University of Ireland, Maynooth.

NOTE

1 National Center for Geographic Information and Analysis, and Department of Geography, University of California, Santa Barbara, CA 93106-4060, USA. Phone +1 805 893 8049, Fax +1 805 893 3146, E-mail good@geog.ucsb.edu

REFERENCES

Abler, R.F. (1987). The National Science Foundation National Center for Geographic Information and Analysis. International Journal of Geographical Information Systems, 1: 303–326.
Anselin, L. (1989). What is Special About Spatial Data? Alternative Perspectives on Spatial Data Analysis. Technical Report 89-4. Santa Barbara, CA: National Center for Geographic Information and Analysis.
Anselin, L. (1995). Local indicators of spatial association – LISA. Geographical Analysis, 27: 93–115.
Anselin, L. and Getis, A. (1992). Spatial statistical analysis and geographic information systems. Annals of Regional Science, 26: 19–33.
Bunge, W. (1966). Theoretical Geography. 2nd edn. Lund Studies in Geography Series C: General and Mathematical Geography, No. 1. Lund, Sweden: Gleerup.
Burrough, P.A. (1990). Methods of spatial analysis and GIS. International Journal of Geographical Information Systems, 4: 221–223.
Burrough, P.A. and McDonnell, R.A. (1998). Principles of Geographical Information Systems. New York: Oxford University Press.
Densham, P.J. (1991). Spatial decision support systems. In: Maguire, D.J., Goodchild, M.F. and Rhind, D.W. (eds), Geographical Information Systems: Principles and Applications. pp. 403–412. Harlow, UK: Longman Scientific and Technical.
Ding, Y. and Fotheringham, A.S. (1992). The integration of spatial analysis and GIS. Computers in Environmental and Urban Systems, 16: 3–19.
Eastman, J.R. (1999). Multi-criteria evaluation and GIS. In: Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems: Principles, Techniques, Management and Applications. pp. 225–234. New York: Wiley.
Erle, S., Gibson, R. and Walsh, J. (2005). Mapping Hacks: Tips and Tools for Electronic Cartography. Sebastopol, CA: O'Reilly Media.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Hoboken, NJ: Wiley.
Fotheringham, A.S. and Rogerson, P. (eds) (1994). Spatial Analysis and GIS. London: Taylor and Francis.
Getis, A. and Ord, J.K. (1992). The analysis of spatial association by distance statistics. Geographical Analysis, 24: 189–206.
Goodchild, M.F. (1987). A spatial analytical perspective on geographical information systems. International Journal of Geographical Information Systems, 1: 327–334.
Goodchild, M.F. (2003). The fundamental laws of GIScience. Paper presented at the Summer Assembly of the University Consortium for Geographic Information Science, Pacific Grove, CA, June. Available: http://www.csiss.org/aboutus/presentations/files/goodchild_ucgis_jun03.pdf
Goodchild, M.F., Anselin, L. and Deichmann, U. (1993). A framework for the areal interpolation of socioeconomic data. Environment and Planning A, 25: 383–397.
Goodchild, M.F., Fu, P. and Rich, P. (in press). Sharing geographic information: an assessment of the geospatial one-stop. Annals of the Association of American Geographers.
Goodchild, M.F., Haining, R.P. and Wise, S. (1992). Integrating GIS and spatial analysis: problems and possibilities. International Journal of Geographical Information Systems, 6: 407–423.
Goodchild, M.F. and Longley, P.A. (1999). The future of GIS and spatial analysis. In: Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems: Principles, Techniques, Management and Applications. pp. 235–248. New York: Wiley.
Hägerstrand, T. (1970). What about people in regional science? Papers of the Regional Science Association, 24: 7–21.
Harvey, D. (1969). Explanation in Geography. New York: St Martin's Press.
Heuvelink, G.B.M. (1998). Error Propagation in Environmental Modelling with GIS. Bristol, PA: Taylor and Francis.
Janelle, D.G. and Goodchild, M.F. (1983). Transportation indicators of space-time autonomy. Urban Geography, 4: 317–337.
Kwan, M.-P. and Lee, J. (2004). Geovisualization of human activity patterns using 3D GIS: A time-geographic approach. In: Goodchild, M.F. and Janelle, D.G. (eds), Spatially Integrated Social Science, pp. 48–66. New York: Oxford University Press.
Langran, G. (1993). Time in Geographic Information Systems. London: Taylor and Francis.
Laudan, L. (1996). Beyond Positivism and Relativism: Theory, Method, and Evidence. Boulder, CO: Westview Press.
Maguire, D.J. and Longley, P.A. (2005). The emergence of geoportals and their role in spatial data infrastructures. Computers, Environment and Urban Systems, 29(1): 3–14.
Malczewski, J. (1999). GIS and Multicriteria Decision Analysis. New York: Wiley.
National Research Council (2006). Learning to Think Spatially: GIS as a Support System in the K-12 Curriculum. Washington, DC: National Academies Press.
Openshaw, S. (1990). Spatial analysis and geographical information systems: a review of progress and possibilities. In: Scholten, H.J. and Stillwell, J.C.H. (eds), Geographical Information Systems for Urban and Regional Planning. pp. 153–163. Dordrecht: Kluwer.
Peuquet, D.J. (1999). Time in GIS and geographical databases. In: Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems: Principles, Techniques, Management and Applications. New York: Wiley.
Peuquet, D.J. (2001). Making space for time: issues in space–time representation. Geoinformatica, 5(1): 11–32.
Peuquet, D.J. (2002). Representations of Space and Time. New York: Guilford.
Saaty, T.L. (1980). The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. New York: McGraw-Hill.
Sui, D.Z. and Goodchild, M.F. (2001). Guest Editorial: GIS as media? International Journal of Geographical Information Science, 15(5): 387–389.
Sui, D.Z. (ed.) (2004). Forum: on Tobler's First Law of Geography. Annals of the Association of American Geographers, 94(2): 269–310.
Takeyama, M. and Couclelis, H. (1997). Map dynamics: integrating cellular automata and GIS through Geo-Algebra. International Journal of Geographical Information Science, 11(1): 73–91.
Taylor, P.J. (1990). GKS. Political Geography Quarterly, 9(3): 211–212.
Thill, J.-C. (1999). Spatial Multicriteria Decision Making and Analysis: A Geographic Information Sciences Approach. Brookfield, VT: Ashgate.
Tobler, W.R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46: 234–240.
Tomlin, C.D. (1990). Geographic Information Systems and Cartographic Modeling. Englewood Cliffs, NJ: Prentice Hall.
Ungerer, M.J. and Goodchild, M.F. (2002). Integrating spatial data analysis and GIS: a new implementation using the Component Object Model (COM). International Journal of Geographical Information Science, 16(1): 41–54.
van Deursen, W.P.A. (1995). Geographical Information Systems and Dynamic Models: Development and Application of a Prototype Spatial Modelling Language. Nederlandse Geografische Studies 190. Utrecht: Koninklijk Nederlands Aardrijkskundig Genootschap/Faculteit Ruimtelijke Wetenschappen Universiteit Utrecht.
Voogd, H. (1983). Multi-Criteria Evaluation for Urban and Regional Planning. London: Pion.
25
The Future
for Spatial Analysis
Reginald G. Golledge
25.1. SPATIAL ANALYSIS PAST AND PRESENT
The future of geography is inextricably bound to the future of spatial analysis. Why? Simply because spatial analysis captures the essence of a support system for the science and technology involved in geospatial thinking and reasoning. The latter are the distinct and unique contributions of geography to the universe of academe, government, and business.
For about 50 years, geographers have been slowly but surely building a structure of theories, models, methods, technologies, and vocabulary that anchor the discipline's claim to being a science. This effort has occurred both in the physical and human components of the discipline. A common theme in both efforts has been the search for valid and reliable conclusions from active and innovative research. A variety of exploratory and confirmatory, qualitative and quantitative procedures have been developed or explored for relevance, and relevant procedures and methodologies have been globally termed spatial analysis.
While some parts of the discipline are content to imitate the theories, methods, and technologies of other physical or human sciences, or to copy the research designs and practices of the various humanities, parts of geography have vigorously explored the development of unique means of thinking, reasoning, analyzing, and representing geospatial information. Spatial analysis has been perhaps the most vigorous of these throughout the years. Lately, it has been complemented by the enthusiasm for technology, particularly Geographic
Information Systems (GIS). However, most academic practitioners realized quickly that GIS needed a wider base: a base of analysis as well as its forte in representation. To provide this base, Geographic Information Science (GISc) developed. Spatial analysis proved to be a primary support system for GISc, and the two themes have converged to give geographic researchers and teachers powerful new ways to explore the massive data banks of the new digital world.
Many geographers would not agree with my opening statement. I would challenge them to disprove it or to make valid claims for other dimensions of the discipline. One could not support a contrary argument based on geography's traditional role of collecting facts about the earth's physical or human environment. While other aspects of the discipline continue to have much to offer in terms of understanding the relations between people and places, it is not always possible to differentiate the geographic/geospatial component from the more general humanities, political sciences, or social sciences thinking and reasoning that drives much of this work. Thus, it has the potential to contribute to the accumulation of general social and cultural knowledge more than to geospatial knowledge. This can be viewed as a positive result if one accepts that integrated disciplinary thinking is likely to be of future importance, but does little to support or enhance the image and practice of geography in the real world.
So, why does spatial analysis hold the key (in my opinion) to the future of geography? To reflect on this, I offer the following thoughts (see also Goodchild, 2001):
• Spatial analysis is a unique and special contribution by geographers to the ongoing trend of integrated science. Here, science is interpreted in both a qualitative and quantitative manner, and covers both physical science (natural science) and human science (the science involved in comprehending human–human and human–environment relations). It provides a menu for ensuring valid and reliable reasoning in the forum of knowledge accumulation.
• Spatially referenced data either in relative (qualitative) or absolute (quantitative) form has become the currency of today's information processing society. Spatial analysis is exclusively developed for analyzing place-based digital information. It includes the use of topologies, geometry, fuzzy logic, and multidimensional reasoning capabilities, all directed towards the spatial domain. Thus, it is useful at all scales from the nano and micro levels to the gigantic scale of universe-wide exploration, and is being diffused through areas as different as neurological experimentation, archeological reconstructions of past civilizations, and the search for extraterrestrial understanding.
• It is generally agreed that geographers have a unique way of examining problems (Beck, 1967; Uttal, 2000), and that diagrammatic (including map-based) reasoning provides insight into many problems that is unattainable using conventional reasoning such as verbal, text-based, and mathematical procedures. This uniqueness begins with the accepted significance of the spatial domain (something that has been rather neglected by other disciplines and by many parts of the human side of geography), and then expresses itself via its emphasis on visualization and spatialization processes. Data is collected with some form of spatial coding (Klatzky et al., 1990; Fujita et al., 1993), and is represented by spatializations such as flat (2D) paper maps, 3D models, and on-screen image-based representation (graphs and graphics), all of which require a particular form of interpretation. Faithful representation is one of the prerequisites for spatial analysis.
• During the second half of the 20th century, geography matured by borrowing (sometimes wholesale, sometimes modified) theories from other disciplines. As the profession gained more
confidence in its ability to offer innovative, exploratory and confirmatory investigations of spatial and geospatial concerns, there finally emerged a series of spatially explicit theories of the relations that were being uncovered by research in the spatial (and specifically geographical) domains. These theories tended to be investigated and validated using spatial analysis. They included time–space associations, spatial decision making, spatial choice, location theory, location–allocation processes, population density gradients, the form and structure of built environments, geospatial learning, movement behavior at different scales, and other areas that are explicitly spatial (see earlier chapters). And, as the profession learned to think and reason spatially (rather than socially, politically, or economically), the processes involved in spatial analysis continued to grow in importance.
expanded through academe, government, and business, many disciplines have laid claim to being the principal originator and purveyor of GIS technology. But none have been able to dispute geography's claims to the special confluence of GISc's search for relevant spatial theory, its representational capability, and the many procedures of spatial analysis that add meaning and usefulness, validity and reliability to GISc's problem solving activities. The integration of GIS and spatial analysis has been influential in moving GISc-related research beyond mere technology to scientific status. Via this link, spatial analysis has been forming the basis for new theories that incorporate human–environment relations, e.g., spatial knowledge acquisition (Golledge, 1978; Montello, 1998) and new theories of data and data manipulation (Goodchild, 2004; Couclelis, 2003).
But these later trends have provided a rationale and need for specific spatially-based means of examining, processing, and representing the data that is becoming increasingly available in digital form. The need for such procedures is not confined to geography. Other social, behavioral, political, economic, and health sciences, for example, have discovered that their data banks are being spatialized by geocoding of occurrences and attributes, and that traditional measures of statistical analysis do not account for the effects of spatial coincidence or variation. Hence, the demand for spatial analysis is growing in these disciplinary areas. I predict it will continue to grow. It is the goal of every spatial theorist to see various methods for spatial analysis of data incorporated into every standard statistical package, thus imprinting this contribution by geographers on the domain of every spatially oriented discipline. One recent example of this recognition is the inclusion of a chapter on GIS in a recent Handbook of Environmental Psychology (Bechtel and Churchman, 2002) and a decision by the American Psychological Association (APA) to support an advanced institute on GIS and spatial analysis (probably in 2007).
25.3. NEW DIRECTIONS FOR SPATIAL ANALYSIS
The interweaving of GISc and spatial analysis has given to geography a justifiable scientific base that, for most of geography's history, has been lacking. This new basis has:
• increased the public and academic image of geography as a serious scientific discipline;
• improved the standing and reputation of geography as a useful contributor to the examination and solution of problems such as comprehending global climate change or understanding human spatial abilities;
• made geographic training and expertise a valuable commodity in the job market;
• brought the realization that, as globalization of societies and their essential activities occur, geographers have a unique contribution to make in the form of geo-education, spatial concept recognition, and spatial thinking and reasoning;
• encouraged exploring the possibility of enhancing geography in the K-12 system of education.
I anticipate that each of these contributions will become more important in both the near and distant futures.
To speculate about the what and where of spatial analysis' contribution to the future of geography, consider the following:
• Recognition that spatial analysis applies and can be used at all scales from the nano scale to the universal. We already have evidence that researchers in microbiology, neurology, DNA, and stem cell research (as well as other research areas not traditionally identified with geography) are facing questions concerning representation and analysis of their spatially-based findings. Both GISc and spatial analysis potentially have an important contribution to make in these areas (e.g., via spatialization, representation, and analysis).
• One of the most important frontiers for future research is to investigate how the mind works. Great advances already have been made in discovering how the brain works. Indeed, one of the most intriguing investigations from a geographer's viewpoint is the extent to which place cells exist (O'Keefe and Nadel, 1978) and form a basis for internal data manipulations that constitutes the mind's contribution to solving spatial problems. The question arises then, if place cells do exist, what light is shed on how data is sensed and coded and stored in
the brain? What happens when we start to think spatially? Is there a particular pattern of neural excitation when we think spatially? Can spatial analysis help both to investigate this and add a newly emerging area for geospatial investigation?
• The world is digitizing. We already have more data from satellites than can conceivably be analyzed in the present or the near future. The question arises as to whether the existing form of spatial analysis can contribute to performing data mining and, as necessary, add new and valuable components to existing search engines. A question for the future may be: are there yet other levels of spatial analysis we have not yet thought about but which could be an essential part of recovering the spatial relations contained in these massive archival structures?
As disciplines such as psychology and cognitive science experiment more in the real world (in addition to ongoing research in laboratories and virtual systems), and as the importance of scale effects and the significant role of place-to-place variation in forming attitudes and behavior is realized, so too has the demand for spatial analysis started to emerge. There is much room for geographers to both teach about and disseminate spatial analysis procedures within and beyond the profession of geography. For decades, we have been borrowing from measurement theory, from math's symbolic thinking strategies, from mathematical models developed in economics, and analytic procedures from psychology and mathematical statistics; it is time to return this favor by encouraging the use of spatial analytic techniques for processing relevant geospatial data and drawing attention to the very specific contributions of space in the construction of knowledge. At the very least, psychologists and cognitive scientists should become aware of both the advantages and disadvantages of spatializing data for graphic, map, image-based, or symbolic representation. For example, a glance at the psychology literature on spatial perception and cognition reveals little comprehension of the role space plays in information gathering and information processing in the large uncontrolled spaces of the real, inhabited world, and various graphic and image-based representations of this world.
There also appears to be a growing demand for applied geography, particularly in government and business domains. We have already seen such a demand within the business community, as with the use of location–allocation models and location-based services. Spatial analysis is a key to expanding this demand. The result should be a more widespread acceptance of the contributions that geography can make to everyday life and practice throughout local and global societies.
In my opinion, therefore, spatial analysis, perhaps in conjunction with the use of GIS technology and a GISc search for reliable and valid bases for knowledge accumulation, will provide an avenue for maintaining and expanding the image and acceptance of geography as an integrated science that has a positive capacity to assist the search for new knowledge, and improve our general quality of life.
As a final statement, allow me to raise a question that is critical to the future of geography itself. Are we producing graduates who can compete for jobs in academic, government, or business marketplaces? Sadly, the answer for most of the profession is NO! But spatial analysts and GIS programs are doing this, very successfully. To return to my opening statement: the future of geography as a viable discipline is inextricably tied to the continued development and use of spatial analysis. We've already seen the first indicators of this in terms of which students are getting jobs today outside of academe.
As a discipline, we must become more aware of this need and do our best to ensure that those areas contributing most to this pattern are well supported in the near and more distant future.
ACKNOWLEDGMENTS
The research for this chapter was partly supported by NSF Grant # BCS0239883 (Spatial Thinking) and by UCTC Grant # SA4655 (Assessing Route Accessibility for Wheelchair Users).
REFERENCES
Beck, R. (1967). Spatial meaning and the properties of the environment. In: Lowenthal, D. (ed.), Environmental Perception and Behavior (Research Paper No. 109, pp. 18–29). Chicago: Department of Geography, University of Chicago.
Bechtel, R.B. and Churchman, A. (eds) (2002). Handbook of Environmental Psychology. New York: John Wiley & Sons.
Couclelis, H. (2003). The certainty of uncertainty. Transactions in GIS, 7(2): 165–175.
Fujita, N., Klatzky, R.L., Loomis, J.M. and Golledge, R.G. (1993). The encoding-error model of pathway completion without vision. Geographical Analysis, 25(4): 295–314.
Golledge, R.G. (1978). Learning about urban environments. In: Carlstein, T., Parkes, D. and Thrift, N. (eds), Timing Space and Spacing Time, Volume I: Making Sense of Time, pp. 76–98. London: Edward Arnold.
Goodchild, M.F. (2001). A geographer looks at spatial information theory. In: Montello, D.R. (ed.), Spatial Information Theory: Foundations of Geographic Information Science. Proceedings, International Conference, COSIT 2001, Morro Bay, CA, September, pp. 1–13. New York: Springer.
Goodchild, M.F. (2004). GIScience: geography, form, and process. Annals of the Association of American Geographers, 94(4): 709–714.
Klatzky, R.L., Loomis, J.M., Golledge, R.G., Cicinelli, J.G., Doherty, S. and Pellegrino, J.W. (1990). Acquisition of route and survey knowledge in the absence of vision. Journal of Motor Behavior, 22(1): 19–43.
Montello, D.R. (1998). A new framework for understanding the acquisition of spatial knowledge in large-scale environments. In: Egenhofer, M.J. and Golledge, R.G. (eds), Spatial and Temporal Reasoning in Geographic Information Systems, pp. 143–154. New York: Oxford University Press.
O'Keefe, J. and Nadel, L. (1978). The Hippocampus as a Cognitive Map. Oxford: Clarendon Press.
Uttal, D.H. (2000). Seeing the big picture: Map use and the development of spatial cognition. Developmental Science, 3: 247–264.
Index
sequential Gaussian simulation (SGS) 168 Goodchild, M.F. 6, 25, 26, 29, 30, 33, 34, 36,
spatial prediction and simulation 1658 116, 117, 256, 358, 404, 405, 406, 466, 467,
stochastic imaging 168 470, 471, 472, 482, 483
using secondary variables 1713 Google Earth 34, 4723
GeoSurveillance 352 Goovaerts, P. 93, 99, 159, 160, 167, 168, 173,
GeoTools 291 178, 191
GeoVISTA Studio 478, 50, 52 Gopal, S. 3878, 412
Geovisual Analytics 567 Goreaud, F. 96
geovisualization see also visual data exploration; Gottsegen, J. 116
visualization Gotway, C.A. 119, 173, 178, 302, 304, 305, 343
3D 502 gradient descent optimization 3823
bedrock-fractures-radon visualization as Graniero, P.A. 227, 229
a 2.5D surface. 51 graphic representation, of data 43
definition and description 435 graphical tests 75
developing tools 467 Green, J.L. 90
examples 4855 Green, M. 119
fixed row matrix of bivariate visualizations 53 Greenland, S. 360
GeoVISTA-based system displaying a Gress, B. 269
synthetic spatial dataset 51 grid computing environments software 413
mobile 57 grid-enabled computing 36
research topics 57 grids 8
software 478 Griffith, D.A. 113, 185, 255, 267
and spatial data exploration 437 Grtschel, M. 203
spatialization of a non-spatial phenomenon 55 group work 56
Visually discovering relationships between the Gstat 165
spatio-temporal attributes from the SOM Gumerman, G.J. 306
component planes visualization 54 Guneralp, B. 228
Getis, A. 21, 92, 96, 219, 343, 466, 471 Guptill, S.C. 6, 10
Ghosh, A. 422, 431 Gustafson, E.J. 100
Ghosh, S. 330 Guy, C.M. 422
Gibbons, S. 269 GWR software 31, 250
Gibbs sampler 324, 328 model editor 251
Gilley, O. 1468
Gimblett, H.R. 411 Haas, T.C. 173, 174
GISc see Geographical Information Haase, P. 96
Science (GISc) Hagen, A. 228
GIScience 26 Hagen-Zanker, A. 228
GISystems 26 Hgerstrand, T. 278, 358, 411, 472
Glennerster, H. 282, 286 Haggett, P. 356
global network auto K function 4524 Haining, R.P. 11, 13, 14, 15, 16, 17, 18, 19, 20,
street burglaries, Tokyo 454 32, 33, 91, 92, 93, 183, 184, 188, 190, 202,
global network cross K function 454, 455 203, 255
global network cross K function, comparison Hall, P. 260
between ordinary and Voronoi 457 Han, D. 358, 360
global network Voronoi cross K function 4567 Han, J. 41, 43, 71, 72
global spatial statistics 3434 see also local Hancock, R. 283, 286
spatial statistics; spatial statistics Hanna, A.S. 234
effects of extent 956 Hansen, W.A. 424
sampling issues 96 Hanson, S. 431
Godfrey, L. 263 Hare, M. 411
Goldberg, D.E. 203, 3824 Harvey, D. 473
Golledge, R.G. 483 Hastie, T. 389
Gong, P. 412 Hastings, 324
Good, P. 98 Hatch, M. 357
Lawson, A.B. 306, 316, 338, 339, 340, 344, local statistics 343, 344, 471
349, 352 local variance 177
learning data 68 locally equivalent alternatives 263
Lee, J. 108 location errors 1011
Lee, L.-F. 265, 266, 268 locations 76
Lee, P. 323 actual and predicted 83
Lee, P.M. 214 Lodwick, W.A. 227
Lee, S. 268 logistic models 365
Leenders, R.T.A.I. 257 Long, L. 357
Legendre, P. 95, 100 long-term mobility 357
Leong, T. 317 Longley, P.A. 6, 25, 27, 36, 277, 402, 404, 405,
LeSage, J. 218, 267 430, 466
Lessof, C. 282, 285 Lorenz attractor 4067
Leszczyc, P. 432 Louis, T. 323, 325
Leung, Y. 225, 384, 391, 400, 402 Lovsz, L. 203
Lewis, T. 73, 177 Lucas, J.M. 3467
Li, D. 263 Lundberg, C.G. 225
Li, S. 69
Li, X. 409 MA see moving average (MA)
Lichstein, J.W. 91 MacEachren, A.M. 43, 44, 52, 56
Liew, A.W.C. 233 Machin, S. 269
LIFEMOD 285, 286 Mackay, D.S. 230, 234, 236
light scattering 11 MacKinnon, J.G. 263
likelihood function 82 MacMillan, R.A. 227
likelihood ratio 262 macros, programming 34
likelihood ratio test statistic 263 Madow, W.G. 190
limit models Maes, P. 410
correlation matrices 1501, 151 Magnus, J. 266
correlograms 1501, 151 Makarovic, B. 197, 203
weight matrices 14850 Maki, N. 445
Lin, J.-J. 232, 234 Malczewski, J. 475
Lin, X. 268 Mamdani-type inference 235
linear errors 11 Mandelbroit, Benoit 404
linear regression 2434, 256 Manly, B. 211
linear separability 52 Mantel, N. 344
Lipsey, R.G. 427 map comparison 2278
literacy, spatial 4723, 477 mapping, modifiable areal units problem
Liu, K. 404 (MAUP) 112
Liu, Z. 233 maps, hand drawn 229
Lloyd, C.D. 92, 159, 165, 173, 177 Marble, D. 28, 36
local Getis statistic 96 Marceau, D.J. 410
local indicators of spatial autocorrelation Mardia, K. 259
(LISA) 967, 471 mark connection functions 101
local interactions 411 Mark, D. 358, 404, 405
local-ness 1778 marked spatial point process, spatial clustering
local network auto K function 4545 72, 72
local network Voronoi cross K function 455, 456 market basket datasets 78
local Ord statistic 96 Markov Chain Monte Carlo method (MCMC)
local range parameter 177 306, 321, 324, 325, 3279
local sill 177 Gibbs updates 328
local spatial statistics 967 see also global spatial Metropolis and Metropolis-Hastings
statistics; spatial statistics algorithms 327
monitoring many 3512 Metropolis and Metropolis-Hastings
monitoring single 3501 updates 3278
global network Voronoi cross K function 4567 Okabe, A. 117, 445, 446, 447, 448, 450, 452, 455,
local network auto K function 4545 457, 461
    local network Voronoi cross K function 455
    network K function methods 452–7
    network kernel method 457–61
    network Voronoi diagrams 447–52
    ordinary network Voronoi diagram 447–8
    types of methods 462
    uniform network transformation 446–7
    weighted network Voronoi diagrams 450–1
networks, uniform and nonuniform 447
neural networks see also feedforward neural networks
    and Bayesian approaches 391
    limitations 391
    origin and use of term 372
    potential developments 391
neutral models 99
Newell, J. 301, 311, 344, 358
Newey, W.K. 268
Newman, M.E.J. 50, 54
Nielsen, J. 46
Nijkamp, P. 407, 412
Nikitenko, D. 229
Nocedal, J. 383
nomothesis 476
non-linear dynamical systems 406
non-parametric statistics 211
non-spatial processes 14
non-spherical error covariance matrix 259
non-stationarity 2, 168–74
    two-dimensional 202
non-stationary mean 170–3
non-stationary mean parameter 160
nonfuzzy approach, risk of ignoring information 227
nonuniform networks 446
normal distribution 220
normality, assumed 93
Nuckols, J.R. 357
nugget effect 167, 193, 196, 331
nuisance parameter approach 268
null hypothesis 210
Nyberg, F. 357

Oberthur, T. 227, 228, 235
object view 67
observational data 6
Occam's Razor 473
Odeh, I.O.A. 225, 227, 232
Oden, N.L. 94
Odland, J. 113
Office for National Statistics 29
Okano, K. 450
O'Keefe, J. 484
O'Kelly, M.E. 421, 427–8, 431, 432
Olea, R.A. 196, 200, 203
O'Leary, E.S. 357
Oliver, M.A. 159, 162, 164, 168, 192
O'Loughlin, J. 118
Olwell, D.H. 346, 347
Open GIS (OGIS) consortium 64–5
open source software 34
Openshaw, S. 30, 32, 105, 106, 108, 112, 115, 116, 118, 209, 219, 278, 290, 311, 312, 313, 398, 401, 402, 466
optimal bandwidth selection 247
Orcutt, G.H. 278, 283
Ord, J.K. 8, 17, 21, 91, 92, 93, 96, 155, 219, 255, 261, 262, 266, 267, 307, 343, 471
ordinary kriging (OK)
    block 167
    derived map of precipitation 169
    predictions 165
    punctual 167
    weights 165, 167
ordinary least squares (OLS) 17, 261, 265
    fitting models to semi-variograms 165
ordinary network Voronoi diagram 447–8
O'Sullivan, D. 408
outcomes, manipulating 475
outliers 73–4
    dataset for detection 75
output patterns, spatial data mining 67–81
overdispersion, in generalized linear modeling 19–20
Overton, W.S. 183

p-value 210
Paas, G. 289
Pace and Gilley's continuous version of nearest neighbors 146–8
    correlation matrices 148
    correlograms 148–9, 149
Pace, K. 146–8
Pace, R.K. 267
Paelinck, J. 255
Paez, D. 228
Page, E.S. 346, 350
Pang, M.Y.C. 408
parallel coordinates plot 50
parallelism 399–400
parameters 8–12, 211, 226
parametric significance testing, implications of spatial autocorrelation 97–9
Pardo-Igúzquiza, E. 165, 173, 174, 178, 203
Parker, D.C. 409, 411
partial-join based approach, spatial data mining 80
participation index interest measure 84
Patil, G.P. 313
Patil, P. 260
pattern recognition 42
patterns 43, 89, 299
Pattie, C. 118
Peano curve 404
Pearson, D.M. 97
Pearson's product moment correlation coefficient (r) 15
Pebesma, E.J. 165
Peitgen, H.-O. 405, 406
Pélissier, R. 96
Penninga, F. 84
PENSIM 283
pensions, estimation 283–4
Penttinen, A. 101
Perle, E.D. 113, 116
permutation 98
permutation test 211–12
Pesaran, M.H. 260
Peschel, J.M. 233
Peterson, C. 413
Peterson, G.D. 92
Pettitt, A.N. 193
Peuquet, D. 36, 470
Pham, D.L. 233
Phillips, J.D. 405, 407, 408
Phipps, M. 409
Piachaud, D. 286
Piccioni, M. 203
pilot studies 95
Pinkse, J. 269
Pipkin, J.S. 225
Pitts, W. 372
pixels 7, 9, 106
place cells 484
Plaisant, C. 41, 57
planar kernel functions 457
planar spatial methods 443–5
Plante, M. 96
Plog, S. 306
Plumlee, M. 50
Pocock, S. 259
point data
    interpolation of surfaces 226–7
    irregularly located. see irregularly located point data
    regular lattice 138
point process 65
point referenced data 321
    modeling 331
point spread function 11
points on a plane, randomly and non-randomly distributed 446
Poisson cusum 346–7
Poisson distribution 10, 19, 93
Poisson-Poisson 93
Poisson processes, homogeneous 71
policy evaluation, area-based 277–8
polygons, representing spatial data 8–10
polynomial trend models 170
popularization, of spatial analysis 34
population, defining 20
population inferences 201
population microdata example 281
population size, and comparability 10
populations 220–1
Posterior distribution for = 12. 216
posterior predictive loss approach 330
Powell, S. 306
Power, C. 228, 235
Pownall, C.E. 278–9, 284
PPGIS 291
Preece, J. 46
Prendergast, G. 423
Press, W.H. 383
prevalence measures 80
Price, P.N. 10
principal coordinates of neighbor matrices (PCNM) 100
process inference 220
process models 218
process scripts 468–9
processes, stochastic 20
progressive sampling 197
properties, fundamental 78
Propper, C. 282, 286
Prucha, I.R. 260, 262, 265, 268, 269
public awareness 34
Public Participation GIS (PPGIS) 291
Pyle, I. 404

Q-statistics 355, 358–9, 361, 363–6, 370
quadrats 71
qualitative data 100
Quattrochi, D.A. 117, 119
queen contiguity 128–30
    correlation matrices 133, 135–6, 136
    correlogram 137
    neighbors in 129, 136
    subset of unstandardized weight matrix 130
Quenouille, M.H. 190
Taylor, G.H. 84, 105, 112, 115, 118, 233
Taylor, M.F. 279
Taylor, P.J. 228, 472
team working 474
temporal change, detection 344
temporal surveillance
    average run length (ARL) 345–7
    cumulative sum (CUSUM) charts 345–6
    cusums for exponential data 347
    exponentially weighted moving average (EWMA) chart 347–8
    other methods 347–8
    Poisson cusum 346–7
    Shewhart charts 345
    Shiryaev-Roberts method 348
Teng, C.H. 233
Tesfatsion, L. 411
tessellation 406
test statistic 210
testing and interval estimation 323
tests, for spatial outliers 75
tests of significance, effects of autocorrelation 15
Thill, J.-C. 227, 453, 475
Thisse, J.-F. 401
Thomas, G.S. 332
Thompson, S.K. 183, 197
Tibshirani, R. 387, 389
Tiefelsdorf, M. 262
time and dynamics 470–1
time-space stationarity, cellular automata (CA) 409
time, uni-directional flow 8
Tobler, W. 48, 49, 66, 116, 118, 119, 304, 408, 422, 471
Tobler's First Law 7, 66, 208, 304, 422, 471
Tobler's migration model 118
Tobón, C. 57
Tomaszewski, B. 56
Tomlin, C.D. 469
Tomlinson, R.F. 27
Torrens, P. 409, 410, 411
Torres, R. 226
Townshend, J.R.G. 117
Train, K.E. 232
training data 68, 72, 234
Tranmer, M. 119
transaction-based approaches, spatial data mining 80
transformation, assignment by 232–4
TRANUS GIS module software 34
travel mobility 357
trend surface model fit 19
triangular irregular networks (TIN) 230
Tukey, J.W. 42
Turnbull, B.W. 311, 358
Turnbull's test 358
Turton, I. 413
two-sample t-test 210, 212–13, 213
type-2 fuzzy sets 230
type I errors 19, 210
type II errors 210

Ulam, S. 211
uncertainty 226, 475–6
unconstrained transitions, cellular automata (CA) 409
uncorrelated heterogeneity 337
undercounting 11
Ungerer, M.J. 25, 30, 33, 36, 470
uniform network transformation 445, 446–7
uniform random sampling 185
Unwin, A. 42
Unwin, D.J. 19, 42, 50
Upton, 255
Urban, D.L. 100
usability, geovisualization tools 467
user modifiable areal unit problem 30
Uttal, D.H. 482

validation, microsimulation outputs 293
value estimation, parameters 8–12
van Deursen, W.P.A. 469
Van Groenigen, J.W. 191, 192, 193, 194, 196, 200, 203
variability, loss of detail 7
variables, estimating spatial structure 162
variance-covariance matrix 112, 119, 130
variance inflation factor 19
variogram clouds 75, 76, 85
variograms, locality of 177–8
Vatsavai, T.E.B. 72
Verhulst, P.F. 361
Verkuilen, J. 230
Verstraete, J. 230, 235
Vesanto, J. 53
Virtual Decision-Making Environments (VDME) 292
Visual Analytics 56–7
visual data exploration 43, 46 see also geovisualization; visualization
visual exploratory data analysis 41
Visual Information Seeking Mantra 43
visualization 6 see also geovisualization; visual data exploration
    description 42
    of spatial data 14–15
    spatial relationships 85