Spatial Statistics 6

Spatial Statistics
PM599, Fall 2013
Lecture 6: October 14, 2013
Areal Data
- Global indexes of spatial autocorrelation
- Local indexes of spatial autocorrelation (LISA)
- Spatial autoregressive models (CAR, SAR)

Global Indexes of Spatial Autocorrelation
- The goal of global indexes of spatial autocorrelation is to
  summarize the degree to which similar observations tend to
  occur near each other
- Global indexes are summaries over the entire study area, akin
  to testing for clustering rather than detecting individual
  clusters
- Indexes share a common structure: calculate the similarity of
  values at locations i and j, then weight the similarity by the
  proximity of locations i and j
- High similarities with high weight indicate similar values that
  are close together; low similarities with high weight indicate
  dissimilar values that are close together

Form of global spatial autocorrelation
- We want to summarize similarity between nearby areal units
- Spatial autocorrelation is the correlation of the same
  measurement taken at different areal units
- The similarity of values at areal units $B_i$ and $B_j$ is
  weighted by the proximity of i and j
- The weight $w_{ij}$ defines proximity
- In general the extent of similarity is represented by the
  weighted average of similarity between areal units. Global
  indexes of spatial autocorrelation are built on this basic form:

$$\frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, \mathrm{sim}_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}}$$
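
The basic form can be sketched in a few lines of Python. This is only an illustration: the function name `global_index`, the four-unit weight matrix `W` and the values `y` are hypothetical, and the similarity function is passed in so that different indexes can reuse the same weighted average.

```python
def global_index(values, weights, sim):
    """Weighted average of pairwise similarity: sum(w_ij * sim_ij) / sum(w_ij)."""
    n = len(values)
    num = sum(weights[i][j] * sim(values[i], values[j])
              for i in range(n) for j in range(n))
    den = sum(sum(row) for row in weights)
    return num / den


# Hypothetical toy data: four areal units in a row, neighbours share a border.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1.0, 2.0, 3.0, 4.0]


def squared_diff(a, b):
    return (a - b) ** 2


# Every neighbouring pair differs by exactly 1, so the weighted average is 1.0.
print(global_index(y, W, squared_diff))
```

Only pairs with non-zero weight contribute, which is how the weights restrict the comparison to nearby units.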

Moran's I
- Moran's I (1950) follows the basic form for global indexes of
  spatial autocorrelation, with similarity between areal units i
  and j defined as the product of the differences of $y_i$ and
  $y_j$ from the overall mean
- Similarity: $\mathrm{sim}_{ij} = (y_i - \bar{y})(y_j - \bar{y})$,
  where $\bar{y} = \sum_{i=1}^{n} y_i / n$
- Divide the basic form by the sample variance to get the
  Moran's I statistic:

$$I = \frac{1}{s^2} \frac{\sum_i \sum_j w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_i \sum_j w_{ij}}$$

- where $s^2 = \sum_i (y_i - \bar{y})^2 / n$
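
A minimal sketch of the statistic as defined above, in plain Python. The weight matrix `W` (four units in a row, binary contiguity) and the two value vectors are made-up illustrations, not data from the lecture.

```python
def morans_i(y, w):
    """Global Moran's I with s^2 computed using divisor n."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    s0 = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return cross / (s0 * s2)


# Hypothetical toy data: four areal units in a row.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
smooth = [1.0, 2.0, 3.0, 4.0]       # neighbours have similar values
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbours have different values

print(morans_i(smooth, W))       # positive (1/3)
print(morans_i(alternating, W))  # negative (-1)
```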

Moran's I
- I is therefore a random variable whose distribution is
  determined by the distributions of, and interactions between,
  the $y_i$
- When neighbouring regions have similar values (pattern is
  clustered), I will be positive
- When neighbouring regions have different values (pattern is
  regular), I will be negative
- When there is no correlation between neighbouring values,
  $E(I) = -\frac{1}{n-1}$
- As $n \to \infty$, $E(I) \to 0$
- I is asymptotically normally distributed, where

$$\frac{I + \frac{1}{n-1}}{\sqrt{\mathrm{Var}(I)}} \sim N(0, 1)$$

- Moran's I is similar to Pearson's correlation, but it is not
  bounded on [-1, 1] because of the spatial weights
- Null hypothesis: no spatial association, i.e. the $y_i$ are iid
- Compare the z-score to a standard normal distribution:

$$z = \frac{I - E(I)}{\sqrt{\mathrm{Var}(I)}}$$

  where $E(I) = -\frac{1}{n-1}$ and $\mathrm{Var}(I)$ is a little
  complicated (shown later)

Geary's c
- Geary (1954) devised the contiguity ratio, or Geary's c
- Similarity: $\mathrm{sim}_{ij} = (y_i - y_j)^2$
- If regions i and j have similar values, $\mathrm{sim}_{ij}$
  will be small

$$c = \frac{(n-1) \sum_i \sum_j w_{ij} (y_i - y_j)^2}{2 \sum_i (y_i - \bar{y})^2 \sum_i \sum_j w_{ij}}$$

- Like Moran's I it is a weighted average, but here it is scaled by
  a measure of the overall variation around the mean, $\bar{y}$
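
The formula for c translates directly into code. The sketch below uses the same kind of hypothetical toy data as a check on the interpretation: a smooth pattern should give c below 1 and an alternating pattern c above 1.

```python
def gearys_c(y, w):
    """Geary's c: squared differences between neighbours, scaled by overall variation."""
    n = len(y)
    ybar = sum(y) / n
    total_ss = sum((v - ybar) ** 2 for v in y)
    s0 = sum(sum(row) for row in w)
    num = (n - 1) * sum(w[i][j] * (y[i] - y[j]) ** 2
                        for i in range(n) for j in range(n))
    return num / (2 * total_ss * s0)


# Hypothetical toy data: four areal units in a row.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
smooth = [1.0, 2.0, 3.0, 4.0]
alternating = [1.0, 4.0, 1.0, 4.0]

print(gearys_c(smooth, W))       # below 1 (0.3): positive autocorrelation
print(gearys_c(alternating, W))  # above 1 (1.5): negative autocorrelation
```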

Geary's c
- c typically ranges from 0 to 2, with 0 indicating perfect
  positive spatial correlation and 2 indicating perfect negative
  spatial correlation
- c is not a Pearson-type correlation; it is related to the
  Durbin-Watson statistic
- Low values of Geary's c denote positive autocorrelation and
  high values indicate negative autocorrelation
- Expected value: $E(c) = 1$ under spatial independence

Local Indexes of Spatial Autocorrelation
- Global measures (Moran's I or Geary's c) are a single value
  that applies to the entire study area
- They assume the same pattern or process occurs over the entire
  geographic area
- A global statistic suggests that there is clustering but does
  not identify where particular clusters are
- A global test is often used first to determine if there is
  evidence of spatial association
- To detect local areas of similar values, we need a local
  statistic
- LISAs are decompositions of global indicators into the
  contribution of each individual observation (i.e. each
  $B_i \in D$)
- As a result the sum of LISAs is proportional to the equivalent
  global indicator
- Examples: local Moran's I, Getis-Ord G*

- With LISAs, each observation gives an indication of the extent
  of significant spatial clustering of similar values located
  around that observation
- The locations around one particular observation are defined
  as a neighbourhood, which is formalized with the spatial
  adjacency weights matrix, W
- Recall that W can be based on sharing a border (full or
  partial) or on distance
- Row standardization of W helps with interpretation of the
  statistic
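
Row standardization divides each row of W by its row sum, so that the weighted sum over neighbours becomes a neighbourhood average. A minimal sketch follows; the helper name `row_standardize` and the toy matrix are hypothetical, and leaving units without neighbours as a row of zeros is one possible choice rather than a rule from the lecture.

```python
def row_standardize(w):
    """Divide each row by its sum; rows with no neighbours are left as zeros."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


# Hypothetical binary contiguity matrix for four areal units in a row.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
W_std = row_standardize(W)
y = [1.0, 2.0, 3.0, 4.0]

# With row-standardized weights the spatial lag is the mean of the neighbours.
lag = [sum(wij * yj for wij, yj in zip(row, y)) for row in W_std]
print(W_std[1])  # [0.5, 0.0, 0.5, 0.0]
print(lag)       # [2.0, 2.0, 3.0, 3.0]
```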

LISAs can be used to detect
- Clusters (areal units with similar neighbours): local Moran's I
- Hotspots (groups of areal units with high values): Getis-Ord G*
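
Moran's I vs Local Moran's I

$$I = \frac{1}{s^2} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n}$$

$$I_i = \frac{y_i - \bar{y}}{s} \sum_{j=1}^{n} w_{ij} \frac{(y_j - \bar{y})}{s}$$

- $I_i$ is calculated for each areal unit $B_i$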
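
To make the link between the local and global statistics concrete, the sketch below computes the $I_i$ and checks that their sum, divided by the sum of all weights, gives the global I. The data are a hypothetical four-unit example.

```python
def local_morans_i(y, w):
    """Local Moran's I_i = ((y_i - ybar) / s) * sum_j w_ij * ((y_j - ybar) / s)."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    return [dev[i] * sum(w[i][j] * dev[j] for j in range(n)) / s2
            for i in range(n)]


# Hypothetical toy data: four areal units in a row.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1.0, 2.0, 3.0, 4.0]

local = local_morans_i(y, W)
s0 = sum(sum(row) for row in W)
global_i = sum(local) / s0  # the sum of LISAs is proportional to the global index

print(local)
print(global_i)
```

Here the proportionality constant is the sum of the weights, so with row-standardized weights it would be n.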


Moran's I
- Inference on I is performed under the randomization
  assumption or by Monte Carlo tests
- The null hypothesis is always that there is no spatial
  association
- Under the normality assumption, spatial independence means
  that all observations are iid Gaussian
- Under randomization, the null distribution is built from
  random permutations of the observations over the areal units

Moran's I
- Randomization means the observations are assigned at
  random to the areal units $B_i$
- Test statistic: z = (observed - expected) / (s.d. under the
  null)
- $E(I) = -\frac{1}{n-1}$ under the null hypothesis of no
  autocorrelation
- $V(I)$ is more complicated and depends on the weight
  matrix. The variance under randomization is (Cliff and Ord,
  1981):

$$V(I) = \frac{n s_1 - s_2 s_3}{(n-1)(n-2)(n-3) S_0^2} - E(I)^2$$

  with the weight summaries

$$S_0 = \sum_i \sum_j w_{ij}, \quad S_1 = \frac{1}{2} \sum_i \sum_j (w_{ij} + w_{ji})^2, \quad S_2 = \sum_i \Big( \sum_j w_{ij} + \sum_j w_{ji} \Big)^2$$

  and

$$s_1 = (n^2 - 3n + 3) S_1 - n S_2 + 3 S_0^2$$

$$s_2 = \frac{n^{-1} \sum_i (y_i - \bar{y})^4}{\big( n^{-1} \sum_i (y_i - \bar{y})^2 \big)^2}$$

$$s_3 = (n^2 - n) S_1 - 2n S_2 + 6 S_0^2$$
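
As a rough check on the moments under randomization, the sketch below evaluates the expectation and variance in their standard Cliff and Ord form and returns the z-score. The function name, the intermediate names `cap_s1` and `cap_s2` (for the two weight summaries) and the toy data are assumptions made for illustration; the formula needs at least four areal units.

```python
import math


def moran_randomization(y, w):
    """Return (I, E(I), Var(I), z) for Moran's I under randomization."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    m2 = sum(d ** 2 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n

    s0 = sum(sum(row) for row in w)
    cap_s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2
                       for i in range(n) for j in range(n))
    cap_s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2
                 for i in range(n))

    i_obs = sum(w[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n)) / (s0 * m2)
    e_i = -1.0 / (n - 1)

    s1 = (n * n - 3 * n + 3) * cap_s1 - n * cap_s2 + 3 * s0 ** 2
    s2 = m4 / m2 ** 2  # sample kurtosis
    s3 = (n * n - n) * cap_s1 - 2 * n * cap_s2 + 6 * s0 ** 2
    var_i = (n * s1 - s2 * s3) / ((n - 1) * (n - 2) * (n - 3) * s0 ** 2) - e_i ** 2

    z = (i_obs - e_i) / math.sqrt(var_i)
    return i_obs, e_i, var_i, z


# Hypothetical toy data: four areal units in a row.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1.0, 2.0, 3.0, 4.0]
print(moran_randomization(y, W))
```

For a data set this small the result can be checked by brute force: the mean and variance of I over all 24 permutations of `y` agree with `e_i` and `var_i`.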

Moran's I
- The Monte Carlo approach repeats the randomization of the
  observations over the areal units a large number of times
  (e.g. nsim = 999)
- For each randomization the Moran's I statistic is calculated
- Compare the observed Moran's I to the set of values from the
  random permutations
- If the observed I falls at or below the 5th percentile, or at
  or above the 95th percentile, of the simulated values, it is
  significant at $\alpha = 0.05$ (one-sided)
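
The Monte Carlo procedure can be sketched as follows. The helper names, the seed, the ten-unit toy data and the p-value convention (count of simulated values at least as large as the observed one, plus one, over nsim + 1) are assumptions for illustration; a given package may count ranks slightly differently.

```python
import random


def morans_i(y, w):
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return cross / (s0 * s2)


def moran_mc(y, w, nsim=999, seed=0):
    """One-sided (greater) permutation test for Moran's I."""
    rng = random.Random(seed)
    observed = morans_i(y, w)
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(nsim):
        rng.shuffle(shuffled)  # reassign the values to areal units at random
        if morans_i(shuffled, w) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (nsim + 1)
    return observed, p_value


# Hypothetical toy data: ten units in a row with a steadily increasing value.
n_units = 10
W = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)]
     for i in range(n_units)]
y = [float(v) for v in range(1, n_units + 1)]

observed, p_value = moran_mc(y, W)
print(observed, p_value)
```

Because the observed value is counted as one of the nsim + 1 arrangements, the smallest attainable p-value is 1 / (nsim + 1).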

- R output for global Moran's I under randomization:

  moran.test(sids79.rate, sids.kn1.w)
  Moran's I test under randomisation
  data: sids79.rate
  weights: sids.kn1.w
  Moran I standard deviate = 2.56, p-value = 0.0052
  alternative hypothesis: greater
  sample estimates:
  Moran I statistic    Expectation    Variance
  0.30032839           -0.01010101    0.01468478

- The null hypothesis of no spatial correlation is rejected.
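
The reported standard deviate and p-value can be reproduced from the three sample estimates in the output above. The short Python check below uses only those printed numbers; the variable names are illustrative.

```python
import math

# Values copied from the moran.test output above.
i_obs = 0.30032839
expectation = -0.01010101
variance = 0.01468478

z = (i_obs - expectation) / math.sqrt(variance)
# Upper-tail standard normal probability, matching "alternative: greater".
p_greater = 0.5 * math.erfc(z / math.sqrt(2))
# Since E(I) = -1/(n-1), the expectation also reveals the number of areal units.
n = round(1 - 1 / expectation)

print(round(z, 2), round(p_greater, 4), n)  # 2.56 0.0052 100
```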

- R output for the Monte Carlo estimate of global Moran's I:

  moran.mc(sids79.rate, sids.kn1.w, nsim=999)
  Monte-Carlo simulation of Moran's I
  data: sids79.rate
  weights: sids.kn1.w
  number of simulations + 1: 1000
  statistic = 0.3003, observed rank = 987, p-value = 0.013
  alternative hypothesis: greater

- The null hypothesis of no spatial correlation is rejected.

[Figure: histogram, "Permutation Test for Moran's I, 999 permutations"; x-axis: moranSIDS.mc$res, y-axis: Frequency]

Issues with spatial autocorrelation tests:
- They assume that the mean trend has been removed, for
  example the elevation effect when examining the spatial
  pattern in precipitation
- This assumption enters through the centering $y_i - \bar{y}$,
  which is equivalent to saying the correct model has a constant
  mean and the spatial pattern is represented in the spatial
  weights
- Removing the trend may be impossible if you don't have
  covariate data
- Spatial weights may be misspecified for testing
  autocorrelation, for instance too few neighbour weights when
  the spatial pattern is based on larger distances, or vice versa
- They require a minimum of n = 20 areal units for the
  asymptotic results to be accurate

- We can look at correlograms of the Moran's I statistic to
  determine an appropriate number of neighbours or distance
- Calculate I based on k nearest neighbours for a range of k
  (e.g. 1,...,8) or on shared borders (e.g. queen, rook)
- Calculate I based on $d_{ij}$ for a range of distances

[Figure: "Moran's I for SIDS rate Correlogram, Neighbour Lags"; x-axis: lags (1 to 8), y-axis: Moran's I]

[Figure: "Moran's I for SIDS rate Correlogram, Distance Lags"; x-axis: distance classes, y-axis: Moran I statistic]

- Global tests of spatial autocorrelation can be broken down
  into components
- We can construct local tests that identify clusters
- One preliminary step is to construct a Moran scatterplot,
  with the variable of interest (e.g. SIDS rate) on the x-axis and
  its spatially lagged values on the y-axis

[Figure: "Moran Scatterplot"; x-axis: sids79.rate, y-axis: spatially lagged sids79.rate; labelled counties: Robeson, Stanly, Richmond, Scotland, Columbus, Montgomery, Camden, Franklin, Alleghany]

- This plot can be divided into quadrants of low-low, low-high,
  high-low and high-high
- With row-standardized weights, the global Moran's I is the
  slope of the regression line in this plot, lm(wx ~ x), where
  wx is the spatial lag of the values on the x-axis (e.g. SIDS
  rates)
- Outliers can also be detected using influence measures
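
The slope interpretation can be verified numerically. The sketch below regresses the spatial lag on the variable by ordinary least squares and compares the slope with Moran's I, assuming row-standardized weights; all helper names and the six-unit toy data are hypothetical.

```python
def row_standardize(w):
    # Assumes every areal unit has at least one neighbour.
    return [[v / sum(row) for v in row] for row in w]


def spatial_lag(y, w):
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]


def ols_slope(x, z):
    """Slope of the least-squares line of z on x."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxz = sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxz / sxx


def morans_i(y, w):
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return cross / (s0 * s2)


# Hypothetical toy data: six areal units in a row.
n_units = 6
W = row_standardize([[1 if abs(i - j) == 1 else 0 for j in range(n_units)]
                     for i in range(n_units)])
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
wx = spatial_lag(x, W)

slope = ols_slope(x, wx)
print(slope, morans_i(x, W))  # the two values agree
```

The equality relies on each row of W summing to one; with unstandardized weights the slope and I differ by a scaling factor.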

[Figure: county map classified by Moran scatterplot quadrant, labelled with county names; legend: None, HL, LH, HH]

Local Moran's I

$$I_i = \frac{y_i - \bar{y}}{s} \sum_{j=1}^{n} w_{ij} \frac{(y_j - \bar{y})}{s}$$

- There is a value of $I_i$ for each $B_i$
- Sometimes the $I_i$ are mapped, with high values indicating
  stronger local autocorrelation
- More often, the z-score and its significance are plotted
- As before, test statistics are generated under randomization
- Because a local setting involves multiple comparisons
  (neighbours share observations when calculating the $I_i$),
  we need a Bonferroni adjustment
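
A Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1. The sketch below applies it to two-sided normal p-values for a set of made-up local z-scores; the z-scores, function names and the two-sided choice are assumptions for illustration.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical local Moran z-scores for five areal units.
z_scores = [0.3, 2.1, 3.5, -1.0, 2.6]
raw = [two_sided_p(z) for z in z_scores]
adjusted = bonferroni(raw)

for z, p, p_adj in zip(z_scores, raw, adjusted):
    flag = "significant" if p_adj < 0.05 else "not significant"
    print(z, round(p, 4), round(p_adj, 4), flag)
```

With many areal units the adjustment is conservative, so a unit must have a large z-score to remain significant.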

[Figure: map of local Moran's I |z| scores]

[Figure: map with statistically significant local Moran's I values shown as blue dots]
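
Getis-Ord G vs Local Getis-Ord G

$$G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} y_i y_j}{\sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j}, \quad j \neq i$$

$$G_i = \frac{\sum_{j=1}^{n} w_{ij} y_j}{\sum_{j=1}^{n} y_j}$$

For $G_i$ the sums exclude $j = i$; the starred version $G_i^*$ includes it.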
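
Both statistics are simple ratios, as the sketch below shows. It assumes the usual convention that terms with j = i are excluded from the sums, and it uses a hypothetical four-unit example with positive values.

```python
def getis_ord_g(y, w):
    """Global G: weighted cross-products over all cross-products, j != i."""
    n = len(y)
    num = sum(w[i][j] * y[i] * y[j]
              for i in range(n) for j in range(n) if i != j)
    den = sum(y[i] * y[j] for i in range(n) for j in range(n) if i != j)
    return num / den


def local_g(y, w):
    """Local G_i: weighted sum of the other units' values over their total."""
    n = len(y)
    return [sum(w[i][j] * y[j] for j in range(n) if j != i)
            / sum(y[j] for j in range(n) if j != i)
            for i in range(n)]


# Hypothetical toy data: four areal units in a row, high values at one end.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1.0, 2.0, 3.0, 4.0]

g = getis_ord_g(y, W)
n = len(y)
expected_g = sum(sum(row) for row in W) / (n * (n - 1))
print(g, expected_g)  # G above its expectation: high values sit together
print(local_g(y, W))
```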

Getis-Ord G
- Again, we compute a z-score under spatial randomization to
  determine whether the observed G differs significantly from
  its expected value
- z = (observed - expected) / s.d.

$$E(G) = \frac{\sum_i \sum_j w_{ij}}{n(n-1)}$$

- $V(G)$ is complicated
- The sign of the z-score is important: positive z means high
  values cluster together, negative z means low values
  cluster together
- The p-value must be computed to determine the significance
  of G

Local Getis-Ord G
- Similar to local Moran's $I_i$, $G_i$ is calculated for each
  areal unit
- A group of areal units with high $G_i$ indicates a hotspot,
  whereas low $G_i$ indicates a coldspot

[Figure: map, "Getis-Ord G* for SIDS79 rates", with the statistic printed on each county]

Issues with spatial autocorrelation tests:
- They assume that the mean trend has been removed, for
  example the household income effect when examining the
  spatial pattern in SIDS rate
- One solution is to fit a linear model and then test for spatial
  association in the residuals
- Another is to use autoregressive models
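
The residual approach can be sketched with a simple one-covariate regression: fit the trend, then compute Moran's I on what is left. The covariate, outcome and weights below are invented for illustration, and the sketch only computes the statistic; a proper test on regression residuals uses different null moments than the test on raw data.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


def morans_i(y, w):
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return cross / (s0 * s2)


# Hypothetical data: six units in a row; the outcome follows the covariate.
n_units = 6
W = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)]
     for i in range(n_units)]
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

i_raw = morans_i(y, W)
i_resid = morans_i(residuals, W)
print(i_raw, i_resid)  # the strong positive I of the raw data is due to the trend
```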

Simultaneous Autoregressive models
- Similar to universal kriging or regression kriging
- Use regression on values from neighbouring areal units to
  account for spatial dependence
- Autocorrelation reflects self-regression: observations of the
  outcome at other locations are used as additional covariates
  in the model
- $Y(s) \sim \mathrm{MVN}(X\beta, \Sigma)$
- In universal kriging we modeled $\Sigma$ as a parametric
  function of distance
- In areal modeling, we restrict distances to those between our
  areal units

- We represent the residual errors as
  $e(s_i) = \sum_j b_{ij} e(s_j) + \epsilon(s_i)$ and apply
  spatial correlation to these residuals

$$Y(s_i) = x(s_i)\beta + \sum_j b_{ij} e(s_j) + \epsilon(s_i)$$

$$Y(s_i) = x(s_i)\beta + \sum_j b_{ij} [Y(s_j) - x(s_j)\beta] + \epsilon(s_i)$$

- The degree of spatial dependence enters through the term
  $\sum_j b_{ij} [Y(s_j) - x(s_j)\beta]$

- SAR models are often represented in matrix form
- From the equation on the previous slide,

$$Y = X^T \beta + B(Y - X^T \beta) + \epsilon$$
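
One way to see what the matrix form implies is to solve it for Y. The steps below are a sketch under two assumptions that are not stated on the slide: that $I_n - B$ is invertible (with $I_n$ the identity matrix, not Moran's I), and that the independent errors have $\mathrm{Var}(\epsilon) = \sigma^2 I_n$.

```latex
\begin{aligned}
Y - X^T\beta &= B\,(Y - X^T\beta) + \epsilon \\
(I_n - B)(Y - X^T\beta) &= \epsilon \\
Y &= X^T\beta + (I_n - B)^{-1}\epsilon \\
\Sigma = \mathrm{Var}(Y) &= \sigma^2\,(I_n - B)^{-1}(I_n - B^T)^{-1}
\end{aligned}
```

Under these assumptions the covariance $\Sigma$ of the multivariate normal model is determined entirely by B, which is how the neighbourhood structure takes the place of a distance-based covariance function.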