Areal Data
Global Indexes of Spatial Autocorrelation
Form of global spatial autocorrelation
- We want to summarize the similarity between nearby areal units
- Spatial autocorrelation is the correlation of the same measurement taken at different areal units
- The similarity of the values at locations $B_i$ and $B_j$ is weighted by the proximity of i and j
- The weight $w_{ij}$ defines proximity
- In general, the extent of similarity is represented by the weighted average of similarity between areal units. Global indexes of spatial autocorrelation are built on this basic form:
\[
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\, sim_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}
\]
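As a rough illustration of the basic form, the sketch below computes the weighted average of pairwise similarities in plain Python. The function name `global_index`, the three-unit weight matrix, and the data values are hypothetical and chosen only for this example.

```python
def global_index(weights, similarity):
    """Weighted average of pairwise similarities: sum(w_ij * sim_ij) / sum(w_ij)."""
    n = len(weights)
    numerator = sum(weights[i][j] * similarity[i][j]
                    for i in range(n) for j in range(n))
    denominator = sum(weights[i][j] for i in range(n) for j in range(n))
    return numerator / denominator

# Hypothetical example: three areal units on a line; adjacent units are neighbours.
w = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
y = [1.0, 2.0, 4.0]

# Any pairwise similarity can be plugged in; here the squared difference is used.
sim = [[(a - b) ** 2 for b in y] for a in y]

print(global_index(w, sim))
```

Only neighbouring pairs contribute, because every other weight is zero. For these values the index is 10 / 4 = 2.5.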
Moran's I
- Moran's I (1950) follows the basic form for global indexes of spatial autocorrelation, with the similarity between areal units i and j defined as the product of the deviations of $y_i$ and $y_j$ from the overall mean
- Similarity: $sim_{ij} = (y_i - \bar{y})(y_j - \bar{y})$, where $\bar{y} = \sum_{i=1}^{n} y_i / n$
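- $I = \dfrac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \dfrac{\sum_{i}\sum_{j} w_{ij}(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i}(y_i - \bar{y})^2}$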
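A minimal sketch of Moran's I in plain Python follows. It assumes the usual normalisation, in which the weighted average of $sim_{ij}$ is divided by the variance $\sum_i (y_i - \bar{y})^2 / n$. The function name `morans_i` and the four-unit chain of neighbours are hypothetical.

```python
def morans_i(y, w):
    """Moran's I for values y and an n x n spatial weight matrix w."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    s0 = sum(w[i][j] for i in range(n) for j in range(n))          # sum of weights
    cross = sum(w[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))               # sum of w_ij * sim_ij
    ss = sum(d * d for d in dev)                                   # sum of squared deviations
    return (n / s0) * (cross / ss)

# Hypothetical weights: four units in a row, each adjacent to its immediate neighbours.
n_units = 4
w = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)] for i in range(n_units)]

clustered = [1, 2, 3, 4]   # similar values sit next to each other
regular = [1, 4, 1, 4]     # neighbouring values alternate

print(morans_i(clustered, w))
print(morans_i(regular, w))
```

The smoothly increasing values give a positive I (1/3), while the alternating values give I = -1.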
- Therefore, I is a random variable whose distribution is determined by the distributions of, and the interactions between, the $y_i$
- When neighbouring regions have similar values (the pattern is clustered), I will be positive
- When neighbouring regions have different values (the pattern is regular), I will be negative
- When there is no correlation between neighbouring values: $E(I) = -\frac{1}{n-1}$
- As $n \to \infty$, $E(I) \to 0$
- I is asymptotically normally distributed, where
\[
\frac{I + \frac{1}{n-1}}{\sqrt{Var(I)}} \sim N(0, 1)
\]
Geary's c
- Geary (1954) devised the contiguity ratio, or Geary's c
- Similarity: $sim_{ij} = (y_i - y_j)^2$
- If regions i and j have similar values, $sim_{ij}$ will be small
- $c = \dfrac{(n-1)\sum_{i}\sum_{j} w_{ij}(y_i - y_j)^2}{2\sum_{i}(y_i - \bar{y})^2 \sum_{i}\sum_{j} w_{ij}}$
- Like Moran's I it is a weighted average, but here it is scaled by a measure of the overall variation around the mean, $\bar{y}$
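The contiguity ratio can be sketched in a few lines of plain Python. The function name `gearys_c` and the four-unit chain of neighbours are hypothetical, used only to show how the value moves below or above 1.

```python
def gearys_c(y, w):
    """Geary's c: (n-1) * sum(w_ij (y_i - y_j)^2) / (2 * sum((y_i - mean)^2) * sum(w_ij))."""
    n = len(y)
    mean = sum(y) / n
    s0 = sum(w[i][j] for i in range(n) for j in range(n))
    numerator = (n - 1) * sum(w[i][j] * (y[i] - y[j]) ** 2
                              for i in range(n) for j in range(n))
    denominator = 2 * s0 * sum((v - mean) ** 2 for v in y)
    return numerator / denominator

# Hypothetical weights: four units in a row, each adjacent to its immediate neighbours.
n_units = 4
w = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)] for i in range(n_units)]

print(gearys_c([1, 2, 3, 4], w))   # neighbours are similar
print(gearys_c([1, 4, 1, 4], w))   # neighbours are dissimilar
```

With similar neighbours the squared differences are small and c falls below 1 (0.3 here). With alternating values c rises above 1 (1.5 here).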
- c ranges from 0 to 2, with 0 indicating perfect positive spatial correlation and 2 indicating perfect negative spatial correlation
- c is not a Pearson-type correlation; it is related to the Durbin-Watson statistic
- Low values of Geary's c denote positive autocorrelation and high values indicate negative autocorrelation
- Expected value: $E(c) = 1$ under spatial independence
Local Indexes of Spatial Autocorrelation
- Global measures (Moran's I or Geary's c) are a single value that applies to the entire study area
- They assume that the same pattern or process occurs over the entire geographic area
- A global statistic may suggest that there is clustering, but it does not identify where the particular clusters are
- A global test is often used first to determine whether there is evidence of spatial association
- To detect local areas of similar values, we need a local statistic
- LISAs (local indicators of spatial association) are decompositions of global indicators into the contribution of each individual observation (i.e. each $B_i \in D$)
- As a result, the sum of the LISAs is proportional to the equivalent global indicator
- Examples: local Moran's I, Getis-Ord G*
\[
s^2 = \frac{\sum_{i}^{n}(y_i - \bar{y})^2}{n}
\]
\[
I_i = \frac{y_i - \bar{y}}{s}\sum_{j}^{n} w_{ij}\,\frac{(y_j - \bar{y})}{s}
\]
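The local statistic $I_i = \frac{y_i - \bar{y}}{s}\sum_j w_{ij}\frac{y_j - \bar{y}}{s}$ can be sketched as below. The function name `local_morans_i` and the example data are hypothetical. The last lines check that the local values sum to a multiple of the global Moran's I, here the sum of the weights times I.

```python
import math

def local_morans_i(y, w):
    """Local Moran's I_i for each areal unit, using s^2 = sum((y_i - mean)^2) / n."""
    n = len(y)
    mean = sum(y) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    z = [(v - mean) / s for v in y]                      # standardised values
    return [z[i] * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

# Hypothetical weights: four units in a row, each adjacent to its immediate neighbours.
n_units = 4
w = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)] for i in range(n_units)]
y = [1, 2, 3, 4]

local = local_morans_i(y, w)
total_weight = sum(sum(row) for row in w)

print(local)
# Dividing the sum of the local values by the sum of weights recovers the global I.
print(sum(local) / total_weight)
```

For this example the four local values are 0.6, 0.4, 0.4 and 0.6, and their sum divided by the total weight (6) is the global value 1/3.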
Moran's I
- Inference on I is performed under the randomization assumption or by Monte Carlo tests
- The null hypothesis is always that there is no spatial association
- Under the normality assumption, spatial independence means that all observations are independent and identically distributed (iid) Gaussian
- Under randomization, the null distribution is computed from random permutations of the observations
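- Randomization means that the observed values are assigned at random to the areal units $B_i$
- The test statistic is z = (observed - expected) / s.d., where the expected value and standard deviation of I are taken under the null hypothesis
- $E(I) = -\frac{1}{n-1}$ under the null hypothesis of no autocorrelation
- $V(I)$ is more complicated and depends on the weight matrix. The variance under randomization used in the normal approximation is (Cliff and Ord, 1981):
\[
V(I) = \frac{n s_1 - s_2 s_3}{(n-1)(n-2)(n-3)\left(\sum_{i}\sum_{j} w_{ij}\right)^2} - E(I)^2
\]
\[
s_1 = (n^2 - 3n + 3)\Big(0.5\sum_{i}\sum_{j}(w_{ij} + w_{ji})^2\Big) - n\sum_{i}\Big(\sum_{j} w_{ij} + \sum_{j} w_{ji}\Big)^2 + 3\Big(\sum_{i}\sum_{j} w_{ij}\Big)^2
\]
\[
s_2 = \frac{n^{-1}\sum_{i}(y_i - \bar{y})^4}{\left(n^{-1}\sum_{i}(y_i - \bar{y})^2\right)^2}
\]
\[
s_3 = (n^2 - n)\Big(0.5\sum_{i}\sum_{j}(w_{ij} + w_{ji})^2\Big) - 2n\sum_{i}\Big(\sum_{j} w_{ij} + \sum_{j} w_{ji}\Big)^2 + 6\Big(\sum_{i}\sum_{j} w_{ij}\Big)^2
\]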
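The z-score under randomization can be sketched as follows. This is an illustration rather than a reference implementation: it follows the standard Cliff and Ord form, in which $s_3$ carries the factor $(n^2 - n)$ and $E(I)^2$ is subtracted from the variance. The function name `moran_z_randomization` and the example data are hypothetical, and n must be at least 4.

```python
import math

def moran_z_randomization(y, w):
    """Return (I, E(I), V(I), z) for Moran's I under the randomization assumption."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    pairs = [(i, j) for i in range(n) for j in range(n)]

    s0 = sum(w[i][j] for i, j in pairs)                              # sum of all weights
    half_sym = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i, j in pairs)
    row_col = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2
                  for i in range(n))                                 # (row sum + column sum)^2

    m2 = sum(d ** 2 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n

    moran = (n / s0) * sum(w[i][j] * dev[i] * dev[j] for i, j in pairs) / (n * m2)
    expected = -1.0 / (n - 1)

    s1 = (n * n - 3 * n + 3) * half_sym - n * row_col + 3 * s0 ** 2
    s2 = m4 / m2 ** 2                                                # sample kurtosis
    s3 = (n * n - n) * half_sym - 2 * n * row_col + 6 * s0 ** 2
    variance = (n * s1 - s2 * s3) / ((n - 1) * (n - 2) * (n - 3) * s0 ** 2) - expected ** 2

    z = (moran - expected) / math.sqrt(variance)
    return moran, expected, variance, z

# Hypothetical weights: four units in a row, each adjacent to its immediate neighbours.
n_units = 4
w = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)] for i in range(n_units)]

moran, expected, variance, z = moran_z_randomization([1, 2, 3, 4], w)
print(moran, expected, variance, z)
```

The resulting z is compared with the standard normal distribution. With only four units the normal approximation is poor, so the example is meant to show the arithmetic only.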
- The Monte Carlo approach repeats the randomization of the observations into the areal units a large number of times (e.g. $N_{sim} = 999$)
- For each randomization, the Moran's I statistic is calculated
- Compare the observed Moran's I to the set of simulated values
- If the observed I falls at or below the 5th percentile, or at or above the 95th percentile, of the simulated values, it is significant at $\alpha = 0.05$ for a one-sided test
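The Monte Carlo steps above can be sketched in plain Python. The names `morans_i` and `moran_mc`, the ten-unit chain of neighbours, and the seed are hypothetical. The pseudo p-value uses the common rank convention (r + 1) / (N_sim + 1), where r counts simulated values at least as large as the observed one; that convention is an assumption here, not something stated in the slides.

```python
import random

def morans_i(y, w):
    """Moran's I for values y and an n x n spatial weight matrix w."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    s0 = sum(w[i][j] for i in range(n) for j in range(n))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in dev))

def moran_mc(y, w, nsim=999, seed=None):
    """Permutation test for positive spatial autocorrelation (one-sided)."""
    rng = random.Random(seed)
    observed = morans_i(y, w)
    values = list(y)
    sims = []
    for _ in range(nsim):
        rng.shuffle(values)                  # reassign the values to areal units at random
        sims.append(morans_i(values, w))
    n_greater = sum(1 for s in sims if s >= observed)
    p_value = (n_greater + 1) / (nsim + 1)   # rank-based pseudo p-value
    return observed, sims, p_value

# Hypothetical data: ten units in a row with steadily increasing values.
n_units = 10
w = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)] for i in range(n_units)]
y = list(range(1, n_units + 1))

observed, sims, p = moran_mc(y, w, nsim=99, seed=1)
print(observed, p)
```

A histogram of `sims` with the observed value marked gives the usual visual check of where the observed statistic sits in the simulated distribution.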
[Figure: histogram of moranSIDS.mc$res; y-axis: Frequency]
[Figure: Moran's I plotted against lags 1 to 8]
[Figure: Moran I statistic plotted against distance classes]
Moran Scatterplot
[Figure: spatially lagged sids79.rate plotted against sids79.rate; labelled points: Robeson, Stanly, Richmond, Scotland, Columbus, Montgomery, Camden, Franklin, Alleghany]
[Figure: map of counties labelled by name; legend: None, HL, LH, HH]
Local Moran's I
\[
I_i = \frac{y_i - \bar{y}}{s}\sum_{j}^{n} w_{ij}\,\frac{(y_j - \bar{y})}{s}
\]
[Figure: statistically significant local Moran's I values shown as blue dots]
\[
G_i = \frac{\sum_{j=1}^{n} w_{ij}\, y_j}{\sum_{j=1}^{n} y_j}
\]
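A small sketch of the local statistic $G_i = \sum_j w_{ij} y_j / \sum_j y_j$ follows, written exactly as the ratio above. The function name `getis_ord_gi` and the data are hypothetical. Whether unit i counts in its own neighbourhood depends on the diagonal of the weight matrix, which is zero in this example.

```python
def getis_ord_gi(y, w):
    """Local G_i: share of the total of y that lies in the neighbourhood of unit i."""
    n = len(y)
    total = sum(y)
    return [sum(w[i][j] * y[j] for j in range(n)) / total for i in range(n)]

# Hypothetical weights: four units in a row, each adjacent to its immediate neighbours.
n_units = 4
w = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)] for i in range(n_units)]
y = [1, 2, 3, 4]

g = getis_ord_gi(y, w)
print(g)
```

Units whose neighbours hold a large share of the total receive a high $G_i$. Here the third unit, which borders the values 2 and 4, has the largest value (0.6).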
Getis-Ord G
- Again, we compute a z-score to determine whether the observed G differs significantly from its value under spatial randomization
- z = (observed - expected) / s.d., with the expected value and standard deviation of G taken under spatial randomization
- $E(G) = \dfrac{\sum_{i}\sum_{j} w_{ij}}{n(n-1)}$
- $V(G)$ is complicated
- The sign of the z-score is important: a positive z means that high values cluster together, and a negative z means that low values cluster together
- The p-value must be computed to determine the significance of G
Local Getis-Ord G
- Similar to the local Moran's $I_i$, $G_i$ is calculated for each areal unit
- A group of areal units with high $G_i$ indicates a hotspot, whereas a group with low $G_i$ indicates a coldspot
Getis-Ord G* for SIDS79 rates
[Figure: map annotated with Getis-Ord G* values for SIDS79 rates]
The degree of spatial dependence enters through the term
\[
\sum_{j} b_{ij}\,[\,Y(s_j) - x(s_j)^{T}\beta\,]
\]
\[
Y = X^{T}\beta + B\,(Y - X^{T}\beta) + \varepsilon
\]
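One way to read this model is to solve it for Y. The steps below are a sketch that assumes the symbols lost from the formula are a coefficient vector $\beta$ and an error term $\varepsilon$, and that the matrix $I - B$ is invertible (here $I$ is the identity matrix, not Moran's I).

```latex
\begin{aligned}
Y - X^{T}\beta &= B\,(Y - X^{T}\beta) + \varepsilon \\
(I - B)\,(Y - X^{T}\beta) &= \varepsilon \\
Y &= X^{T}\beta + (I - B)^{-1}\varepsilon
\end{aligned}
```

When $B = 0$ the last line reduces to the ordinary regression $Y = X^{T}\beta + \varepsilon$, so all of the spatial dependence is carried by $B$.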