Introduction to Geostatistics
Centre de Géostatistique - EMP
Founded 1968 by: Georges MATHERON
CG: Main application fields
Also:
Bioinformatics group (SVM, kernel methods)
Geostatistics worldwide
Other groups:
Stanford (petroleum), Trondheim (petroleum), Calgary (mining, petroleum), Brisbane (mining), Johannesburg (mining), Valencia (hydrogeology), ...
Main meetings:
International Geostatistics Conference: 1st in Rome (1975), ..., 7th in Banff (2004) → 2008: Santiago de Chile
geoENV (European geostatistics conference for environmental applications): 1st in Lisbon (1996), ..., 5th in Neuchatel (2004) → 2006: Greece
Software:
R (www.r-project.org) → www.ai-geostats.org
Geostatistics
definition
Geostatistics
is an application of
the Theory of Regionalized Variables
(usually considered as realizations of Random Functions)
Concepts
Basic Statistics
concepts
Center of mass
Seven weights w are hanging on a bar whose own weight is
negligible:
[Figure: bar with an axis from 5 to 8 showing the weights w, the elementary weights v and the center of mass]
Center of mass
The weights w are suspended at points:
z = 5, 5.5, 6, 6.5, 7, 7.5, 8,
w(z) = 3, 4, 6, 3, 4, 4, 2.
Center of mass
Defining normed weights:
$p(z_k) = \dfrac{w(z_k)}{\sum_k w(z_k)}$
with $\sum_k p(z_k) = 1$, we can write:
$\bar{z} = \sum_{k=1}^{7} z_k \, p(z_k)$
Center of mass
The weights $w(z_k)$ are subdivided into $n$ elementary weights $v_\alpha$:
[Figure: bar with an axis from 5 to 8 showing the weights w split into elementary weights v, and the center of mass]
Center of mass
The average squared distance to the center of mass:
$\mathrm{dist}^2(\bar{z}) = \dfrac{1}{n} \sum_{\alpha=1}^{n} (z_\alpha - \bar{z})^2 = .83$
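As a quick numerical check of the center-of-mass slides, the following sketch recomputes the normed weights, the center of mass and the average squared distance from the values listed above. The variable names are illustrative only.

```python
import numpy as np

# Suspension points z_k and weights w(z_k) as listed on the slide
z = np.array([5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0])
w = np.array([3.0, 4.0, 6.0, 3.0, 4.0, 4.0, 2.0])

p = w / w.sum()                       # normed weights p(z_k), summing to 1
z_bar = np.sum(z * p)                 # center of mass
dist2 = np.sum(p * (z - z_bar) ** 2)  # average squared distance to the center

print(round(float(z_bar), 2), round(float(dist2), 2))
```

Weighting each position by p(z_k) is equivalent to averaging over the n = 26 elementary weights, so the second number reproduces the value .83 quoted on the slide.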
Histogram
[Figure: histogram of the values z on an axis from 5 to 8, with the mean m* marked]
Histogram
The average squared deviation from the mean is the variance:
$s^2 = \dfrac{1}{n} \sum_{\alpha=1}^{n} (z_\alpha - m^*)^2$
Cumulative histogram
An alternative way to represent the frequencies of the values z is to accumulate them from left to right:
[Figure: cumulative histogram; horizontal axis from 1 to 8, vertical axis: cumulative frequency]
Probability distribution
Suppose we randomly draw values z from a set of values Z. The distribution function is:
$F(z) = P(Z < z)$
Probability density
We shall only consider differentiable distribution functions.
$F(dz) = p(z)\, dz$
Properties:
$p(z) \ge 0$
$\int p(z)\, dz = 1$
Expected value
The idealization of the concept of mean value is the mathematical expectation:
$E[Z] = \int_{z \in \mathbb{R}} z \, p(z)\, dz = m.$
The expectation is a linear operator, so that
$E[a + bZ] = a + b\, E[Z]$
Variance
The second moment of the random variable Z is:
$E[Z^2] = \int_{z \in \mathbb{R}} z^2 \, p(z)\, dz$
Covariance
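Covariance $\sigma_{ij}$ between $Z_i$ and $Z_j$:
$\mathrm{cov}(Z_i, Z_j) = E\big[ (Z_i - E[Z_i]) \cdot (Z_j - E[Z_j]) \big] = E\big[ (Z_i - m_i) \cdot (Z_j - m_j) \big] = \sigma_{ij}$
Correlation coefficient:
$\rho_{ij} = \dfrac{\sigma_{ij}}{\sqrt{\sigma_i^2 \, \sigma_j^2}}$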
Linear regression
Regression line
[Figure: scatter plot of $z_1$ against $z_2$ with the regression line $z_1^* = a z_2 + b$; the means $m_1^*$ and $m_2^*$ are marked on the axes]
Optimal regression line
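Two variables with experimental covariance:
$s_{12} = \dfrac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - m_1^*) \cdot (z_2^\alpha - m_2^*)$
Minimizing the average squared deviation of $z_1$ from the line $z_1^* = a z_2 + b$, we get
$a = \dfrac{s_{12}}{s_2^2}, \qquad b = m_1^* - a \, m_2^*$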
Optimal regression line
$z_1^* = \dfrac{s_{12}}{s_2^2} (z_2 - m_2^*) + m_1^* = m_1^* + r_{12} \dfrac{s_1}{s_2} (z_2 - m_2^*)$
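The regression coefficients can be checked numerically. The sketch below uses synthetic data (an assumption made only for illustration) and the 1/n convention of the slides for variances and covariances.

```python
import numpy as np

rng = np.random.default_rng(0)               # synthetic data, illustration only
z2 = rng.normal(10.0, 2.0, size=200)
z1 = 3.0 + 0.5 * z2 + rng.normal(0.0, 1.0, size=200)

m1, m2 = z1.mean(), z2.mean()                # experimental means m1*, m2*
s12 = np.mean((z1 - m1) * (z2 - m2))         # experimental covariance (1/n)
s1, s2 = z1.std(), z2.std()                  # np.std uses 1/n by default
r12 = s12 / (s1 * s2)                        # experimental correlation

a = s12 / s2**2                              # slope
b = m1 - a * m2                              # intercept

def predict(z2_new):
    """Second form of the regression line: m1* + r12 (s1/s2) (z2 - m2*)."""
    return m1 + r12 * (s1 / s2) * (z2_new - m2)
```

Both forms of the line give the same prediction, and the line passes through the point of the means.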
Multiple linear regression
Multivariate data set
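The data matrix $\mathbf{Z}$ with $n$ samples (rows) of $N$ variables (columns):
$\mathbf{Z} = \begin{pmatrix} z_{11} & \dots & z_{1i} & \dots & z_{1N} \\ \vdots & & \vdots & & \vdots \\ z_{\alpha 1} & \dots & z_{\alpha i} & \dots & z_{\alpha N} \\ \vdots & & \vdots & & \vdots \\ z_{n1} & \dots & z_{ni} & \dots & z_{nN} \end{pmatrix}$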
Matrix of means
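Define a matrix $\mathbf{M}$ with the same dimension $n \times N$ as $\mathbf{Z}$, replicating $n$ times in its columns the mean value of each variable:
$\mathbf{M} = \begin{pmatrix} m_1^* & \dots & m_i^* & \dots & m_N^* \\ \vdots & & \vdots & & \vdots \\ m_1^* & \dots & m_i^* & \dots & m_N^* \\ \vdots & & \vdots & & \vdots \\ m_1^* & \dots & m_i^* & \dots & m_N^* \end{pmatrix}$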
Centered variables
A matrix Zc of centered variables is obtained by subtracting M
from the raw data matrix:
Zc = Z − M
Variance-covariance matrix
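The matrix $\mathbf{V}$ of experimental variances and covariances is:
$\mathbf{V} = \dfrac{1}{n} \mathbf{Z}_c^\top \mathbf{Z}_c = \begin{pmatrix} \mathrm{var}(z_1) & \dots & \mathrm{cov}(z_1, z_j) & \dots & \mathrm{cov}(z_1, z_N) \\ \vdots & \ddots & & & \vdots \\ \mathrm{cov}(z_i, z_1) & \dots & \mathrm{var}(z_i) & \dots & \mathrm{cov}(z_i, z_N) \\ \vdots & & & \ddots & \vdots \\ \mathrm{cov}(z_N, z_1) & \dots & \mathrm{cov}(z_N, z_j) & \dots & \mathrm{var}(z_N) \end{pmatrix} = \begin{pmatrix} s_{11} & \dots & s_{1j} & \dots & s_{1N} \\ \vdots & \ddots & & & \vdots \\ s_{i1} & \dots & s_{ii} & \dots & s_{iN} \\ \vdots & & & \ddots & \vdots \\ s_{N1} & \dots & s_{Nj} & \dots & s_{NN} \end{pmatrix}$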
Multiple linear regression
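For a regression of $z_0$ on the $N$ variables from $n$ samples we have the matrix equation
$\mathbf{z}_0^* = \mathbf{m}_0 + (\mathbf{Z} - \mathbf{M}) \, \mathbf{a}$
The average squared distance between the data and the regression values is
$\mathrm{dist}^2(\mathbf{a}) = \dfrac{1}{n} (\mathbf{z}_0 - \mathbf{z}_0^*)^\top (\mathbf{z}_0 - \mathbf{z}_0^*) = \mathrm{var}(z_0) + \mathbf{a}^\top \mathbf{V} \mathbf{a} - 2 \, \mathbf{a}^\top \mathbf{v}_0,$
where $\mathbf{v}_0$ is the vector of covariances between $z_0$ and the $N$ variables.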
Minimizing the squared distance
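The minimum is found for:
$\dfrac{\partial \, \mathrm{dist}^2(\mathbf{a})}{\partial \mathbf{a}} = 0 \iff 2 \, \mathbf{V} \mathbf{a} - 2 \, \mathbf{v}_0 = 0 \iff \mathbf{V} \mathbf{a} = \mathbf{v}_0$
This is the system of linear equations:
$\begin{pmatrix} \mathrm{var}(z_1) & \dots & \mathrm{cov}(z_1, z_N) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(z_N, z_1) & \dots & \mathrm{var}(z_N) \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix} = \begin{pmatrix} \mathrm{cov}(z_0, z_1) \\ \vdots \\ \mathrm{cov}(z_0, z_N) \end{pmatrix}$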
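The system Va = v0 can be solved directly with a linear solver. The following sketch builds the matrices of the preceding slides from synthetic data (an assumption for illustration) and solves for the coefficients a.

```python
import numpy as np

rng = np.random.default_rng(1)                 # synthetic data, illustration only
n, N = 500, 3
Z = rng.normal(size=(n, N))                    # data matrix, n samples x N variables
z0 = 2.0 + Z @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=n)

M = np.tile(Z.mean(axis=0), (n, 1))            # matrix of means, n x N
Zc = Z - M                                     # centered variables
V = Zc.T @ Zc / n                              # variance-covariance matrix
v0 = Zc.T @ (z0 - z0.mean()) / n               # covariances between z0 and the z_i

a = np.linalg.solve(V, v0)                     # solves V a = v0
z0_star = z0.mean() + Zc @ a                   # regression values z0*
```

The coefficients agree with an ordinary least-squares fit that includes an intercept, which is what the minimization of the squared distance amounts to.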
Simple kriging
Spatial data
Data points xα and the estimation point x0 in a spatial domain D
[Figure: data points $x_\alpha$ (dots) and the estimation point $x_0$ (circle) in the domain D]
Translation invariance
The expectation and the covariance are both assumed translation invariant over the domain, i.e. for any vector h between points x and x+h:
$E[Z(x+h)] = E[Z(x)] = m$
$\mathrm{cov}[Z(x+h), Z(x)] = C(h)$
The expectation $E[Z(x)]$ has the same value m at any point x of the domain D.
The covariance between any pair of locations depends only on the vector h.
Known mean
We assume the mean m is known and build the estimator:
$Z^*(x_0) = m + \sum_{\alpha=1}^{n} w_\alpha \big( Z(x_\alpha) - m \big)$
i.e.
$Z^*(x_0) - m = \sum_{\alpha=1}^{n} w_\alpha \big( Z(x_\alpha) - m \big)$
Simple kriging equations
The kriging equations with known mean are simple:
$\sum_{\beta=1}^{n} w_\beta^{SK} \, C(x_\alpha - x_\beta) = C(x_\alpha - x_0) \quad \text{for } \alpha = 1, \dots, n$
i.e. the linear combination of weights with the covariances between a data point and the other data points = the covariance between that data point and the point to estimate.
Simple kriging: a multiple linear regression
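Simple kriging is a multiple linear regression between spatial random variables.
Like $\mathbf{V} \mathbf{a} = \mathbf{v}_0$, we have $\mathbf{C} \mathbf{w} = \mathbf{c}_0$, with
$\mathbf{c}_0 = \begin{pmatrix} \mathrm{cov}(Z(x_0), Z(x_1)) \\ \vdots \\ \mathrm{cov}(Z(x_0), Z(x_n)) \end{pmatrix}$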
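A minimal sketch of simple kriging as the linear system Cw = c0 follows. The exponential covariance, its parameters and the function names are assumptions chosen for illustration; the variance expression C(0) minus the weighted covariances is the standard simple kriging variance and is not shown on the slides.

```python
import numpy as np

def exp_cov(h, sill=1.0, a=1.0):
    """Exponential covariance model (hypothetical choice for this sketch)."""
    return sill * np.exp(-np.abs(h) / a)

def simple_kriging(coords, values, x0, m, cov=exp_cov):
    """Simple kriging of Z(x0) with known mean m; coords has shape (n, 2)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = cov(d)                                   # covariances between data points
    c0 = cov(np.linalg.norm(coords - x0, axis=1))  # covariances with the target
    w = np.linalg.solve(C, c0)                   # solves C w = c0
    estimate = m + w @ (values - m)
    variance = cov(0.0) - w @ c0                 # standard simple kriging variance
    return estimate, variance, w

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
values = np.array([1.0, 3.0, 2.0])
est, var, w = simple_kriging(coords, values, np.array([0.5, 0.5]), m=2.0)
```

At a data location the estimate reproduces the datum with zero variance, and far from all data the estimate falls back to the known mean m.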
Regionalized variables and random function
The concept of a Random Function
Consider a domain D with points x:
[Figure: the domain D with a point x]
Regionalized Variable
The regionalized variable z(x) is the spatial variable of interest
(“reality”).
Epistemological Problem
→ G. Matheron, “Estimating and Choosing”
Variogram
definition
The Variogram
The vector $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$: coordinates of a point in 2D.
Let h be the vector separating two points:
[Figure: two points $x_\alpha$ and $x_\beta$ separated by the vector h]
The Variogram Cloud
Variogram values are plotted against distance in space:
[Figure: variogram cloud; the dissimilarities $|z(t+h) - z(t)|^2 / 2$ are plotted against the distance $|h|$]
The Experimental Variogram
Averages within distance (and angle) classes k are computed:
[Figure: the variogram cloud with class averages $\gamma^*(h_k)$ plotted against $|h|$ for the distance classes $h_1, \dots, h_9$]
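The two steps described above (variogram cloud, then class averages) can be sketched as follows. This is a minimal version under stated assumptions: only distance classes are used (angle classes are ignored), and the function name and the `lag_edges` parameter are hypothetical.

```python
import numpy as np

def experimental_variogram(coords, values, lag_edges):
    """Average the variogram cloud within distance classes [edge_k, edge_k+1)."""
    i, j = np.triu_indices(len(values), k=1)          # all pairs of data points
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    dissim = 0.5 * (values[i] - values[j]) ** 2       # variogram cloud values
    k = np.digitize(dist, lag_edges) - 1              # class index of each pair
    n_classes = len(lag_edges) - 1
    gamma = np.full(n_classes, np.nan)                # NaN marks empty classes
    counts = np.zeros(n_classes, dtype=int)
    for c in range(n_classes):
        mask = k == c
        counts[c] = mask.sum()
        if counts[c] > 0:
            gamma[c] = dissim[mask].mean()
    return gamma, counts

# Tiny example: three points on a line
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
values = np.array([0.0, 1.0, 3.0])
gamma, counts = experimental_variogram(coords, values, np.array([0.5, 1.5, 2.5]))
```

Pairs whose distance falls outside the given class limits are simply left out; reporting the number of pairs per class helps judge how reliable each average is.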
The Theoretical Variogram
A theoretical model is fitted:
[Figure: theoretical variogram model $\gamma(h)$ plotted against $|h|$]
Intrinsic Hypothesis
The first two moments of the increments are assumed stationary (translation-invariant):
$E[Z(x+h) - Z(x)] = 0$
$\mathrm{var}[Z(x+h) - Z(x)] = 2 \, \gamma(h)$
Definition of the Variogram
By the intrinsic hypothesis:
$\gamma(h) = \dfrac{1}{2} E\big[ \big( Z(x+h) - Z(x) \big)^2 \big]$
Properties
- zero at the origin: $\gamma(0) = 0$
- positive values: $\gamma(h) \ge 0$
- even function: $\gamma(h) = \gamma(-h)$
←→ not differentiable
←→ discontinuous
Variogram and Covariance Function
The covariance function is defined as:
$C(h) = E\big[ \big( Z(x) - m \big) \cdot \big( Z(x+h) - m \big) \big]$
Variogram
examples
Power variogram
[Figure: power model variograms for p = 0.5, p = 1 and p = 1.5; horizontal axis: DISTANCE (-4 to 4), vertical axis: VARIOGRAM (0 to 5)]
Spherical covariance function
$C(h) = 1 - \left( \dfrac{3}{2} \dfrac{|h|}{a} - \dfrac{1}{2} \dfrac{|h|^3}{a^3} \right)$ for $|h| \le a$, and $C(h) = 0$ for $|h| > a$
Exponential covariance function
$C(h) = \exp\left( - \dfrac{|h|}{a} \right)$
Gaussian covariance function
$C(h) = \exp\left( - \dfrac{|h|^2}{a^2} \right)$
Cardinal sine covariance function
$C(h) = \dfrac{\sin\left( |h| / a \right)}{|h| / a}$
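The four covariance models can be written as small functions, as in the sketch below. All are normalized to a unit sill, and the spherical model is written in its usual covariance form, equal to 1 at the origin and 0 beyond the range a. The function names are illustrative.

```python
import numpy as np

def spherical_cov(h, a):
    """Spherical covariance with range a (zero for |h| > a)."""
    r = np.abs(h) / a
    return np.where(r <= 1.0, 1.0 - 1.5 * r + 0.5 * r**3, 0.0)

def exponential_cov(h, a):
    """Exponential covariance with scale parameter a."""
    return np.exp(-np.abs(h) / a)

def gaussian_cov(h, a):
    """Gaussian covariance with scale parameter a."""
    return np.exp(-np.abs(h) ** 2 / a**2)

def cardinal_sine_cov(h, a):
    """Cardinal sine covariance sin(|h|/a) / (|h|/a).

    np.sinc(x) is sin(pi x) / (pi x), so the argument is divided by pi;
    np.sinc also handles h = 0 without a division by zero.
    """
    return np.sinc(np.abs(h) / (a * np.pi))

h = np.linspace(0.0, 5.0, 6)
table = np.column_stack([f(h, 2.0) for f in
                         (spherical_cov, exponential_cov, gaussian_cov, cardinal_sine_cov)])
```

The cardinal sine model takes negative values at some distances (a hole effect), which distinguishes it from the other three.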
Geometric anisotropy
of the variogram
Geometric anisotropy
In practice the range of the variogram may change depending
on the direction:
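[Figure: coordinate axes $h_1$, $h_2$ and rotated axes $h'_1$, $h'_2$]
Correction:
- rotation $h' = \mathbf{Q} h$ of angle $\theta$, where $\mathbf{Q} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$
- linear transformation of the coordinates $h' = (h'_1, h'_2)$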
Rotation in 3D
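In 3D the rotation is obtained by a composition of elementary rotations:
$\mathbf{Q} = \begin{pmatrix} \cos\theta_3 & \sin\theta_3 & 0 \\ -\sin\theta_3 & \cos\theta_3 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_2 & \sin\theta_2 \\ 0 & -\sin\theta_2 & \cos\theta_2 \end{pmatrix} \begin{pmatrix} \cos\theta_1 & \sin\theta_1 & 0 \\ -\sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$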
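The composition of elementary rotations can be sketched in a few lines. The helper names `rot_z`, `rot_x` and `rotation_3d` are hypothetical; the matrices follow the sign convention of the slide.

```python
import numpy as np

def rot_z(theta):
    """Elementary rotation about the third axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(theta):
    """Elementary rotation about the first axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def rotation_3d(theta1, theta2, theta3):
    """Q as the product of the three elementary rotations."""
    return rot_z(theta3) @ rot_x(theta2) @ rot_z(theta1)

Q = rotation_3d(0.3, 0.5, -0.2)
h_rotated = Q @ np.array([1.0, 2.0, 3.0])   # h' = Q h
```

Since Q is orthogonal, the rotation preserves the length of h; only a subsequent rescaling of the coordinates changes distances.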
2D example: Ebro river vertical section
[Figure: vertical section of the Ebro river; horizontal axis: Ebro river (km), vertical axis: Depth (m), 0 to -6]
2D conductivity variogram model
[Figure: CONDUCTIVITY variogram in two directions D1 and D2 with fitted models M1 and M2; horizontal axis: Distance (D1: km; D2: m), 0 to 6; vertical axis: 0 to 1250]
Behavior at the origin
of the variogram
Ebro river: water samples
[Figure: two vertical sections showing the water samples; vertical axis: Depth (m), 0 to -6]
Nitrate variogram: which behavior at origin?
[Figure: NITRATE variogram in directions D1 and D2 fitted with models M1 and M2; left panel: CUBIC model, right panel: EXPONENTIAL model; horizontal axis: Lag (RED: m; BLACK: km), 0 to 3; vertical axis: 0 to 3000]
Cubic variogram: conditional simulations
[Figure: three conditional simulations on the vertical section; vertical axis: Depth (m), 0 to -4; horizontal axis: -15.0 to -2.5; color scale from <0 to >=124.8]
Exponential model: conditional simulations
[Figure: three conditional simulations on the vertical section; vertical axis: Depth (m), 0 to -4; horizontal axis: -15.0 to -2.5; color scale from <0 to >=124.8]
Kriging of the mean
of a random function
Spatially Correlated Data
Sample locations xα in a geographical domain:
[Figure: sample locations (dots) scattered in the domain]
Estimation of the Mean Value
Using the formula of the arithmetic mean:
$M^* = \dfrac{1}{n} \sum_{\alpha=1}^{n} Z(x_\alpha)$
all samples get the same weight: $\dfrac{1}{n}$
Stationary random function
We assume translation-invariance of mean and covariance:
$\forall x \in D:\ E[Z(x)] = m; \qquad \forall x_\alpha, x_\beta \in D:\ C(x_\alpha, x_\beta) = C(x_\alpha - x_\beta).$
The estimation error in our statistical model:
$\underbrace{M^*}_{\text{estimated value}} - \underbrace{m}_{\text{true value}}$
No bias
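No bias is obtained using weights of unit sum:
$\sum_{\alpha=1}^{n} w_\alpha = 1$
Consider:
$E[M^* - m] = E\Big[ \sum_{\alpha=1}^{n} w_\alpha Z(x_\alpha) - m \Big]$
$= \sum_{\alpha=1}^{n} w_\alpha \underbrace{E[Z(x_\alpha)]}_{m} - m$
$= m \underbrace{\sum_{\alpha=1}^{n} w_\alpha}_{1} - m = 0$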
Variance of the estimation error
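The variance $\sigma_E^2$ of the estimation error is:
$\mathrm{var}(M^* - m) = E\big[ (M^* - m)^2 \big] - \underbrace{\big( E[M^* - m] \big)^2}_{0}$
$= E\big[ M^{*2} - 2 \, m M^* + m^2 \big]$
$= \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} w_\alpha w_\beta \, E[Z(x_\alpha) Z(x_\beta)] - 2 \, m \sum_{\alpha=1}^{n} w_\alpha \underbrace{E[Z(x_\alpha)]}_{m} + m^2$
$\Rightarrow\ \sigma_E^2 = \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} w_\alpha w_\beta \, C(x_\alpha - x_\beta)$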
Minimal estimation variance
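We want weights $w_\alpha$ that produce a minimal estimation variance:
minimum of $\mathrm{var}(M^* - m)$ subject to $\sum_{\alpha=1}^{n} w_\alpha = 1$
The objective function $\varphi$ has n+1 parameters:
$\varphi(w_1, \dots, w_n, \mu) = \mathrm{var}(M^* - m) - 2 \, \mu \Big( \sum_{\alpha=1}^{n} w_\alpha - 1 \Big)$
$\forall \alpha:\ \dfrac{\partial \varphi(w_1, \dots, w_n, \mu)}{\partial w_\alpha} = 0, \qquad \dfrac{\partial \varphi(w_1, \dots, w_n, \mu)}{\partial \mu} = 0$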
Kriging equations
The method of Lagrange yields the equations for the optimal weights $w_\alpha^{KM}$ of the kriging of the mean:
$\sum_{\beta=1}^{n} w_\beta^{KM} \, C(x_\alpha - x_\beta) - \mu_{KM} = 0 \quad \text{for } \alpha = 1, \dots, n$
$\sum_{\beta=1}^{n} w_\beta^{KM} = 1$
Case of no autocorrelation
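When the covariance model is a pure nugget-effect:
$C(x_\alpha - x_\beta) = \begin{cases} \sigma^2 & \text{if } x_\alpha = x_\beta \\ 0 & \text{if } x_\alpha \ne x_\beta \end{cases}$
The solution weights are all equal: $w_\alpha^{KM} = \dfrac{1}{n}$
$\Rightarrow\ M^* = \dfrac{1}{n} \sum_{\alpha=1}^{n} Z(x_\alpha)$, the arithmetic mean!
$\mu_{KM} = \sigma_{KM}^2 = \dfrac{1}{n} \sigma^2$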
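The kriging-of-the-mean system and its nugget-effect special case can be verified numerically. The sketch below assembles the n+1 equations in matrix form; the function name is hypothetical.

```python
import numpy as np

def kriging_of_mean(C):
    """Solve sum_b w_b C_ab - mu = 0 (all a) together with sum_b w_b = 1."""
    n = C.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C          # covariances between data points
    A[:n, n] = -1.0        # coefficient of the Lagrange parameter mu
    A[n, :n] = 1.0         # unit-sum constraint on the weights
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    sol = np.linalg.solve(A, rhs)
    w, mu = sol[:n], sol[n]
    variance = w @ C @ w   # estimation variance sigma_E^2
    return w, mu, variance

# Pure nugget-effect with sigma^2 = 2 and n = 5 samples
w, mu, variance = kriging_of_mean(2.0 * np.eye(5))
```

In this case the weights all equal 1/5 and both the Lagrange parameter and the variance equal 2/5, in agreement with the closed-form result above.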
Ordinary Kriging
Estimation at a Point
Sample locations xα (dots)
in a domain D:
[Figure: sample locations (dots) and the estimation point $x_0$ in the domain D]
Ordinary kriging
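The estimate $Z^*$ is a weighted average of data values $Z(x_\alpha)$:
$Z^*(x_0) = \sum_{\alpha=1}^{n} w_\alpha \, Z(x_\alpha) \quad \text{with} \quad \sum_{\alpha=1}^{n} w_\alpha = 1$
Minimal variance: $\sigma_{OK}^2 = \mu_{OK} + \sum_{\alpha=1}^{n} w_\alpha^{OK} \, \gamma(x_\alpha - x_0)$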
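A minimal sketch of ordinary kriging follows, assuming the standard variogram form of the system (not shown on the slide): the weighted variograms between data points plus the Lagrange parameter equal the variogram to the target, together with the unit-sum constraint. Function names and the test configuration are illustrative.

```python
import numpy as np

def ordinary_kriging(coords, x0, gamma):
    """Return the OK weights, the Lagrange parameter and the kriging variance."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - x0, axis=1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)               # variograms between data points
    A[n, n] = 0.0                      # last row and column carry the constraint
    rhs = np.append(gamma(d0), 1.0)
    sol = np.linalg.solve(A, rhs)
    w, mu = sol[:n], sol[n]
    variance = mu + w @ gamma(d0)      # sigma_OK^2 as on the slide
    return w, mu, variance

def nugget(h, sill=1.0):
    """Pure nugget-effect variogram: 0 at the origin, the sill elsewhere."""
    return np.where(np.asarray(h) > 0, sill, 0.0)

# Four samples at distance L around the estimation point
L = 1.0
coords = np.array([[0.0, L], [-L, 0.0], [L, 0.0], [0.0, -L]])
w, mu, variance = ordinary_kriging(coords, np.zeros(2), nugget)
```

With the nugget-effect model each of the four samples gets a weight of 25% and the kriging variance is 1.25 for a unit sill, which matches the nugget-effect case shown later under "Kriging weights".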
Cross-validation
Cross-validation
Comment: the sound way to cross-validate is to leave out half of the data locations and to re-estimate them from the other half: this requires many data! For that reason it is often done in the following way (implemented in software packages)...
The difference between the data value and the estimated value:
$Z(x_\alpha) - Z^*(x_{[\alpha]})$
where $Z^*(x_{[\alpha]})$ is the value estimated at $x_\alpha$ with that datum left out, gives an indication of how well the data value fits into the neighborhood of the surrounding data values.
Average cross-Validation error
If the average of the cross-validation errors is not far from zero:
$\dfrac{1}{n} \sum_{\alpha=1}^{n} \Big( Z(x_\alpha) - Z^*(x_{[\alpha]}) \Big) \cong 0$
Standardized cross-validation error
The kriging standard deviation $\sigma_K$ represents the error predicted by the model. The standardized cross-validation error is:
$\dfrac{Z(x_\alpha) - Z^*(x_{[\alpha]})}{\sigma_{K\alpha}}$
Average squared Standardized Errors
If the average of the squared standardized cross-validation errors is not far from one:
$\dfrac{1}{n} \sum_{\alpha=1}^{n} \dfrac{\Big( Z(x_\alpha) - Z^*(x_{[\alpha]}) \Big)^2}{\sigma_{K\alpha}^2} \cong 1$
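The two cross-validation statistics can be computed with a leave-one-out loop, sketched below. Ordinary kriging with an exponential variogram is an assumption made for this example; the function names and parameters are hypothetical.

```python
import numpy as np

def gamma_exp(h, sill=1.0, a=2.0):
    """Exponential variogram (hypothetical model choice for this sketch)."""
    return sill * (1.0 - np.exp(-np.abs(h) / a))

def ok_predict(coords, values, x0, gamma):
    """Ordinary kriging estimate and kriging variance at x0."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - x0, axis=1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    sol = np.linalg.solve(A, np.append(gamma(d0), 1.0))
    w, mu = sol[:n], sol[n]
    return w @ values, mu + w @ gamma(d0)

def cross_validate(coords, values, gamma):
    """Leave each datum out in turn and re-estimate it from the others."""
    errors, std_errors = [], []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        est, var = ok_predict(coords[keep], values[keep], coords[i], gamma)
        errors.append(values[i] - est)
        std_errors.append((values[i] - est) / np.sqrt(var))
    mean_error = float(np.mean(errors))
    mean_sq_std_error = float(np.mean(np.square(std_errors)))
    return mean_error, mean_sq_std_error
```

A mean error near zero and a mean squared standardized error near one correspond to the two criteria stated above.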
Mapping with kriging
on a regular grid
with irregularly spaced data
Kriging for interpolation
Kriging is an estimation method.
It is not the quickest method to make an interpolation on a
regular grid for generating a map.
Its advantages are:
Generating a map
A regular grid is defined by the computer and
at each node of this grid a value is kriged.
[Figure: regular grid of nodes (circles) with the node $x_0$ being estimated, overlaid on the irregularly spaced data points (dots)]
Moving Neighborhood
If all data are used: this is called a unique neighborhood.
Using a subset of close data points: a moving neighborhood.
[Figure: regular grid of nodes (circles) with the node $x_0$ being estimated and the data points (dots)]
Kriging weights
Kriging weights
Nugget-effect model: $\sigma_{OK}^2 = 1.25$
[Figure: four samples at distance L around the estimation point, each with a weight of 25%]
Isotropic variogram
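Spherical model with range a/L = 2: $\sigma_{OK}^2 = .84$
[Figure: four samples around the estimation point; weights 40.6% (top), 9.4% (left), 9.4% (right), 40.6% (bottom)]
Gaussian model with range a/L = 1.5: $\sigma_{OK}^2 = .30$
[Figure: four samples around the estimation point; weights 49.8% (top), 0.2% (left), 0.2% (right), 49.8% (bottom)]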
Geometric anisotropy
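Spherical with isotropic range:
[Figure: four samples at distance L around the estimation point, each with a weight of 25%]
[Figure (caption lost): weights 32.4% (left), 32.4% (right), 17.6% (bottom)]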
Relative position of samples
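[Figure: two configurations of three samples around the estimation point. Left: $\sigma_{OK}^2 = .45$, weights 33.3% each. Right: $\sigma_{OK}^2 = .48$, weights 37.1%, 37.1% and 25.9%]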
The screen effect
Spherical model with range a/L = 2: $\sigma_{OK}^2 = 1.14$
[Figure: two samples A and B on either side of the estimation point, with weights 65.6% and 34.4%]
$\sigma_{OK}^2 = 0.87$
Nested variogram
Nested Variogram Model
A nested variogram $\gamma(h)$ is composed of a sum of elementary variograms $\gamma_u(h)$ with $u = 0, \dots, S$:
$\gamma(h) = \gamma_0(h) + \dots + \gamma_S(h) = \sum_{u=0}^{S} \gamma_u(h)$
Example: Arsenic in soil (Loire, France)
[Figure: map of the arsenic samples along the river Loire; vertical axis from 270 to 285]
Example: Nested Variogram Model
A nugget-effect (nug) and two spherical (sph) structures:
[Figure: nested variogram $\gamma(h)$ (0 to 1.0) against h (0 to 11), showing the nugget, the short range and the long range]
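A nested model of this kind (a nugget-effect plus two spherical structures) can be sketched as a sum of elementary variograms. The sills and ranges below are hypothetical values chosen for illustration, not the parameters fitted to the arsenic data.

```python
import numpy as np

def nugget(h, b):
    """Nugget-effect variogram with sill b."""
    return np.where(np.abs(h) > 0, b, 0.0)

def spherical(h, b, a):
    """Spherical variogram with sill b and range a."""
    r = np.minimum(np.abs(h) / a, 1.0)   # beyond the range the sill is reached
    return b * (1.5 * r - 0.5 * r**3)

def nested_variogram(h, b_nug=0.2, b_short=0.3, a_short=2.0,
                     b_long=0.5, a_long=9.0):
    """Sum of the elementary variograms; all parameters are hypothetical."""
    return (nugget(h, b_nug)
            + spherical(h, b_short, a_short)
            + spherical(h, b_long, a_long))

h = np.linspace(0.0, 11.0, 12)
gamma = nested_variogram(h)
```

The total sill is the sum of the elementary sills, and the curve shows a break in slope at each of the two ranges.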
Nested Covariance Function
$C(h) = \sum_{u=0}^{S} C_u(h) = \sum_{u=0}^{S} b_u \, \rho_u(h)$
Regionalization Model
$Z(x)$ is built up with uncorrelated components $Y_u(x)$ of zero mean, with covariance functions $C_u(h)$.
Example:
Linear Model with S + 1 components
$Z(x) = \sum_{u=0}^{S} Y_u(x) + m$
Kriging Spatial Components
A component $Y_1(x)$ at $x_0$ is estimated from n data:
$Y_1^*(x_0) = \sum_{\alpha=1}^{n} w_\alpha \, Z(x_\alpha)$
“No bias” with $\sum_{\alpha=1}^{n} w_\alpha = 0$: this filters the mean m
Example: Short-range Component of As
[Figure: map of the short-range component of As; vertical axis from 270 to 285]
Example: Long-range Component of As
[Figure: map of the long-range component of As; vertical axis from 270 to 285]
Demographic application
fertility data
Demographic application: fertility 1990
[Figure: map of the communes of France (latitude 44° to 51°), FERT500 class; scale from 0 to 150]
Variograms: class 100-500 women / commune
[Figure: FERT500 variograms against Distance (km); one panel with directions D1 to D4, one panel with direction D1 and model M1; labels: short range, long range]
Kriging: short range effect only
[Figure: Fert500 estimation map in UTM coordinates (km), color scale from <-15 to >=9.96; inset: FERT500 variogram (D1, M1) against Distance (km), short range]
Kriging
[Figure: two kriged maps in UTM coordinates (km); color scales from <35 to 51.4 and from <30 to 53.1]
Kriging with external drift
Drift
External drift method
External Drift method
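With a drift that is linear in the external variable $s(x)$, i.e. $E[Z(x)] = b_0 + b_1 \, s(x)$, the estimator
$Z^*(x_0) = \sum_{\alpha=1}^{n} w_\alpha \, Z(x_\alpha)$
has the expectation
$\Rightarrow\ E[Z^*(x_0)] = \sum_{\alpha=1}^{n} w_\alpha \big( b_0 + b_1 \, s(x_\alpha) \big)$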
Constraint: no bias
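The constraint
$\sum_{\alpha=1}^{n} w_\alpha = 1$
$\Longrightarrow\ b_0 + b_1 \sum_{\alpha=1}^{n} w_\alpha \, s(x_\alpha) = b_0 + b_1 \, s(x_0)$
$\Longrightarrow\ \sum_{\alpha=1}^{n} w_\alpha \, s(x_\alpha) = s(x_0)$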
Interpolation of external drift
This second constraint:
$\sum_{\alpha=1}^{n} w_\alpha \, s(x_\alpha) = s(x_0)$
Kriging System with linear and external drift
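$\sum_{\beta=1}^{n} w_\beta \, C(x_\alpha - x_\beta) + \mu_0 + \mu_1 x_\alpha^1 + \mu_2 x_\alpha^2 + \mu_3 \, s(x_\alpha) = C(x_\alpha - x_0), \quad \forall \alpha$
$\sum_{\beta=1}^{n} w_\beta = 1$
$\sum_{\beta=1}^{n} w_\beta \, x_\beta^1 = x_0^1$ (longitude)
$\sum_{\beta=1}^{n} w_\beta \, x_\beta^2 = x_0^2$ (latitude)
$\sum_{\beta=1}^{n} w_\beta \, s(x_\beta) = s(x_0)$ (external drift)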
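The kriging system with linear and external drift can be assembled as one block matrix, as in the sketch below. The exponential covariance, its parameters and the function names are assumptions for illustration; the drift functions are the constant, the two coordinates and the external variable s, as in the system above.

```python
import numpy as np

def cov_exp(h, sill=1.0, a=3.0):
    """Exponential covariance (hypothetical model choice for this sketch)."""
    return sill * np.exp(-np.abs(h) / a)

def kriging_external_drift(coords, s, x0, s0, cov=cov_exp):
    """Weights w and Lagrange parameters mu_0..mu_3 of the drift kriging system."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - x0, axis=1)
    # Drift functions at the data points: 1, longitude, latitude, s(x)
    F = np.column_stack([np.ones(n), coords[:, 0], coords[:, 1], s])
    f0 = np.array([1.0, x0[0], x0[1], s0])      # same functions at the target
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = cov(d)
    A[:n, n:] = F
    A[n:, :n] = F.T
    rhs = np.concatenate([cov(d0), f0])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]

rng = np.random.default_rng(0)                  # synthetic layout, illustration only
coords = rng.uniform(0.0, 10.0, size=(12, 2))
s = rng.uniform(0.0, 5.0, size=12)
x0, s0 = np.array([4.0, 6.0]), 2.5
w, mu = kriging_external_drift(coords, s, x0, s0)
```

The solution weights satisfy all four constraints: they sum to one, reproduce the coordinates of the target, and interpolate the external drift value s(x0). Note that s must be known at the target as well as at the data points.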
Kriging temperature
Temperature Data
Temperature conditions the growth of plants
Scotland (without the Shetland and Orkney Islands)
1272 m
January temperature vs latitude / longitude
E-W and N-S temperature variograms
Kriging mean January temperature
Temperature vs elevation
Kriging temperature with elevation as drift
Temperature estimates vs elevation
Estimated external drift coefficient
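$b_1^* = \sum_{\alpha=1}^{n} w_\alpha \, Z(x_\alpha)$
$\sum_{\beta=1}^{n} w_\beta \, C(x_\alpha - x_\beta) + \mu_0 + \mu_1 x_\alpha^1 + \mu_2 x_\alpha^2 + \mu_3 \, s(x_\alpha) = 0, \quad \forall \alpha$
$\sum_{\beta=1}^{n} w_\beta = 0$
$\sum_{\beta=1}^{n} w_\beta \, x_\beta^1 = 0$ (longitude)
$\sum_{\beta=1}^{n} w_\beta \, x_\beta^2 = 0$ (latitude)
$\sum_{\beta=1}^{n} w_\beta \, s(x_\beta) = 1$ (external drift)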
Estimated external drift coefficient
Conditional simulation
Conditional simulation vs Kriging
Change of support
geostatistical simulation of O3
CASE STUDY: Geostatistical simulation of O3
Realizations of a lognormal random function
800 × 600 km²
1 × 1 km²
with a range of 50 km
Simulation of Ozone: 1×1 Km2 support
[Figure: simulated O3 (ug/m3) on 1×1 km² support; axes in km (0 to 700 by 0 to 500); color scale from <0 to >=96]
Simulation of Ozone: 10×10 Km2 support
[Figure: simulated O3 (ug/m3) on 10×10 km² support; axes in km; color scale from <0 to >=96]
Simulation of Ozone: 20×20 Km2 support
[Figure: simulated O3 (ug/m3) on 20×20 km² support; axes in km; color scale from <0 to >=96]
Simulation of Ozone
[Figure: two histograms (frequencies) of O3 (ug/m3); horizontal axes from 0 to 200 and from 0 to 70]
Simulation of Ozone
[Figure: O3 variograms against Distance (km), 0 to 200, in directions D1 and D2; left panel: support 1×1 km², right panel: support 20×20 km²]
Simulation of Ozone
[Figure: curves plotted against the O3 cutoff (ug/m3), 40 to 120; vertical axis from 0 to 100]
Simulation of Ozone
[Figure: curves plotted against the O3 cutoff (ug/m3), 40 to 120; vertical axis from 0.0 to 1.0]
Change of support
concept
TOPIC: The Support of a Random Function
Mining (3D): volumes v, V
Soil pollution (2D): surfaces s, S
Industrial hygiene (1D): time intervals ∆t, T
The Effect of Changing the Support
The distribution of samples on small volumes (cm³) is different from that of model output averages over large blocks (m³):
[Figure: frequency curves of the samples and of the blocks, with the mean marked]
Neglecting the Support Effect
We are often interested in what is above a threshold:
[Figure: distributions with the threshold marked; annotation: overestimation!]
Neglecting the Support Effect
... or to systematic under-estimation:
[Figure: distributions with the threshold marked; annotation: underestimation!]
Kriging of a Block average
Estimation of a block value
Sample locations xα (dots)
in a domain D:
[Figure: sample locations (dots) and the block $V_0$ in the domain D]
Block Kriging
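The block value $Z^*(V_0)$ is estimated as a weighted average of the data values $Z(x_\alpha)$:
$Z^*(V_0) = \sum_{\alpha=1}^{n} w_\alpha \, Z(x_\alpha) \quad \text{with} \quad \sum_{\alpha=1}^{n} w_\alpha = 1$
Kriging variance: $\sigma_{OK}^2 = \mu_{OK} - \bar\gamma(V_0, V_0) + \sum_{\alpha=1}^{n} w_\alpha^{OK} \, \bar\gamma(V_0, x_\alpha)$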
Block kriging with non-point data
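In applications the data can be averaged on blocks $V_\alpha$.
We then use average variograms between these blocks:
$\bar\gamma(V_\alpha, V_\beta) = \dfrac{1}{|V_\alpha| \, |V_\beta|} \int_{x \in V_\alpha} \int_{y \in V_\beta} \gamma(x - y) \, dx \, dy$
[Figure: a data block $v_\alpha$ and the block $V_0$ to estimate]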
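In practice the double integral is often approximated by discretizing each block into points and averaging the variogram over all pairs of points. The sketch below does this for square 2D blocks; the regular discretization, the function names and the linear variogram used in the example are assumptions for illustration.

```python
import numpy as np

def block_points(origin, size, m=5):
    """m x m regular discretization points inside a square block."""
    t = (np.arange(m) + 0.5) / m * size
    gx, gy = np.meshgrid(origin[0] + t, origin[1] + t)
    return np.column_stack([gx.ravel(), gy.ravel()])

def mean_variogram(pts_a, pts_b, gamma):
    """Discrete approximation of the average variogram between two blocks."""
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    return float(np.mean(gamma(d)))

def linear(h):
    """Linear variogram, used here only as an example model."""
    return h

V_a = block_points((0.0, 0.0), 1.0)
V_b = block_points((3.0, 0.0), 1.0)
gbar_ab = mean_variogram(V_a, V_b, linear)   # between the two blocks
gbar_aa = mean_variogram(V_a, V_a, linear)   # within one block
```

A block reduced to a single point recovers the ordinary point variogram, so the same function also gives the point-to-block terms of the block kriging system.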
Change of support
Change of support
- The variability of spatial or temporal data depends on the averaging volume/interval (= the support).
- With increasing support, the variability decreases (reduction of variance, of extremes, ...).
- Observations are on point support as compared to the cells of a numerical model.
- End-users are often interested in a support of different (intermediate) size → blocks.
- It is thus necessary to describe statistically how variability changes as a function of support.
- If the distribution is unimodal and not too asymmetrical, an affine correction may suffice. Otherwise, non-linear geostatistics or geostatistical simulation is needed.
- Applications: data aggregation, estimation of small block statistics, downscaling, ...
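As an illustration of the simplest of these tools, the sketch below applies an affine correction in its usual textbook form: point values are shrunk towards their mean so that their variance matches a given block variance. The formula is not stated on the slide, and the block variance is assumed to be known already (for example from a variogram model); the data are synthetic.

```python
import numpy as np

def affine_correction(z_point, var_block):
    """Rescale point-support values to a target block-support variance.

    The mean is kept and the shape of the distribution is unchanged;
    var_block is assumed to be known beforehand.
    """
    m = z_point.mean()
    ratio = np.sqrt(var_block / z_point.var())
    return m + ratio * (z_point - m)

rng = np.random.default_rng(0)                 # synthetic data, illustration only
z_point = rng.lognormal(mean=3.0, sigma=0.3, size=1000)
z_block = affine_correction(z_point, var_block=0.5 * z_point.var())

threshold = np.quantile(z_point, 0.9)
prop_point = np.mean(z_point > threshold)      # proportion above threshold, points
prop_block = np.mean(z_block > threshold)      # same proportion after correction
```

Because the corrected distribution is narrower, the proportion above a high threshold is smaller for blocks than for points, which is the support effect described above.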
Ozone in Paris on 17 July 1999 at 15h UTC
[Figure: map of the Airparif stations and the Chimere grid (latitude 48.2° to 49.4°, longitude 1.4° to 3.2°, scale bar 50 km); station labels: RUR_NO, RUR_NE, Mantes, Tremblay, Gennevilliers, Aubervilliers, P18, Neuilly, RUR_O, Garches, P7, P6, P13, Vitry, RUR_E, Montgeron, RUR_SO, Melun, RUR_SE]
Air quality regulations
Two ozone thresholds referring to a support of 1 hour:
Discrete Gaussian point-block model
(due to Georges MATHERON, 1976)
$x$ is a point randomly located in a block $v$.
$E[\, Z(x) \mid Z(v) \,] = Z(v)$
Point-block-cell correlations
The Gaussian block anamorphosis is:
$\varphi_v(Y(v)) = \sum_{k=0}^{\infty} \dfrac{\varphi_k}{k!} \, r^k \, H_k(Y(v))$
Uniform conditioning
It consists in taking the conditional expectation of a non-linear function of blocks knowing the value of the cell containing them.
$E\big[\, \mathbf{1}_{Z(v) \ge z_c} \mid Z(V_0) \,\big] = 1 - G\left( \dfrac{y_c - r_{vV} \, Y(V_0)}{\sqrt{1 - r_{vV}^2}} \right).$
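The right-hand side of this formula is easy to evaluate once the Gaussian-transformed quantities are available. The sketch below assumes that G is the standard normal distribution function, that y_c is the Gaussian equivalent of the cutoff z_c, and that Y(V0) and the block-cell correlation r_vV have already been obtained; the function names are hypothetical.

```python
import math

def std_normal_cdf(y):
    """Standard normal distribution function G."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def prob_block_above(y_c, y_cell, r_vV):
    """Probability that a block exceeds the cutoff, given the cell value.

    y_c    : Gaussian-transformed cutoff
    y_cell : Gaussian-transformed cell value Y(V0)
    r_vV   : block-cell correlation, with |r_vV| < 1
    """
    arg = (y_c - r_vV * y_cell) / math.sqrt(1.0 - r_vV**2)
    return 1.0 - std_normal_cdf(arg)

p_low = prob_block_above(y_c=1.0, y_cell=0.0, r_vV=0.7)
p_high = prob_block_above(y_c=1.0, y_cell=2.0, r_vV=0.7)
```

The probability increases with the cell value, and for a correlation of zero the cell carries no information, so the result reduces to the unconditional probability 1 - G(y_c).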
Variogram of Airparif measurements
[Figure: two variograms of Ozone_17JUL15H against Distance (km), 0 to 60; vertical axes from 0 to 4000 and from 0 to 1000]
Anamorphosis of Airparif measurements
[Figure: two anamorphosis panels for Ozone (100 to 225), labelled r = .97 and r' = .72]
Histograms
[Figure: histograms of Ozone (120 to 210); vertical axis: Frequencies (%), 0 to 30]
Proportion of values above threshold
[Figure: proportion of values above threshold (0 to 100) against Ozone (120 to 210)]
Uniform conditioning by CHIMERE
[Figure: UC 120: CHIMERE + Airparif stations; contour map with labels between 0.1 and 0.9 (latitude 48.2° to 49.4°, longitude 1.4° to 3.2°, scale bar 50 km)]
Uniform conditioning by CHIMERE
[Figure: UC 180: CHIMERE + Airparif stations; contour map with labels between 0.1 and 0.9 (latitude 48.2° to 49.4°, longitude 1.4° to 3.2°, scale bar 50 km)]
Precipitation in SE Norway
geostatistical downscaling
Histogram of precipitation: July 2001
Variogram of precipitation
Block and cell anamorphosis
[Figure: anamorphosis for 10×10 km² blocks (r = .7) and for NCEP cells (r = .365)]
Reconstructed histograms
Proportion above threshold
Proportion blocks >100mm within NCEP cells
NCEP cells and station values
Color codes (four classes): 0 < x < 75 mm; 75 mm < x < 100 mm; 100 mm < x < 125 mm; x > 125 mm