
Kriging and Simulation in Presence of Stationary Domains: Developments in Boundary Modeling

Brandon J. Wilde and Clayton V. Deutsch
University of Alberta, Edmonton, Canada
e-mail: bwilde@ualberta.ca; cdeutsch@ualberta.ca
Abstract Perhaps the most critical decision in geostatistical modeling is that of choosing the stationary domains or populations for common analysis. The boundaries between the stationary domains must be modeled with uncertainty, and the correlations and trends across these boundaries must be used in modeling. Interpolating a distance function is a useful method for modeling boundaries with uncertainty. The current implementation of distance function boundary modeling with uncertainty requires expensive calibration with simulated data and numerous reference models to ensure unbiasedness and fair uncertainty. A method for calibrating the distance function using only the available data is proposed, which greatly reduces the calibration expense. The nature of the boundaries between stationary domains must also be considered: boundaries can be hard or soft, and this must be accounted for in the modeling. A contact plot is a useful tool for identifying the nature of the grade transition across a boundary, and guidelines for determining the nature of a boundary from a contact plot are suggested. With the location of the boundary defined and the nature of the transition across the boundary determined, the next step is to model the grades in a manner that accounts for the boundary information. The resulting models should reproduce the boundaries of the stationary domains and reproduce the nature of the transition at the boundary location.
1 Introduction
Interpolating a distance function has been shown to be a reasonable method for locating boundaries as it is simple and flexible [2, 4]. However, it requires a large amount of hard data and does not provide direct access to uncertainty. The definition of the distance function is related to the notion of distance to an interface separating two distinct domains. Distance is measured to the nearest unlike data location. Distance can be positive or negative depending on whether the data location lies inside or outside
the domain. The choice of sign for distance values inside or outside the domain is arbitrary, but it must be applied consistently. The distance function varies smoothly from increasingly positive values outside and farther from the boundary interface to increasingly negative values inside and farther from the boundary interface. To determine the location of the boundary, the distance to the nearest unlike sample is calculated for every available sample. These distance function data are then used to condition the interpolation of the distance function on a regular grid. The boundary is considered to lie at the transition between positive and negative interpolated distance function values.
There is uncertainty in the boundary location. Munroe and Deutsch [5, 6] propose assessing this uncertainty by calibrating two parameters: C, which controls the width of the uncertainty, and a second parameter, which controls the bias. These parameters are optimized to give appropriate uncertainty. Optimizing them is an expensive operation requiring multiple reference models and two objective functions.
This work proposes a simpler, less expensive calibration. Only the C parameter is calibrated, and this is done using only the available data in a relatively computationally inexpensive manner. A subset of the data is removed prior to the calculation of the distance function at the data locations. The subset of data removed will hereafter be referred to as the jackknife data. This effectively creates two data sets: the distance function data and the jackknife data. The distance function data are used to condition the estimation of the distance function at the jackknife data locations. Some jackknife data that are coded as inside the domain will have positive distance function estimates (outside the domain), and some jackknife data that are coded as outside the domain will have negative distance function estimates (inside the domain). The C parameter is adjusted until the desired proportion of incorrectly classified jackknife data is correctly classified. Once the C parameter is determined, it is applied to the calculation of the distance function for all available sample data. All of the data are then used to condition the interpolation of the distance function.
In addition to identifying the location of the domain boundary, the nature of
the grade transition across the boundary must be considered. Domain boundaries
are typically referred to as hard or soft. Hard boundaries are found when there is
an abrupt change in the mineralogy or grade. Contacts where the variable changes
transitionally across the boundary are referred to as soft boundaries. Geological
models should reproduce the boundary types indicated by the data.
2 Distance Function Formalism
The first requirement in the calibration of distance function uncertainty is a dataset where all data locations have been coded as either inside or outside the domain of interest:

$$
i(\mathbf{u}_\alpha) =
\begin{cases}
1 & \text{if the domain of interest is present at } \mathbf{u}_\alpha,\\
0 & \text{otherwise,}
\end{cases}
\qquad \alpha = 1, \ldots, n,
\tag{1}
$$

where $\mathbf{u}_\alpha$ is a location vector, $\alpha$ is the sample index, and $n$ is the number of samples. For each sample $\mathbf{u}_\alpha$, $\alpha = 1, \ldots, n$, the nearest sample located in a different domain, $\mathbf{u}_{\alpha'}$, is determined such that $i(\mathbf{u}_{\alpha'}) \neq i(\mathbf{u}_\alpha)$. The Euclidean distance between these two locations is the distance function value at location $\mathbf{u}_\alpha$, denoted $df(\mathbf{u}_\alpha)$. If $\mathbf{u}_\alpha$ is within the domain, the distance is signed negative; otherwise it is signed positive:

$$
df(\mathbf{u}_\alpha) =
\begin{cases}
+\|\mathbf{u}_\alpha - \mathbf{u}_{\alpha'}\| & \text{if } i(\mathbf{u}_\alpha) = 0,\\
-\|\mathbf{u}_\alpha - \mathbf{u}_{\alpha'}\| & \text{if } i(\mathbf{u}_\alpha) = 1.
\end{cases}
\tag{2}
$$
The calculation of distance would account for anisotropy, if present. Note that this approach is designed for binary systems where locations are either inside or outside a particular domain. Multiple domains could be modeled hierarchically, but many intermingled domains cannot be handled with this approach. Also note that the distance function data correspond to the distance to the nearest observed contact, not the distance to the nearest real contact.
Once the distance function has been calculated for each sample, these distance function data can be interpolated on a regular grid using a smooth estimator such as kriging or inverse distance estimation. A global estimator is particularly suited to the task as it is free from artifacts due to the search for local data. The boundary is considered to lie at the transition between positive and negative interpolated distance function values.
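As a concrete illustration of Eqs. (1) and (2), the following sketch computes signed distance function values at sample locations with a nearest-neighbor search. It is not part of the original workflow; the function name and the use of SciPy's cKDTree are illustrative choices, and anisotropy is ignored.

```python
import numpy as np
from scipy.spatial import cKDTree

def signed_distance_function(coords, indicator):
    """Signed distance to the nearest unlike sample (Eqs. 1 and 2).

    coords    : (n, dim) array of sample locations
    indicator : (n,) array of 0/1 codes, 1 = inside the domain of interest
    Returns an (n,) array: positive outside the domain, negative inside.
    """
    coords = np.asarray(coords, dtype=float)
    indicator = np.asarray(indicator)
    df = np.empty(len(coords))
    for code in (0, 1):
        same = indicator == code
        # distance from each sample with this code to the nearest unlike sample
        tree = cKDTree(coords[~same])
        dist, _ = tree.query(coords[same])
        # negative inside the domain (code 1), positive outside (code 0)
        df[same] = dist if code == 0 else -dist
    return df

# small example: four samples on a line, two inside and two outside the domain
coords = np.array([[0.0], [1.0], [2.0], [3.0]])
indicator = np.array([1, 1, 0, 0])
print(signed_distance_function(coords, indicator))  # [-2. -1.  1.  2.]
```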
To illustrate, consider the sample locations coded as inside and outside the domain in Fig. 1a, where black-filled bullets represent samples inside the domain and white-filled bullets represent samples outside the domain. The distance to the nearest outside sample is calculated for each inside sample and vice versa. These distances are shown in Fig. 1b. Samples outside the domain have positive distance function values; samples inside the domain have negative values. These samples are interpolated, yielding the map of values shown in Fig. 1b. Negative estimates are considered inside the domain and positive estimates are considered outside the domain, yielding the map shown in Fig. 1c, where black is inside the domain. An estimate of the boundary location falls at the transition between positive and negative distance function estimates (the black-white interface). There is, of course, uncertainty in the location of the boundary.
3 C Parameter
This work proposes a new method for quantifying distance function uncertainty. This method uses the data to calibrate a single additive factor, C, which modifies the distance function values calculated at the sample locations. The method is similar to the jackknife, where a subset of the data is held back when estimation is performed and the estimated values are compared with the true values at the held-back sample locations.
The C parameter modifies the distance function value at each sample location. C is an additive parameter: it is added to the distance function when the sample is outside the domain and subtracted from the distance function when the sample is inside the domain:

$$
\widetilde{df}(\mathbf{u}_\alpha) =
\begin{cases}
df(\mathbf{u}_\alpha) + C & \text{if } i(\mathbf{u}_\alpha) = 0,\\
df(\mathbf{u}_\alpha) - C & \text{if } i(\mathbf{u}_\alpha) = 1,
\end{cases}
\tag{3}
$$

where $\widetilde{df}(\mathbf{u}_\alpha)$ represents the modified distance function at the sample locations.
Fig. 1 (a) Locations coded as inside (black) and outside (white) the domain, (b) distance function
calculated at each sample location and interpolated, (c) distance function less than zero considered
inside the domain
The C parameter increases the difference between positive and negative distance function values. Once the C parameter has been applied to the data, the modified distance function is interpolated. Modified distance function estimates greater than C are considered outside the domain. Modified distance function estimates less than −C are considered inside the domain. Any modified distance function estimates between −C and C are within the range of boundary uncertainty; the boundary is located between modified distance function estimates of −C and C.
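A minimal sketch of how the C parameter might be applied and how interpolated estimates might be classified, assuming the distance function data and kriged estimates are held in NumPy arrays (the function names are illustrative):

```python
import numpy as np

def apply_c(df_data, indicator, C):
    """Modified distance function at the sample locations (Eq. 3)."""
    return np.where(indicator == 0, df_data + C, df_data - C)

def classify_estimates(df_est, C):
    """Classify interpolated modified distance function estimates.

    Returns 'outside' where the estimate exceeds C, 'inside' where it is
    below -C, and 'uncertain' in the band [-C, C] where the boundary may lie.
    """
    labels = np.full(df_est.shape, "uncertain", dtype=object)
    labels[df_est > C] = "outside"
    labels[df_est < -C] = "inside"
    return labels
```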
The uncertainty band is illustrated in Fig. 2 for several values of C. The same data shown in Fig. 1 are used. A C value of 3 increases the positive distance function values and decreases the negative distance function values by 3. The modified distance function values are interpolated. Any modified distance function estimate greater than 3 is considered outside the domain (white), while any modified distance function estimate less than −3 is considered inside the domain (black). The grey areas have modified distance function estimates between −3 and 3 and represent the region within which the boundary may lie. This is repeated for C values of 5 and 7. As C increases, the data values change and the size of the grey boundary uncertainty region increases. There is a need to infer a reasonable C value for each boundary.

Fig. 2 Boundary uncertainty for C values of 3, 5, and 7. Black: surely inside the domain; white: surely outside the domain; grey: region of boundary uncertainty
4 C Calibration
The C parameter controlling the size of the boundary uncertainty is calibrated in a manner similar to the jackknife. A subset of the data is removed and the remaining data are used to estimate the distance function at the jackknife locations. The number of jackknife data that fall on the wrong side of the boundary is reduced as C is increased.

The first step in the calibration of C is to remove a subset of the data. This can be done by randomly choosing drillholes to exclude. Distance function values are then calculated for the remaining data with an initial value of C = 0.
These distance function data are used to condition the estimation of the distance function at each of the jackknife data locations. There are four possible outcomes resulting from this estimation, as shown in Fig. 3: the location could be (1) correctly estimated to be outside the domain, (2) correctly estimated to be inside the domain, (3) incorrectly estimated to be outside the domain, or (4) incorrectly estimated to be inside the domain. We are interested in the number of data that fall on the wrong side of the boundary, that is, the number of times the estimate is positive but the datum is coded as inside the domain, plus the number of times the estimate is negative but the datum is coded as outside the domain. The number of data falling on the wrong side of the boundary for C = 0 is the base case.

Fig. 3 Possible outcomes for distance function estimation at jackknife data locations
C is then increased and the distance function values at the non-jackknife sample locations are modified. This modified distance function is estimated at each of the jackknife locations. The boundary is now considered to fall between −C and C. A datum falls on the wrong side of the boundary when a jackknife location coded as inside has a modified distance function estimate greater than C, or a jackknife location coded as outside has a modified distance function estimate less than −C. The number of data falling on the wrong side of the boundary decreases as C increases. C is increased until the number of data falling on the wrong side of the boundary is acceptable. The C value where this occurs is the calibrated C value, which quantifies the boundary uncertainty.
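The calibration loop can be sketched as follows. The kriging step is abstracted behind a user-supplied estimate_df function, and the acceptable misclassification fraction and the C increment are illustrative assumptions rather than values from the paper:

```python
import numpy as np

def calibrate_c(df_data, ind_data, ind_jk, estimate_df,
                target_fraction=0.5, c_step=1.0, c_max=50.0):
    """Increase C until the number of misclassified jackknife data is acceptable.

    df_data     : distance function values at the non-jackknife samples (C = 0)
    ind_data    : 0/1 codes of the non-jackknife samples (1 = inside)
    ind_jk      : 0/1 codes of the jackknife samples
    estimate_df : user-supplied callable mapping modified distance function data
                  to estimates at the jackknife locations (e.g. by kriging)
    target_fraction : acceptable misclassification count as a fraction of the
                      base case at C = 0 (illustrative choice)
    """
    def n_wrong(est, C):
        wrong_in = np.sum((ind_jk == 1) & (est > C))    # coded inside, estimated outside
        wrong_out = np.sum((ind_jk == 0) & (est < -C))  # coded outside, estimated inside
        return wrong_in + wrong_out

    base = n_wrong(estimate_df(df_data), 0.0)
    C = 0.0
    while C < c_max:
        C += c_step
        df_mod = np.where(ind_data == 0, df_data + C, df_data - C)  # Eq. 3
        if n_wrong(estimate_df(df_mod), C) <= target_fraction * base:
            break
    return C
```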
This is illustrated in Fig. 4 using the same data as previously. There are jackknife data, not used in the initial estimation of the distance function, that are now considered. The white-filled circles represent sample locations outside the domain while the black-filled circles represent sample locations inside the domain. For C = 0, there are two samples coded as outside the domain which fall inside (white circles in the black region) and four samples coded as inside the domain which fall outside (black circles in the white region), for a base case of six incorrectly classified data. Increasing the C parameter to 3 decreases the number of samples coded as outside the domain which fall inside from two to one (one white bullet that was in the black region now falls in the grey region) and decreases the number of samples coded as inside the domain which fall outside from four to three (one black bullet that was in the white region now falls in the grey region). Increasing C to 5 further decreases the number of black circles falling in the white region to two, for a total of three incorrectly classified data, one half of the base case at C = 0.
The calibration of C is sensitive to which data are used as jackknife data. Using a different quantity and/or subset of the data will lead to a different number of incorrectly classified data for a given C value. Therefore, it is recommended that the C calibration be performed for a variety of jackknife subsets to ensure that the calibration is robust.
Fig. 4 Illustration of C calibration methodology for C values of 0, 3, and 5 respectively. The
number of jackknife data on the wrong side of the boundary decreases as C increases
Once C has been determined, it is used to calculate the modified distance function at all sample locations. The C parameter is applied as shown previously; it is added to positive distance function values and subtracted from negative distance function values. The resulting modified distance function is then interpolated. The boundary is considered to fall inside the distance function transition from −C to C. Different boundaries can be extracted by applying a threshold between −C and C. A threshold of +C corresponds to the white-grey interface in Fig. 4 and leads to a dilated boundary that is big everywhere. A threshold near −C corresponds to the black-grey interface in Fig. 4 and leads to an eroded boundary that is small everywhere. A threshold of zero is the base-case boundary.
5 Boundary Simulation
There can be cases where it is useful to have multiple realizations of the boundary such that the domain is neither big everywhere nor small everywhere, yet uncertainty in its location is accounted for. This can be done within the calibrated distance function framework presented here by simulating distance function values uniformly between −C and C. Wherever the interpolated distance function is less than the simulated distance function, the location is inside the domain; wherever the interpolated distance function is greater than the simulated distance function, the location is outside the domain. The points where the interpolated distance function equals the simulated distance function form the boundary.

Fig. 5 The same two models of interpolated distance function. The left map is smoothly varying; the right map has thresholds applied at values of −5 and 5
To simulate $L$ realizations of the distance function between $-C$ and $C$, Gaussian deviates $y_l(\mathbf{u})$, $l = 1, \ldots, L$, can be simulated and transformed according to:

$$
df_l(\mathbf{u}) = 2C \, G\!\left(y_l(\mathbf{u})\right) - C, \qquad l = 1, \ldots, L,
\tag{4}
$$

where $df_l(\mathbf{u})$ is the simulated distance function value, $y_l(\mathbf{u})$ is an unconditionally simulated standard normal value, and $G$ denotes the standard normal CDF value corresponding to $y_l(\mathbf{u})$. Multiplying by $2C$ and subtracting $C$ ensures that the values are between $-C$ and $C$. It is only necessary to simulate at locations where the interpolated distance function is between $-C$ and $C$. Interpolated values greater than $C$ are surely outside the domain; interpolated values less than $-C$ are surely inside the domain. This can reduce the time required to simulate distance function realizations.
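A small sketch of the transform in Eq. (4); for brevity the unconditional Gaussian realization is faked with independent deviates, whereas in practice it would be simulated with a Gaussian variogram:

```python
import numpy as np
from scipy.stats import norm

def gaussian_to_uniform_df(y, C):
    """Transform standard normal deviates to distance function values
    uniformly distributed between -C and C (Eq. 4)."""
    return 2.0 * C * norm.cdf(y) - C

# illustration only: in practice y would be an unconditional Gaussian
# realization with spatial correlation, not independent deviates
rng = np.random.default_rng(0)
y = rng.standard_normal(5)
print(gaussian_to_uniform_df(y, C=5.0))
```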
Determining whether a location, $\mathbf{u}$, is inside or outside the domain requires comparing the interpolated distance function with the simulated distance function:

$$
i_l(\mathbf{u}) =
\begin{cases}
\text{inside} & \text{if } df_l(\mathbf{u}) > \widetilde{df}(\mathbf{u}),\\
\text{outside} & \text{if } df_l(\mathbf{u}) < \widetilde{df}(\mathbf{u}),
\end{cases}
\tag{5}
$$

where $\widetilde{df}(\mathbf{u})$ is the interpolated modified distance function at location $\mathbf{u}$.
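Continuing the sketch, the classification in Eq. (5) reduces to an elementwise comparison on the grid (array names are illustrative):

```python
import numpy as np

def classify_realization(df_interp, df_sim, C):
    """Inside/outside classification for one boundary realization (Eq. 5).

    df_interp : interpolated modified distance function on the grid
    df_sim    : simulated distance function values on the grid, uniform in
                [-C, C]; values outside the uncertainty band are ignored
    Returns a boolean array, True = inside the domain.
    """
    inside = np.where(np.abs(df_interp) < C,
                      df_interp < df_sim,   # uncertain band: compare to simulation
                      df_interp < -C)       # outside the band: sign of estimate decides
    return inside
```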
Since a boundary is being simulated, it is recommended that a Gaussian variogram be used in the simulation of the distance function, as this model maintains high short-range continuity. A very small positive nugget is recommended for numerical stability. There is no clear method for determining the range of the variogram model: a short range causes the boundary to change rapidly, while a longer range leads to a smoother boundary. The variogram range could come from geologic calibration from data or observations, visualization and expert judgment, or an analogue deposit. A too-short variogram range leads to unrealistic transitions between domains. A too-long variogram range is similar to using a single distance function threshold to locate the boundary; the domain will be big or small everywhere.

Fig. 6 Two realizations of distance function values simulated uniformly between −5 and 5 at the locations where the interpolated distance function falls between −5 and 5
An example of boundary simulation is shown using the same data as previously for C = 5. The smoothly varying interpolated distance function is shown in Fig. 5. Also shown in Fig. 5 is this same map with thresholds applied at −5 and 5. Those locations where the interpolated distance function falls between −5 and 5 are shaded grey. These are the locations where distance function values are simulated uniformly between −5 and 5. Two realizations of these simulated distance function values are shown in Fig. 6. The determination of whether each location falls inside or outside the domain is made by comparing the simulated distance function to the interpolated distance function. Those locations where the interpolated distance function is less than the simulated distance function are inside the domain and are shaded black in Fig. 7; the locations where the interpolated distance function is greater than or equal to the simulated distance function are outside the domain and are shaded white in Fig. 7. The two simulated boundary realizations are similar in a global sense in that the general shape of the black domain is about the same. However, locally the realizations can be quite different. In particular, the black domain in the first realization encompasses an island of the white domain, whereas in the second realization this feature is not present. This relates well with the boundary uncertainty summarized by the grey shaded region in the right of Fig. 5.
6 Boundary Nature
In addition to identifying the location of domain boundaries, the nature of transition
in the geologic properties across the domain boundaries should also be investigated.
Fig. 7 Two boundary realizations based on the simulated distance function values in Fig. 6.
Domain boundaries are often referred to as either hard or soft [3]. Hard boundaries are found when there is an abrupt change in the mineralogy or grade without a transition at the scale of observation [7]. They do not permit interpolation or extrapolation across domains. Contacts where the variable changes transitionally across the boundary are referred to as soft boundaries. These allow selected data from either side of the boundary to be used in the estimation of each domain.
A contact analysis is undertaken to detect hard and soft boundary transitions as well as different types of hard boundary transitions. Corehole data with rock type or facies information are required for this analysis. McLennan [4] proposes two types of contact analysis: expected value contact analysis and covariance function contact analysis. Cuba and Leuangthong [1] show how the variogram can be used to identify nonstationarity in the local variance in addition to the local mean.
A more qualitative contact analysis is performed by use of a contact plot. Figure 8 shows the general form of a contact plot for a soft and a hard boundary. This type of plot is useful for determining the nature of a boundary. Sample data values $z$ are plotted against their distance from the boundary into either the left domain, $d_{12}$, or the right domain, $d_{21}$. Expected values are represented with the solid lines. Notice the transition zone bounded by the vertical dotted lines on the left. The size of the transition zone may vary, but this zone will be present for soft boundaries. The stationary random functions (SRFs) to the left and right of the boundary are denoted by $Z_1(\mathbf{u})$ and $Z_2(\mathbf{u})$. In contrast to soft boundaries, for a hard boundary the $Z_1(\mathbf{u})$ and $Z_2(\mathbf{u})$ SRFs are applicable all the way through their respective domains up to the hard boundary. There is no transition zone present.

Fig. 8 The general form of a contact plot for both a soft and a hard boundary (modified from [4])
In addition to plotting the expected value within each domain, it is useful to plot the variability seen in each domain, as shown in Fig. 9, where the grey shaded region represents the 90 % probability interval. The number of points within the specified distance is also shown for each domain.
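A sketch of how the summaries behind such a contact plot might be computed, assuming each sample carries a signed distance to the boundary (negative into one domain, positive into the other) and a grade value; the binning parameters are illustrative choices:

```python
import numpy as np

def contact_plot_stats(signed_dist, grade, bin_width=5.0, max_dist=50.0):
    """Expected value and 90 % probability interval of grade versus distance
    from the boundary, binned on both sides of the contact.

    signed_dist : signed distance from the boundary (negative = domain 1 side)
    grade       : grade or property value at each sample
    Returns bin centers, mean, p05, p95, and sample count per bin.
    """
    edges = np.arange(-max_dist, max_dist + bin_width, bin_width)
    centers, mean, p05, p95, count = [], [], [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (signed_dist >= lo) & (signed_dist < hi)
        if np.any(sel):
            centers.append(0.5 * (lo + hi))
            mean.append(grade[sel].mean())
            p05.append(np.percentile(grade[sel], 5))
            p95.append(np.percentile(grade[sel], 95))
            count.append(int(sel.sum()))
    return (np.asarray(centers), np.asarray(mean), np.asarray(p05),
            np.asarray(p95), np.asarray(count))
```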
Considering the expected value and relative spread allows inferences to be made regarding the nature of the boundary. Consider four boundary classifications: none, soft, hard stationary, and hard nonstationary. The boundary illustrated by a contact plot can be classified as one of these four types based on the slope of the expected value line and the width of the 90 % probability interval. For example, one could conclude that no boundary is present when the expected value at zero distance for each domain is within the range of variability at zero distance for the other domain. A hard boundary is likely present when the expected value at zero distance for at least one domain falls outside the range of variability of the other domain. This hard boundary is considered stationary when neither domain exhibits a strong trend near the boundary. The hard boundary would be considered nonstationary when a significant trend is present. A soft boundary is present when the grade within one or both domains exhibits a strong trend near the boundary with no significant change in grade at the boundary.
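These guidelines are qualitative, but one rough way they might be encoded is sketched below; the trend threshold and the overlap test are purely illustrative assumptions, not recommended cutoffs:

```python
def classify_boundary(mean1, p05_1, p95_1, slope1,
                      mean2, p05_2, p95_2, slope2,
                      trend_threshold=0.05):
    """Rough boundary classification from contact plot summaries at zero distance.

    meanX, p05_X, p95_X : expected value and 90 % interval of domain X at the boundary
    slopeX              : trend of the expected value near the boundary
    trend_threshold     : assumed cutoff for a 'strong' trend (illustrative)
    """
    # each domain's expected value lies within the other's range of variability
    overlap = (p05_2 <= mean1 <= p95_2) and (p05_1 <= mean2 <= p95_1)
    strong_trend = max(abs(slope1), abs(slope2)) > trend_threshold
    if overlap and not strong_trend:
        return "none"
    if overlap and strong_trend:
        return "soft"  # transitional change with no jump in grade at the contact
    return "hard nonstationary" if strong_trend else "hard stationary"
```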
There are two general categories for modeling the nature of boundaries: implicit and explicit [4]. The first refers to the conventional method of pasting together domains predicted from separate stationary random functions and conditioning data sets; this is appropriate for hard boundaries. For soft boundaries, a near-boundary model describing how separate stationary random functions interact is needed to build in realistic geological transitions explicitly.
Fig. 9 Another form of
contact plot showing the
change in porosity between
two different facies
7 Conclusions
Determining stationary domains is an important step in geostatistical modeling. The
locations of the boundaries between stationary domains must be determined. There
is uncertainty in the location of the boundary between stationary domains away
from data. The uncertainty in boundary location should be accounted for and should
be fair. The boundary uncertainty can be calibrated using the available data in a
manner similar to the jackknife. An additive parameter, C, is calibrated to quantify
the boundary uncertainty.
This calibration framework makes simulating boundary realizations straightforward. A uniform deviate between −C and C is simulated. Comparing the simulated deviate to the interpolated distance function classifies each location as inside or outside the domain.
In addition to location, it is important to capture the nature of the transition of the geological property across the boundary. The nature of this transition has important modeling implications. The boundary nature can be determined qualitatively by considering a contact plot.
References
1. Cuba M, Leuangthong O (2009) On the use of the semivariogram to detect sources of nonstationarity in mineral grades. In: Proceedings of APCOM
2. Hosseini AH (2009) Probabilistic modeling of natural attenuation of petroleum hydrocarbons. PhD Thesis, University of Alberta, 359 pp
3. Larrondo P, Deutsch CV (2004) Accounting for geological boundaries in geostatistical modeling of multiple rock types. In: Leuangthong O, Deutsch CV (eds) Geostats 2004: proceedings of the seventh international geostatistics congress. Springer, Berlin, pp 3–12
4. McLennan J (2008) The decision of stationarity. PhD Thesis, University of Alberta, 191 pp
5. Munroe MJ, Deutsch CV (2008) A methodology for modeling vein-type deposit tonnage uncertainty. Center for Computational Geostatistics Annual Report 10, University of Alberta, 10 pp
6. Munroe MJ, Deutsch CV (2008) Full calibration of C and in the framework of vein-type deposit tonnage uncertainty. Center for Computational Geostatistics Annual Report 10, University of Alberta, 16 pp
7. Ortiz JM, Emery X (2006) Geostatistical estimation of mineral resources with soft geological boundaries: a comparative study. J S Afr Inst Min Metall 106(8):577–584
