Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered
office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Geographical Information Science
Publication details, including instructions for authors and
subscription information:
http://www.tandfonline.com/loi/tgis20

Scale effects in uncertainty modeling of presettlement vegetation distribution

E.-H. Yoo & A.B. Trgovac

Department of Geography, University at Buffalo, Buffalo, NY, USA
Published online: 23 May 2011.

To cite this article: E.-H. Yoo & A.B. Trgovac (2011) Scale effects in uncertainty modeling of
presettlement vegetation distribution, International Journal of Geographical Information Science,
25:3, 405-421, DOI: 10.1080/13658816.2010.518390
To link to this article: http://dx.doi.org/10.1080/13658816.2010.518390


International Journal of Geographical Information Science
Vol. 25, No. 3, March 2011, 405–421


(Received 2 August 2009; final version received 22 August 2010)


This article proposes a geostatistical interpolation method, namely, block indicator
kriging, as a means to model uncertainty-quantified presettlement vegetation surfaces.
We demonstrate its potential in landscape ecology for improving the quality of the
resulting surfaces using indicator-coded presettlement land survey record data and the
areal proportion of each species. The geostatistical interpolation method presented in
this study explicitly models support differences between the source data and the prediction surface, while taking into account spatial dependence present in multiscale
presettlement land survey record data. In this case study, we demonstrate that block indicator kriging with areal proportion data substantially increases the prediction accuracy
and coherence using witness tree species data recorded in the northeast of Minnesota.
The relative merit of the proposed geostatistical interpolation method over other conventional geostatistical approaches is illustrated through a comparative analysis, in which
continuous vegetation surfaces obtained from other approaches are compared with
those obtained from block indicator kriging with areal proportion data in terms
of prediction accuracy and consistency.
Keywords: block indicator kriging; presettlement land survey records; spatial dependence; support differences

1. Introduction
Survey notes from the US General Land Office as well as private land companies provide a record of the vegetation found across the United States before European settlement.
Historically, these presettlement land survey records (PLSR) have been used to reconstruct
the spatial distribution of individual tree species as well as vegetation communities (Howell
and Kucera 1956, Delcourt and Delcourt 1974, Grimm 1984). With many applications of
such distributions, from providing a baseline for the ecological restoration of disturbed
areas to examining forest changes over time, the task of creating a continuous distribution
of presettlement vegetation often employs spatial interpolation as evidenced by the existing
literature (Brown 1998, Batek et al. 1999, He et al. 2000, Cogbill et al. 2002, Rathbun and
Black 2006, Wang 2007).
Generally speaking, spatial interpolation falls into two categories depending on the
nature of the resulting surface: deterministic and probabilistic. Deterministic interpolation
methods are preferred in a situation where sufficient knowledge about the phenomenon

*Corresponding author. Email: eunhye@buffalo.edu


ISSN 1365-8816 print/ISSN 1362-3087 online
2011 Taylor & Francis
DOI: 10.1080/13658816.2010.518390
http://www.informaworld.com


of interest is available. Under such circumstances, a thorough understanding of the phenomenon allows for a complete description of the sample data as well as a robust prediction
of unknown values. When such spatial interpolation methods are applied in ecological
studies (Batek et al. 1999, He et al. 2000, Cogbill et al. 2002), potential errors associated
with the reconstructed vegetation surfaces are implicitly assumed to be negligible. Unlike
deterministic methods, probabilistic approaches, which include geostatistical models and
Bayesian hierarchical spatial models, typically acknowledge a lack of understanding and
provide mappable uncertainty-quantified probabilities associated with the predicted presettlement vegetation surface (Hershey 2000, Rathbun and Black 2006, He et al. 2007,
Wang 2007). Recognizing that the available information regarding presettlement vegetation is incomplete, it is necessary to quantify the spatial uncertainty associated with the
reconstructed vegetation distribution obtained using PLSR. Despite its significance, uncertainty assessment of the reconstructed presettlement surfaces has often been overlooked in
existing studies.
In addition to reporting the uncertainty associated with the prediction surface, another
important issue in mapping the spatial distribution of presettlement tree species is the selection and change of support associated with the sampled point data and the reconstructed
surface. Here, support pertains to the areal extent of a datum and a sought-after prediction
(Journel and Huijbregts 1978, Olea 1991), and it has been referred to as spatial resolution,
spatial scale, or grain in the relevant literature (Turner 1989, Quattrochi and Goodchild
1997, Atkinson and Tate 2000, Dungan 2001). From the vast research conducted using
PLSR, it is clear that issues relating to spatial scale have drawn researchers' attention.
Some investigators have focused on the spatial scale of the source data and have explored
how a selectively sampled subset of the point data affects the quality of the reconstructed
surface or the landscape pattern (He et al. 2000, Manies and Mladenoff 2000, Wang and
Larsen 2006). Others have focused on the target surfaces and have examined how the
choice of the spatial scale of the prediction surface affects the landscape pattern analysis (Delcourt and Delcourt 1996, Wu 2004). Both questions are valid as the quality of the
prediction surface is affected by both the spatial scale of the observation (source data) and
the spatial scale of the prediction (target surfaces). In fact, the consideration of support
effects in the context of spatial interpolation is a part of a larger framework of scaling
issues previously identified in various disciplines including geography (Atkinson and Tate
2000, Dungan 2001, Gotway and Young 2002).
To the authors' knowledge, however, none of the existing literature on the process of
mapping presettlement tree species explicitly accounts for the change of support from point
survey data to the predicted vegetation surface. Spatial interpolation changes the support
of a variable, creating a new variable that is related to the original data but has different statistical and spatial properties (Chilès and Delfiner 1999, Gotway and Young 2004).
Therefore, it is necessary to consider the effects that may arise due to changes of support
when creating interpolated surfaces of presettlement vegetation.
Given the significance of spatial scale on the model prediction, another intriguing line
of research is to describe the complex spatial structure of the forest using the PLSR data
at multiple spatial scales. Geostatistical applications in the earth sciences typically involve
area (or volume) averages of the primary variable being estimated, mainly because data
are defined over areal support rather than point support (Journel 1999). Although point
data provide high-resolution information at a single location, areal average data provide
a general description of the data at a coarser resolution, for example, as a nonstationary
locally varying mean. In addition, the kriging prediction at an unsampled location retains
only the nearest conditioning data when large sample data are used, which may result in the
loss of valuable large-scale information (Goovaerts 1997). Therefore, conditioning the prediction of an unsampled value on both local information and areal data will provide valuable
information to build a numerical model.
The challenge of incorporating such areal average data in the task of mapping the presettlement tree species is to design a set of meaningful areal units over which the areal
proportion of each species is determined. This problem has long been identified as a fundamental aspect of data analysis in geography, known as the modifiable areal unit problem
(Openshaw and Taylor 1979). One approach to address this question is to design areal units
specifically to meet some goals (Flowerdew and Green 2001). In the context of mapping
of the spatial distribution of presettlement vegetation, such a goal would be to capture
the regional effects in tree species distribution that may not be captured using PLSR data.
Recognizing the influence that environmental factors have on the spatial distribution of tree
species and the overall forest mosaic, the utilization of PLSR data at areal units congruent
with relevant environmental characteristics may prove to be a meaningful covariate.
In summary, we aim to reconstruct uncertainty-quantified presettlement vegetation
surfaces using multiscale PLSR data. The objectives are threefold: (i) mapping the
uncertainty-quantified spatial distributions of common tree species using both indicator-coded PLSR data and their spatial average values, (ii) accounting for the support
differences between the source data and the prediction surface through block indicator
kriging, and (iii) assessing the effects of the areal data on the predictive capabilities of the
geostatistical model across various spatial scales of prediction.
2. Background
The study area for this investigation is Lake County, MN, USA. Situated in the northeastern portion of the state, Lake County is bounded by Ontario, Canada, to the north
and Lake Superior to the south (Figure 1a). The county lies within the Laurentian Mixed
Forest ecological province, which is characterized by broad expanses of coniferous and
mixed hardwood forests as well as localized areas of bogs and swamps. Before European
settlement, the uplands were dominated by conifers, especially white (Pinus strobus),
red (Pinus resinosa), and jack (Pinus banksiana) pines, whereas quaking aspen (Populus
tremuloides)–paper birch (Betula papyrifera) associations were found in disturbed areas
(Friedman et al. 2001). The lowland forests were populated with tamarack (Larix laricina)
and black spruce (Picea mariana).
The original bearing tree records for Minnesota have been published online (Minnesota
DNR GIS Data Deli 2009) and contain a wealth of ecological information including species
type, location, distance, and direction from the survey corners and tree diameter. In this
study, we focus on the species attribute of the 16,192 witness trees located within the study
region. The data were transformed into an indicator (or binary) variable depicting the presence or absence of individual tree species at each survey location. For illustrative purposes,
only three common tree species (aspen (Populus sp.), jack pine, and paper birch) were
selected for this study, considering that they have sufficient stature as witness trees during
a land survey and distinct spatial distributions: widespread with small clustered patches,
highly concentrated at the center, and evenly spread all over the study region, respectively
(see Figure 1d–f).
The spatial patterns of the presettlement forest are likely to be related to the patches
of environmental conditions that the individual tree species located within the same area
are commonly exposed to (Friedman et al. 2001). The geologic features of Lake County
show the strong influence of the region's glacial history. The southern portion of the
county is covered by the remains of the terminal and ground moraine of the Superior
lobe (Hobbs and Goebel 1982). Till plains, drumlin fields, and peat bogs dominate the
midsection of the county, whereas scoured bedrock and glacial lakes are prevalent to the
north. The soils of the county also reflect the glaciation of the region, with coarse loamy
soils dominating the northern portion, sandy and hemic soils found in the center, and
a thin covering of soils of various textures that comprise the moraine landforms in the
south.

Figure 1. (a) The study area of Lake County encompasses the northeastern portion of Minnesota.
Within the study area, a total of 1846 four tree-plots, that is, survey corner locations where four
individual species were present, were identified; (b) contour map for elevation; (c) soil texture of
the study area; (d), (e), and (f) show survey locations revealing the presence of aspen (a total of
1281), jack pine (a total of 1831), and paper birch (a total of 2104) with point symbols; (g), (h), and
(i) are proportions of each tree species, aspen, jack pine, and paper birch, over 35 environmentally
homogeneous zones.
Recognizing that such diverse landscape and edaphic conditions of the study area may
afford an opportunity to study the spatial distribution of presettlement tree species across
various environmental gradients, we identified 35 zones sharing similar environmental
characteristics. More specifically, two environmental covariates, elevation and soil texture
shown in Figure 1b and c, were considered as significant drivers to influence the distribution of the selected tree species (Friedman et al. 2001, Lichstein et al. 2002, Bolliger
and Mladenoff 2005, He et al. 2007), and a set of areal units sharing a relatively homogeneous soil texture and elevation is identified from them. Both the elevation and the soil
texture data are from Minnesota DNR GIS Data Deli (2009). The areal units are used
as a spatial basis for the areal proportion data, whose attribute values are calculated as
the ratio between the number of individuals of the species (aspen, jack pine, and paper
birch) and the number of individuals of all species occurring within each areal unit. These
areal proportion values amount to the relative abundance determined for each tree species
at a regional scale, which complements the fine-scale composition and local variation of
the individual tree records by accounting for environmental phenomena that take place or
are present at a larger spatial scale. Figure 1g–i illustrates the spatial distributions of the areal proportions occupied by the three tree species, respectively. These areal proportions
for each tree species are used as auxiliary data in the uncertainty-quantified vegetation
surface mapping as well as informed areal data in the model validation. Such areal data
have rarely been used in previous studies as a source of additional information, although
they can substantially increase the accuracy of the predicted vegetation surface
(see Section 4.2). We chose the coherence property of the predicted probabilities as a criterion to evaluate the model performances, that is, whether the informed areal proportion value at
each zone is reproduced by the spatial average of predicted probabilities within that zone
(see Section 4.3).
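The coherence criterion described above can be sketched as a per-zone comparison between the mean of the predicted probabilities and the informed areal proportion. The function name `coherence_gap` and its inputs are hypothetical, introduced only for illustration; the sketch assumes every prediction location carries an integer zone label and that every zone contains at least one prediction location.

```python
import numpy as np

def coherence_gap(pred, zone_id, areal_prop):
    """Per-zone difference: mean predicted probability minus informed areal proportion.

    pred       : (N,) predicted probabilities at prediction locations
    zone_id    : (N,) integer zone label (0..M-1) of each prediction location
    areal_prop : (M,) informed areal proportion of the species in each zone
    """
    pred = np.asarray(pred, dtype=float)
    zone_id = np.asarray(zone_id)
    areal_prop = np.asarray(areal_prop, dtype=float)
    # Sum and count predictions per zone, then form the zone means.
    sums = np.bincount(zone_id, weights=pred, minlength=len(areal_prop))
    counts = np.bincount(zone_id, minlength=len(areal_prop))
    return sums / counts - areal_prop

# Toy example with two zones (values are made up for illustration).
gap = coherence_gap([0.2, 0.4, 0.1, 0.1], [0, 0, 1, 1], [0.3, 0.2])
```

A gap of zero in a zone means the predictions reproduce the informed proportion there; in the toy example the first zone is coherent and the second falls short by 0.1.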
Finally, it is known that plot-scale species aggregation often provides useful information in addition to the absolute value obtained from the indicator transformed PLSR at
survey locations (Cottam and Curtis 1956). For example, various measures of aggregation,
such as relative frequency and numerical dominance, observed at the plot level may suggest patterns of spatial dependence associated with the reproductive or ecological nature of
the species (Friedman et al. 2001). Although spatial distributions of individual tree species
occurrence at a single location may be of value in economic estimates, they are of limited utility in landscape ecology and more likely to be subject to confounding factors and
measurement errors. In this study, the degree of aggregation is quantified by the number
of individuals at each plot site, and this information is used as a source in the model validation (see Section 4.2). To increase the reliability of plot-level data, such validation data
are taken only at survey corner locations in the PLSR data where four individuals were
present, referred to as four tree-plot hereafter (Friedman et al. 2001). The imposition of
this additional constraint identified only a total of 1846 four tree-plots (Figure 1a) from the
original 6769 survey corner sites.
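The selection of four tree-plots and the per-plot species counts used for validation can be sketched as follows. The record layout (one `(corner_id, species)` pair per witness tree) and the function name `four_tree_plots` are assumptions made for illustration, not the structure of the published data set.

```python
from collections import Counter, defaultdict

def four_tree_plots(records):
    """Keep survey corners with exactly four witness trees.

    records: iterable of (corner_id, species) pairs, one per witness tree.
    Returns {corner_id: Counter(species -> number of individuals)}.
    """
    by_corner = defaultdict(list)
    for corner_id, species in records:
        by_corner[corner_id].append(species)
    return {c: Counter(sp) for c, sp in by_corner.items() if len(sp) == 4}

# Toy records (made up): corner c1 has four trees, corner c2 only two.
records = [
    ("c1", "jack pine"), ("c1", "jack pine"), ("c1", "aspen"), ("c1", "paper birch"),
    ("c2", "aspen"), ("c2", "aspen"),
]
plots = four_tree_plots(records)
```

Each retained plot then yields a count between 0 and 4 for every species, which is the degree of aggregation compared against the predictions in Section 4.2.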
In what follows, we will illustrate a geostatistical framework that integrates the multiscale PLSR data for model predictions, while taking into account their spatial correlations
across various spatial scales and the support differences among multiscale PLSR data and
prediction surfaces.

3. Methods
Consider the problem of modeling the uncertainty about the tree species attribute at an
unsampled location. Let $S(\mathbf{u})$ denote a discrete random variable (RV) associated with a
presettlement tree type $\{s_k, k = 1, \ldots, K\}$ present at any location $\mathbf{u}$ within a study domain $A$.
Here $\mathbf{u}$ denotes a vector of spatial coordinates, that is, $\mathbf{u} = (x, y)$. The uncertainty intrinsic
to the spatial classification of a tree species at location $\mathbf{u}$ can be modeled by the conditional
probability distribution function of the discrete RV $S(\mathbf{u})$ derived from survey data at $n$
locations $\{\mathbf{u}_\alpha, \alpha = 1, \ldots, n\}$ (Journel 1983, Goovaerts 1997). At each survey location, the
species attribute is transformed into a vector of $K$ local prior probabilities corresponding
to the $K$ tree species as $P\{S(\mathbf{u}_\alpha) = s_k\} = 1$ if the $k$th tree species was witnessed at $\mathbf{u}_\alpha$
and 0 otherwise. For the $k$th tree species, indicator-coded data can be arranged in an $(n \times 1)$
vector $\mathbf{i}_k = [i_k(\mathbf{u}_\alpha), \alpha = 1, \ldots, n]^T$, where $i_k(\mathbf{u}_\alpha)$ denotes a realization of the indicator
RV $I_k(\mathbf{u}_\alpha)$ corresponding to the $k$th tree species at the $\alpha$th survey location $\mathbf{u}_\alpha$, with $I_k(\mathbf{u}_\alpha) = 1$
if $S(\mathbf{u}_\alpha) = s_k$ and 0 otherwise. Here, the superscript $T$ denotes the transposition of a
vector or matrix.
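The indicator coding described above amounts to one binary vector per species. A minimal sketch, assuming the species attribute is available as a list of labels (the function name `indicator_code` and the toy labels are illustrative only):

```python
import numpy as np

def indicator_code(species_at_sites, classes):
    """Return {species: (n,) 0/1 vector} marking where each species was witnessed."""
    species = np.asarray(species_at_sites)
    return {s: (species == s).astype(int) for s in classes}

# Toy observations (made up); "other" stands for any species outside the selected classes.
obs = ["aspen", "jack pine", "paper birch", "jack pine", "other"]
i = indicator_code(obs, ["aspen", "jack pine", "paper birch"])
```

Each vector `i[s]` plays the role of the indicator data vector for species `s`; a record of a species outside the selected classes is coded 0 in every vector.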
Although the indicator data associated with the individual tree species occurrence at the
survey locations provide local details of individual species distribution, it would be useful
to consider spatial variations at a larger spatial scale to understand the complex spatially
structured properties of the forest. The areal proportions of individual tree species over
selectively chosen spatial units may provide a landscape spatial pattern that is reflective of
the environmental factors which take place at that scale. More specifically, consider a set
of areal units that yield an $(M \times 1)$ vector $\mathbf{f}_k = [f_k(w_\beta), \beta = 1, \ldots, M]^T$ of areal proportions
of the $k$th tree species occurring over all survey sites within each areal unit. The $\beta$th areal
unit is denoted as $w_\beta = w(\mathbf{u}_\beta)$ centered at $\mathbf{u}_\beta$, and its attribute value is calculated as a
linear average of the indicator data as

$$f_k(w_\beta) = \frac{1}{n_w} \sum_{\alpha=1}^{n_w} i_k(\mathbf{u}_\alpha), \quad \mathbf{u}_\alpha \in w_\beta, \quad k = 1, \ldots, K,$$

where $n_w$ denotes the total number of survey sites within the areal unit. This transformation of point
data to areal proportion values amounts to the rescaling of PLSR data to a process-based
component observed at a coarser spatial scale.
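The linear averaging that turns point indicators into areal proportions can be sketched as below. It assumes each survey site has already been assigned an integer zone label and that every zone contains at least one site; `areal_proportions` is a hypothetical helper name.

```python
import numpy as np

def areal_proportions(i_k, zone_id, n_zones):
    """Areal proportion of one species per zone: mean of its indicator data in that zone."""
    i_k = np.asarray(i_k, dtype=float)
    zone_id = np.asarray(zone_id)
    n_w = np.bincount(zone_id, minlength=n_zones)                 # sites per zone
    hits = np.bincount(zone_id, weights=i_k, minlength=n_zones)   # presences per zone
    return hits / n_w

# Toy data (made up): four sites in zone 0, two sites in zone 1.
f_k = areal_proportions([1, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1], n_zones=2)
```

In the toy data the species is present at two of four sites in the first zone and one of two in the second, so both proportions equal 0.5.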
Given a set of point indicator data and areal proportions $\{\mathbf{i}_k, \mathbf{f}_k, k = 1, \ldots, K\}$ for the
individual tree species, the $K$ conditional probabilities at an unsampled location $\mathbf{u}_0$ equal a set
of $K$ conditional expectation values of the corresponding indicator RVs, that is, $p_k(\mathbf{u}_0) =
P\{S(\mathbf{u}_0) = s_k \mid \mathbf{i}_k, \mathbf{f}_k\} = E\{I_k(\mathbf{u}_0) \mid \mathbf{i}_k, \mathbf{f}_k\}$ for $k = 1, \ldots, K$. The conditional probability for the
$k$th tree species at the unvisited location $\mathbf{u}_0$ can be predicted as a linear combination of the $n$ point
indicator data $\mathbf{i}_k$ and the $M$ areal proportion values $\mathbf{f}_k$ as

$$\hat{p}_k(\mathbf{u}_0) = \underbrace{\sum_{\alpha=1}^{n} \lambda_\alpha^k(\mathbf{u}_0)\, i_k(\mathbf{u}_\alpha)}_{\mathbf{i}_k^T \boldsymbol{\lambda}_k(\mathbf{u}_0)} + \underbrace{\sum_{\beta=1}^{M} \nu_\beta^k(\mathbf{u}_0)\, f_k(w_\beta)}_{\mathbf{f}_k^T \boldsymbol{\nu}_k(\mathbf{u}_0)} \qquad (1)$$

where $\boldsymbol{\lambda}_k(\mathbf{u}_0) = [\lambda_\alpha^k(\mathbf{u}_0), \alpha = 1, \ldots, n]^T$ denotes an $(n \times 1)$ vector of weights assigned to the
point indicator data $\mathbf{i}_k$ and $\boldsymbol{\nu}_k(\mathbf{u}_0) = [\nu_\beta^k(\mathbf{u}_0), \beta = 1, \ldots, M]^T$ denotes an $(M \times 1)$ vector of
weights assigned to the areal proportion values $\mathbf{f}_k$. These two sets of weights $\boldsymbol{\lambda}_k(\mathbf{u}_0)$, $\boldsymbol{\nu}_k(\mathbf{u}_0)$
for the $k$th tree species are the solution of $(n + M + 1)$ linear equations, often referred to
as an indicator ordinary kriging (IOK) system (see Appendix 2). Note that the above IOK
system with areal proportion data does not necessarily call for regular supports for areal
data. The extended IOK system is valid as long as its attribute values are a linear average
of point data. When the unknown value is related to different attributes, for example, point
indicator or areal proportions of co-occurrent species, the IOK system is easily extended
to cokriging. Refer to Journel (1983), Goovaerts (1997), and Liu (2007) for further details
on IOK without/with area average values or indicator ordinary cokriging.
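To make the $(n + M + 1)$-equation system concrete, the following is a minimal sketch of ordinary kriging with both point indicators and areal proportions. It is not the authors' implementation (their system is given in Appendix 2). It works with covariances rather than variograms, borrows the jack pine parameters of Table 1 with an assumed practical-range exponential form, and approximates every area-to-point and area-to-area covariance by averaging over pseudo-points that discretize each areal unit. All function names and the toy data are hypothetical.

```python
import numpy as np

def cov(h, nugget=0.28, sills=(0.47, 0.25), ranges=(1.4, 24.0)):
    """Covariance C(h) = C(0) - gamma(h): nugget plus two exponential structures.

    The form exp(-3h/a) (practical range a) is an assumption of this sketch.
    """
    h = np.asarray(h, dtype=float)
    structured = sum(s * np.exp(-3.0 * h / a) for s, a in zip(sills, ranges))
    return structured + nugget * (h == 0.0)

def dist(a, b):
    """Pairwise Euclidean distances between the rows of a and the rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def iok_with_areas(u0, pts, i_k, areas, f_k):
    """Predict at u0 from n point indicators and M areal proportions.

    areas: list of (P, 2) arrays of pseudo-points discretizing each areal unit.
    Returns (prediction, weights), where weights has length n + M.
    """
    n, m = len(pts), len(areas)
    u0 = np.asarray(u0, dtype=float)[None, :]
    lhs = np.zeros((n + m + 1, n + m + 1))
    rhs = np.ones(n + m + 1)                      # last entry: weights sum to 1
    lhs[:n, :n] = cov(dist(pts, pts))             # point-to-point
    rhs[:n] = cov(dist(pts, u0))[:, 0]
    for b, wb in enumerate(areas):
        lhs[:n, n + b] = lhs[n + b, :n] = cov(dist(pts, wb)).mean(axis=1)  # area-to-point
        rhs[n + b] = cov(dist(wb, u0)).mean()
        for c, wc in enumerate(areas):
            lhs[n + b, n + c] = cov(dist(wb, wc)).mean()                   # area-to-area
    lhs[:-1, -1] = lhs[-1, :-1] = 1.0             # unbiasedness constraint
    weights = np.linalg.solve(lhs, rhs)[:-1]      # drop the Lagrange multiplier
    return weights[:n] @ i_k + weights[n:] @ f_k, weights

# Toy data (made up): five survey sites and one zone discretized by four pseudo-points.
pts = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0], [5.0, 5.0], [2.0, 2.0]])
i_k = np.array([1.0, 0.0, 0.0, 1.0, 1.0])
zone = np.array([[1.0, 1.0], [1.5, 1.0], [1.0, 1.5], [1.5, 1.5]])
f_k = np.array([0.3])
preds = [iok_with_areas(u, pts, i_k, [zone], f_k)[0] for u in zone]
```

Because the same pseudo-points are used for the area-to-area terms, the area-to-point terms, and the prediction locations, the average of `preds` reproduces the areal datum 0.3, which illustrates the coherence property discussed in the text. The sketch places the pseudo-points away from the survey sites so that the system stays non-singular, and it does not correct predictions that fall outside the interval from 0 to 1.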
Here we assume that the necessary conditions for IOK have been met, that is, the point
indicator data $\mathbf{i}_k$ and areal proportion values $\mathbf{f}_k$ are linked to a realization of an intrinsic
stationary point random function $\{I_k(\mathbf{u}), \mathbf{u} \in A\}$ with a constant but unknown mean and
a stationary variogram model $\gamma_k(h)$ as a function of the separation distance $h = \|\mathbf{u} - \mathbf{u}'\|$
between any pair of locations $\mathbf{u}, \mathbf{u}' \in A$ within the study region. The consistency of the IOK
predictions with the areal proportion values used is guaranteed by the adequate modeling
of the area-to-area variogram between any pair of areal proportion values and the area-to-point
variogram between any pair of areal proportion values and point indicator data as well as
point predictions. The coherence, referred to as the mass-preserving or pycnophylactic property (Tobler 1979, Lam 1983, Wahba 1990), of the prediction probabilities, that is, the reproduction of the areal
proportion value when point conditional probability predictions are reaggregated within the support of the areal datum considered, is guaranteed through a consistent
modeling of the area-to-area and area-to-point variograms (see Appendix 1).
In the application of indicator kriging to presettlement forest studies, the support of
the reconstructed vegetation surfaces is typically larger than the point support of the witness tree data. The continuous surface of each tree species distribution is approximated
by a finite collection of block IOK prediction values. The task is to model the uncertainty
about the proportion of a block $v$ occupied by the $k$th tree species by a block conditional
probability $P\{S(v) = s_k \mid \mathbf{i}_k, \mathbf{f}_k\}$. Although various techniques are available to derive a block
conditional probability from point predictions (Journel and Huijbregts 1978, Isaaks and
Srivastava 1989, Saito and Goovaerts 2002), here we use a numerical approximation represented by the arithmetic average of point or pseudo-point prediction values within the
block. Then, the $p$th block IOK prediction, denoted as $\hat{p}_k(v_p)$, is calculated as an equally weighted
average of the discrete point probabilities within the support $v_p$ as

$$\hat{p}_k(v_p) \approx \frac{1}{n_v} \sum_{j=1}^{n_v} \hat{p}_k(\mathbf{u}_j), \quad \mathbf{u}_j \in v_p, \qquad (2)$$

where $n_v$ denotes the number of pseudo-points within the $p$th target block $v_p$.
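Equation (2) can be illustrated with point predictions stored on a regular fine grid, where each block is a square group of cells. This is only a sketch under that assumption; the paper's blocks need not be aligned with a grid in this way, and `block_average` is a hypothetical helper.

```python
import numpy as np

def block_average(p_fine, factor):
    """Average a fine grid of point predictions into blocks of factor x factor cells."""
    p_fine = np.asarray(p_fine, dtype=float)
    rows, cols = p_fine.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    # Split each axis into (block index, within-block index) and average within blocks.
    return p_fine.reshape(rows // factor, factor, cols // factor, factor).mean(axis=(1, 3))

# Toy 2 x 4 grid of point probabilities (made up), aggregated into two 2 x 2 blocks.
p = np.array([[0.0, 0.2, 1.0, 1.0],
              [0.4, 0.2, 1.0, 1.0]])
blocks = block_average(p, 2)
```

Here each block value is the mean of its four pseudo-point predictions, so $n_v = 4$.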
4. Results and discussion
In the following subsections, the indicator approach is used to predict the conditional probabilities of each tree species occurring at various spatial scales both with and without the
areal proportion data. The influence of spatially aggregated data on the model's predictive
capabilities is assessed in terms of how accurately the predictions reproduce the degree of species aggregation at a tree-plot scale and how consistent the predicted probabilities are across multiple
scales of prediction.
4.1. Spatial continuity modeling
The spatial continuity (dependence) of the individual tree species is modeled through a variogram, and the corresponding model parameters are summarized in Table 1. Experimental
variograms computed from the 16,192 indicator-coded data are fitted using either an exponential or a spherical model with two nested structures. A lag distance of approximately
0.6 km was used and variograms were computed to a maximum distance of 30 km, given
that the dimensions of the study area are approximately 58 km (E–W) and 138 km (N–S).

Table 1. Parameters of point indicator variogram models.

                                       Structure (I)       Structure (II)
Taxa          Type          Nugget     Sill    Range^a     Sill    Range
Aspen         Exponential   0.62       0.22    1.60        0.16    12.70
Jack pine     Exponential   0.28       0.47    1.40        0.25    24.00
Paper birch   Spherical     0.53       0.36    1.20        0.11    15.00

Note: ^a The distance is measured on the kilometer scale (km).
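The experimental indicator variogram described above can be sketched with a brute-force pair computation. The defaults mirror the lag (0.6 km) and maximum distance (30 km) quoted in the text; everything else, including the function name and the omnidirectional binning, is an assumption of this sketch. The all-pairs approach needs memory proportional to the square of the number of sites, so a data set of 16,192 records would call for chunking or a spatial index.

```python
import numpy as np

def indicator_variogram(coords, ind, lag=0.6, max_dist=30.0):
    """Omnidirectional experimental variogram of 0/1 indicator data.

    Returns (lag centers, semivariance per lag, number of pairs per lag).
    Lags without pairs get NaN.
    """
    coords = np.asarray(coords, dtype=float)
    ind = np.asarray(ind, dtype=float)
    iu, ju = np.triu_indices(len(coords), k=1)          # each pair counted once
    h = np.linalg.norm(coords[iu] - coords[ju], axis=1)
    half_sq = 0.5 * (ind[iu] - ind[ju]) ** 2
    edges = lag * np.arange(int(round(max_dist / lag)) + 1)
    counts, _ = np.histogram(h, bins=edges)
    sums, _ = np.histogram(h, bins=edges, weights=half_sq)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma = sums / counts
    return 0.5 * (edges[:-1] + edges[1:]), gamma, counts

# Toy transect (made up): three sites, species present only at the first one.
centers, gamma_hat, n_pairs = indicator_variogram(
    [[0.0, 0.0], [0.5, 0.0], [2.0, 0.0]], [1, 0, 0], lag=1.0, max_dist=3.0)
```

For indicator data the semivariance at a lag is half the fraction of pairs whose members disagree, which is why it is bounded by 0.5 before any standardization.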
The fitted point variogram models for all the tree species have relatively high sill
values with various ranges reflecting the spatial continuity of each tree species. Among
the three species considered, aspen has the smallest population size (1281 occurrences among
16,192 witness trees) and occurred at a lower level of aggregation, which is also evidenced
by the highest nugget effect (62%) of the fitted variogram model. Ecologically, paper birch
is capable of colonizing disturbed areas rapidly on a wide variety of soil types but is relatively short-lived and susceptible to fire. Such traits may yield the highest abundance
among the three species considered but may not necessarily yield aggregation in its spatial distribution, as evidenced by the high nugget effect (53%) and the medium level (36%)
of spatial dependence at a short distance (1.2 km). Jack pine, the most clustered among
the three species, is capable of forming selective stands, particularly on sandy soils. Once
past the sapling stage, the bark of the jack pine affords it some protection from fire, allowing mature individuals to survive through this particular disturbance type. Additionally,
jack pine exhibits fire-mediated serotiny, allowing it to be among the first to establish itself
on the recently disturbed landscape and potentially form monospecific stands. Although
this study did not include an investigation into edaphic conditions or disturbance regimes,
it might be assumed that jack pine's clustered distribution with a strong spatial continuity (approximately 50% of the spatial variability up to 1.4 km) may be due to these
characteristics.
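The percentages quoted above can be read directly off the nested models of Table 1, whose nugget and two sills sum to 1 for each species. The sketch below evaluates those models; the parameter values come from Table 1, while the functional forms are assumptions: the exponential structure uses the practical-range convention (about 95% of its sill reached at the stated range), which the article does not specify.

```python
import numpy as np

def exponential(h, a):
    # Practical-range convention; an assumption of this sketch.
    return 1.0 - np.exp(-3.0 * h / a)

def spherical(h, a):
    r = np.minimum(h / a, 1.0)          # reaches its sill exactly at h = a
    return 1.5 * r - 0.5 * r ** 3

# (structure type, nugget, [(sill, range in km) for structures I and II]) from Table 1.
TABLE1 = {
    "aspen": (exponential, 0.62, [(0.22, 1.60), (0.16, 12.70)]),
    "jack pine": (exponential, 0.28, [(0.47, 1.40), (0.25, 24.00)]),
    "paper birch": (spherical, 0.53, [(0.36, 1.20), (0.11, 15.00)]),
}

def gamma(species, h):
    """Nested variogram model: nugget plus two structures; gamma(0) = 0 by convention."""
    structure, nugget, parts = TABLE1[species]
    h = np.asarray(h, dtype=float)
    g = nugget + sum(sill * structure(h, a) for sill, a in parts)
    return np.where(h > 0, g, 0.0)

nugget_share_aspen = float(gamma("aspen", 1e-9))   # just above zero distance
total_sill_birch = float(gamma("paper birch", 15.0))
```

Evaluating just above zero distance recovers the nugget share, and evaluating the spherical model at its longest range recovers the total sill.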
The point-to-point, point-to-area, and area-to-area variograms associated with the areal
proportion values for the IOK prediction are computed from the fitted point variogram
models of each tree species (Table 1).
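One common way to obtain such regularized variograms, given here as an assumed discrete approximation rather than the authors' stated procedure, is to average the point model $\gamma_k$ over pseudo-points that discretize each areal unit. The counts $P_\beta$ and the pseudo-points $\mathbf{u}_j$ are notation introduced for this illustration only.

```latex
% Area-to-point variogram between areal unit w_beta and a point u
\bar{\gamma}_k(w_\beta, \mathbf{u}) \approx \frac{1}{P_\beta} \sum_{j=1}^{P_\beta}
  \gamma_k\bigl(\|\mathbf{u}_j - \mathbf{u}\|\bigr), \qquad \mathbf{u}_j \in w_\beta

% Area-to-area variogram between areal units w_beta and w_beta'
\bar{\gamma}_k(w_\beta, w_{\beta'}) \approx \frac{1}{P_\beta P_{\beta'}}
  \sum_{j=1}^{P_\beta} \sum_{j'=1}^{P_{\beta'}}
  \gamma_k\bigl(\|\mathbf{u}_j - \mathbf{u}'_{j'}\|\bigr), \qquad
  \mathbf{u}_j \in w_\beta,\ \mathbf{u}'_{j'} \in w_{\beta'}
```

Using the same pseudo-points in both averages is what keeps the area-to-area and area-to-point terms mutually consistent, which the coherence property relies on.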
4.2. Spatial prediction with areal proportion data
Given the model of spatial continuity, uncertainty associated with the individual
tree species occurrence at any location is assessed through indicator kriging models
without/with areal proportion data. For both cases, the unknown probability of each tree
species occurring is predicted using point indicator data, but the areal proportion values
are considered simultaneously only in one of the kriging systems.
The two IOK models are evaluated in terms of their ability to reproduce the degree
of species aggregation at the plot scale by comparing the number of individuals per species
at each plot with the local average of predicted probabilities at the collocated locations.
Given that the plot data are collected using a certain distance measure from a corner
location to a bearing tree, generally referred to as point-to-plant methods (Cottam and Curtis
1956), the comparative analysis calls for the identification of the collocated locations (pixels) of the two IOK prediction surfaces with the four tree-plot sites along with the distance
used to measure from a sampling point to the tree closest to it. More specifically, the spatial
setting used to derive the metrics at the four tree-plots is approximated from the calculation of the average distance between sampling points and their nearest four trees and the
observation that the four tree-plots are situated approximately at 1.6 km intervals, whereas
the IOK prediction surfaces are available over a regular grid with a cell size of 0.4 × 0.4 km².
In summary, at each pixel collocated with a four tree-plot site, the local average of the predicted probabilities of all pixels located within 0.8 km is computed and compared with the
number of individuals of each species recorded at the corresponding plot.
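The local averaging step can be sketched as a circular window over the prediction grid. The 0.4 km cell size and 0.8 km radius follow the text; the grid layout (row-major array, cell centers offset from a lower-left origin) and the function name `local_average` are assumptions made for illustration.

```python
import numpy as np

def local_average(pred, plot_xy, cell=0.4, radius=0.8, origin=(0.0, 0.0)):
    """Mean predicted probability over grid cells whose centers lie within radius of a plot."""
    pred = np.asarray(pred, dtype=float)
    rows, cols = pred.shape
    xs = origin[0] + (np.arange(cols) + 0.5) * cell   # cell-center x coordinates (km)
    ys = origin[1] + (np.arange(rows) + 0.5) * cell   # cell-center y coordinates (km)
    gx, gy = np.meshgrid(xs, ys)
    mask = np.hypot(gx - plot_xy[0], gy - plot_xy[1]) <= radius
    return pred[mask].mean()

# Toy 5 x 5 grids (made up), with the plot at the center of the middle cell.
uniform = np.full((5, 5), 0.25)
spike = np.zeros((5, 5))
spike[2, 2] = 1.0
avg_uniform = local_average(uniform, (1.0, 1.0))
avg_spike = local_average(spike, (1.0, 1.0), radius=0.5)
```

With the smaller radius of 0.5 km only the center cell and its four edge neighbors fall inside the window, so the single cell of probability 1 averages to 0.2.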
Table 2 shows the summary statistics, mean and variance (in parentheses), of the
local average conditional probabilities of tree occurrences obtained through IOK without
and with areal proportion data, respectively. The number of conspecific individuals present at
a four tree-plot varies between 0 and 4, which indicates the level of aggregation occupied
by each of the three tree species. Measures of the degree of species aggregation at this plot
scale often provide useful information to characterize the spatial distribution of each tree
species over the study region, although the primary interest of this study is in assessing
the reproducibility of such measures between the two prediction models. To facilitate the
comparison between the number of individuals per plot and the local average of predicted
conditional probabilities, the range of the expected proportion corresponding to each number of individuals per plot is adjusted by smoothing (or averaging) the local average of
conditional probabilities (0.0–0.9).
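The comparison between plot counts and local averages can be sketched as a lookup of expected ranges. The ranges for one, two, and three individuals (0.1–0.3, 0.3–0.5, 0.5–0.7) are those shown in Table 2; the ranges for zero and four individuals are inferred here from the overall 0.0–0.9 span and are an assumption, as are the function name and the example means.

```python
# Expected proportion range for each number of conspecific individuals per plot.
# Entries for 0 and 4 are inferred, not taken from the article.
EXPECTED_RANGE = {0: (0.0, 0.1), 1: (0.1, 0.3), 2: (0.3, 0.5), 3: (0.5, 0.7), 4: (0.7, 0.9)}

def classify(count, local_mean):
    """Say whether a local average of predicted probabilities matches the plot count."""
    low, high = EXPECTED_RANGE[count]
    if local_mean < low:
        return "underestimated"
    if local_mean > high:
        return "overestimated"
    return "within range"

# Hypothetical local means for plots holding two individuals of a species.
result_low = classify(2, 0.19)
result_ok = classify(2, 0.33)
```

A mean that falls below its expected range indicates that the prediction surface understates the degree of aggregation observed at the plot.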
In general, both IOK model predictions fall within the expected range of proportions. However, across all categories (0–4) and for all three species, the local average
of IOK predictions with areal data is closer to the center of the range and has smaller variances. The instances where the local average of IOK predictions without areal proportions
underestimates the degree of aggregation at the tree-plot scale outnumber those obtained
using areal proportions for both aspen and paper birch. For aspen, the smaller population
size and less clustered spatial distribution may result in the underestimation of IOK predictions both without and with areal data, although the incorporation of areal proportion
data improves the quality of prediction by reducing the number of underestimated instances
and decreasing the variances of the predictions. The aggregation level of paper birch is
relatively well described in both IOK models except at the highest aggregation (four individuals per plot). This is likely due to its widespread distribution and large population size.
The species aggregation pattern of jack pine is well reproduced regardless of the use of
areal data, although the incorporation of areal proportion improves the precision, as the
mean of the local average probabilities is closer to the center of the range and the variance
is smaller for each category.
The incorporation of areal data in IOK prediction clearly improves the model's ability to reproduce the species aggregation pattern at a tree-plot scale, although the degree of
contribution of areal data depends on the spatial pattern and the abundance of each species.
The more dispersed and scarce a species is, the more informative the areal data are in the
IOK predictions.

4.3. Support differences in spatial prediction


The prediction of a conditional probability of individual tree species occurrence at a single
location is rarely a goal of ecological studies per se, but it is typically a preliminary step
in creating a map of vegetation distributions or in a subsequent landscape pattern analysis (Delcourt and Delcourt 1996, Manies and Mladenoff 2000, Friedman et al. 2001, Wu
2004, Wang and Larsen 2006). In various applications of spatial interpolation to produce
presettlement vegetation surfaces, however, support differences between the source data
and the sought-after prediction surfaces are often overlooked.

Table 2. The summary of IOK predictions at the tree-plot level without (wo)/with areal data.

Species      Areal data  The number of individuals per plot^a
                         0 (0–0.1)^b        1 (0.1–0.3)        2 (0.3–0.5)        3 (0.5–0.7)        4 (0.7–0.9)
Aspen        wo^c        0.0507 (0.0047)    0.1362 (0.0091)    0.1933^d (0.0075)  0.2840 (0.0175)    0.3741 (0.0156)
             with        0.0283 (0.0017)    0.1804 (0.0028)    0.3294 (0.0037)    0.5001 (0.0070)    0.6530 (0.0082)
Jack pine    wo          0.0216 (0.0029)    0.2714 (0.0099)    0.4293 (0.0105)    0.6105 (0.0109)    0.7834 (0.0107)
             with        0.0179 (0.0018)    0.2663 (0.0069)    0.4402 (0.0074)    0.6344 (0.0069)    0.8167 (0.0072)
Paper birch  wo          0.0542 (0.0034)    0.2086 (0.0051)    0.3467 (0.0072)    0.4694 (0.0091)    0.6044 (0.0054)
             with        0.0449 (0.0019)    0.2113 (0.0033)    0.3753 (0.0042)    0.5167 (0.0057)    0.6763 (0.0026)

Notes: a, taken only at the four tree-plots; b, the adjusted range of proportions (0–0.9) corresponding to the number of individuals per plot; c, without areal proportion data; d, bold
type indicates instances where a given degree of aggregation is underestimated by the corresponding local average of IOK predictions.

A comparative analysis is conducted to demonstrate the consequences of ignoring
support differences on indicator kriging predictions, where three kriging models (centroid-based IOK, block IOK without areal data, and block IOK with areal data) are used to
reconstruct the surface over a regular grid of four different cell sizes (0.8, 1.6, 2.4, 3.6 km).
At each scale of prediction, the coherence property of the three model predictions is
assessed to determine whether the informed areal proportion value of each areal unit is
reproduced when the predicted conditional probabilities within the corresponding areal
unit are reaggregated. Here, it is assumed that the PLSR data-driven areal proportion values are representative of the true proportions of each tree species within that areal unit. The
prediction errors are summarized through the mean absolute prediction error (MAPR), defined as

$$\mathrm{MAPR} = \frac{100}{M} \sum_{p=1}^{M} \left| f_k(v_p) - \hat{p}_k(v_p) \right|,$$

where $v_p$ denotes the pth block constituting the
prediction surface and $M$ denotes the number of areal proportion data.
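The coherence check can be sketched in a few lines. This is a hedged illustration under two assumptions: MAPR is read as 100 times the mean absolute difference between the informed areal proportion and the reaggregated prediction of each block, and reaggregation is taken to be a plain average of the fine-grid predictions in each areal unit. The function names `reaggregate` and `mapr` and the toy numbers are hypothetical.

```python
import numpy as np


def reaggregate(predictions, unit_ids, n_units):
    """Average fine-grid predictions within each areal unit (unit_ids are 0-based labels)."""
    predictions = np.asarray(predictions, dtype=float)
    unit_ids = np.asarray(unit_ids, dtype=int)
    sums = np.bincount(unit_ids, weights=predictions, minlength=n_units)
    counts = np.bincount(unit_ids, minlength=n_units)
    return sums / counts


def mapr(areal_proportions, reaggregated):
    """Mean absolute prediction error, expressed in percent."""
    f = np.asarray(areal_proportions, dtype=float)
    p = np.asarray(reaggregated, dtype=float)
    if f.shape != p.shape:
        raise ValueError("one prediction is needed per areal proportion datum")
    return float(100.0 * np.mean(np.abs(f - p)))


# Toy example: four fine-grid cells forming two areal units (made-up values).
block_pred = reaggregate([0.1, 0.3, 0.5, 0.7], [0, 0, 1, 1], n_units=2)
error = mapr([0.25, 0.5], block_pred)
```

A perfectly coherent surface would return an MAPR of 0, because every reaggregated prediction would equal its informed areal proportion.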
Note that the objectives of this comparative analysis are not only to evaluate the consequences of ignoring support differences between source data and target surface but also to
quantify the influence of areal data. The contribution of the areal proportion data is assessed
through a sensitivity analysis where kriging prediction is obtained using a random sample
of various sizes (30–100% in 10% increments) of indicator point data accompanied by
areal proportion data, which are derived from the exhaustive indicator point data.
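The sampling design of the sensitivity analysis can be sketched as below. The sketch only draws the random subsets of point indices; the kriging step itself is not shown. The function name, the seed, and the use of sampling without replacement are assumptions made for illustration, and the total of 16,192 records is the number of indicator-coded witness tree records reported in the caption of Figure 3.

```python
import numpy as np


def subsample_indices(n_points, percentages, seed=0):
    """Draw, for each percentage, a random subset of point indices without replacement."""
    rng = np.random.default_rng(seed)
    samples = {}
    for pct in percentages:
        size = int(round(pct * n_points / 100))
        samples[pct] = np.sort(rng.choice(n_points, size=size, replace=False))
    return samples


# 30% to 100% of the records in 10% increments.
samples = subsample_indices(16192, range(30, 101, 10))
```

In each run, the areal proportion data would stay fixed, since they are derived from the exhaustive point data, while only the point subset changes.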
Centroid-based IOK mimics the conventional approach where the prediction support
is collapsed into the centroid (or grid node) of the corresponding block, whereas the other
two block indicator kriging models account for support differences between the source
data and the prediction surface. The block prediction probability at each scale of analysis is numerically approximated by the spatial average of pseudo-point IOK or block IOK
predictions without/with areal proportion data on grid cells at a spatial resolution of 0.4 × 0.4
km². The only difference between the two block IOK models is that the latter incorporates
areal proportion values as auxiliary data. Figure 2 shows the maps of the paper birch distribution obtained through the three kriging models at the 1.6 km scale of prediction.
The centroid-based IOK yields a smooth surface mixed with considerable salt-and-pepper noise, as
shown in Figure 2a, whereas block IOK prediction without areal proportion data in Figure
2b yields the smoothest surface among the three results. The abrupt changes shown in the
centroid-based IOK prediction surface are attributed to the consequence of ignoring the
support of prediction, where the unknown at the centroid of each pixel constituting the
prediction surface is directly influenced by the nearest local point indicator data. In contrast, the prediction surfaces obtained from block IOK tend to smooth out those artificial
variations of predicted conditional probabilities, where the influence of areal data in the
block IOK (Figure 2c) enhances local details while describing a spatially varying pattern
of species occurrences.
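The numerical approximation of a block prediction by the spatial average of fine-grid predictions can be sketched as follows. This is an illustration only: it assumes the 0.4 km predictions sit in a 2-D array whose dimensions are multiples of the aggregation factor (2, 4, 6, and 9 for the 0.8, 1.6, 2.4, and 3.6 km grids). The `centroid_sample` helper is a hypothetical stand-in that picks the fine cell nearest each block centre to mimic collapsing the support to a point; it is not the centroid-based IOK itself, which kriges directly at the centroid.

```python
import numpy as np


def block_average(fine, factor):
    """Average a fine-resolution surface over non-overlapping factor x factor blocks."""
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the aggregation factor")
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


def centroid_sample(fine, factor):
    """Take one fine cell near the centre of each block (point support stand-in)."""
    offset = factor // 2
    return fine[offset::factor, offset::factor]


fine = np.arange(16, dtype=float).reshape(4, 4)  # toy 0.4 km surface
coarse_block = block_average(fine, 2)            # 0.8 km block support
coarse_point = centroid_sample(fine, 2)          # 0.8 km grid, point support
```

The block average uses every fine cell in a block, whereas the point-support stand-in depends on a single cell, which is one way to see why the latter reacts more strongly to individual nearby data.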
The results of coherence assessments are summarized in Figure 3. The coherence property quantified by MAPR is summarized per species, where the three kriging models are
evaluated at various spatial scales of prediction using a subset of point indicator data
without/with areal proportion data. Paper birch (Figure 3c) has the highest MAPR among
the three species, which is related to its widely distributed and moderately concentrated spatial pattern. In contrast, the MAPR of jack pine is consistently low regardless of the spatial
scale of prediction or the kriging model considered. The incorporation of areal proportion
data reduces the MAPR for all species, but the effects of areal data differ among
them because of their distinct spatial patterns. As expected, the smaller the sample of the point
indicator data used, the more influential the areal proportion data are when incorporated
into the kriging prediction. This can be seen particularly at finer spatial scales (0.8 km),
as evidenced by the larger gaps among the three lines in Figure 3a–c.

Figure 2. Conditional probability prediction surface of paper birch at the scale of prediction 1.6
km through (a) centroid-based IOK, (b) block IOK without areal proportion data, and (c) block IOK
with areal proportion data.

Another interesting result is that there is a critical spatial scale of prediction
above which the consideration of support differences between source data and target surface affects the coherence of prediction surfaces. For example, when the surface is
constructed at a relatively coarse scale, a grid of cell size 3.6 km, accounting for the
difference between point and block support of the prediction surface affects the coherence
property of the prediction surface, as summarized by MAPR. This is shown as a
clear difference between the line with circles and the line with asterisks
at the target scale of 3.6 km for all species. Caution should be taken in this generalization,
however, as the results may vary depending on the spatial extent of the study and the spatial
continuity of species.
A factor closely related to support differences in the assessment of the conditional probability surface is the effect of the areal proportion data, shown by the differences between
the line with the square symbols and the other two lines. Across all conditions considered, the
incorporation of the areal proportion values in the kriging prediction improves the coherence of the prediction surface. The degree of influence of the areal proportion data,
however, differs depending on the spatial patterns of species, the spatial scale of the target
surface, and the percentage of the point sample data used. When the full set of point
indicator data is used (100%) to predict the conditional probabilities at the finest spatial scale
(0.8 km), the influence of areal data is minimal and the spatial structure of the individual
species can account for the differences [see the differences among the three symbols on
the right axis (100%) of the three graphs in the first column of Figure 3]. The incorporation of areal data is the most effective at reducing the prediction errors (MAPR) for aspen,
the tree species with the lowest level of spatial dependence. The opposite is also true, as
the prediction errors associated with jack pine, the species with the highest level of spatial
concentration, are reduced minimally across all spatial scales of prediction. At a coarser
scale of prediction, the effect of areal data is mixed with the support differences, although
sensitivity analyses support the preliminary conclusion that the inclusion of areal proportion data in IOK improves the quality of the prediction surface, as shown by the lower overall
MAPR.

[Figure 3: grid of line plots, one row per species (a–c) and one column per scale of prediction (0.8, 1.6, 2.4, 3.6 km); x-axis: Percentage (30, 50, 70, 100); y-axis: MAPR; legend: Centroid IOK, Block IOK without areal data, Block IOK with areal data.]

Figure 3. Coherence assessments of three kriging models (denoted by different symbols) per
species [(a) aspen, (b) jack pine, and (c) paper birch] at various spatial scales of prediction (0.8,
1.6, 2.4, 3.6 km) using a random sample of point indicator data of different sizes (30–100% of the
16,192 indicator-coded witness tree records). The line with circles denotes the mean absolute prediction error (MAPR) associated with the centroid-based IOK predictions, and the lines with asterisks
and squares denote the MAPR of block IOK without and with areal proportion data, respectively.

Finally, the coherence of the prediction surface deteriorates as the target support
increases. In particular, the IOK with areal data at a coarser scale (3.6 km) yields a larger
MAPR for all three species. The increase in the MAPR is consistent regardless of the sample
size of the indicator point data. It should be noted that the MAPR values of block IOK predictions
with areal proportion data reported in this study are not exactly 0 because of the approximation
of survey locations to quasi-point support using a grid with 0.4 km resolution. The MAPR
of any block IOK with areal proportion data will be smaller if a very dense grid is imposed
on the study region or if the point indicator data are moved to the nearest discretization
grid nodes.

5. Conclusion
Geostatistical approaches have been used in several ecological studies to reconstruct the
spatial distributions of individual tree species and historical vegetation communities using
PLSR data. In these previous studies, geostatistics allowed for the interpolation of the probability distribution of a tree species occurrence at unvisited locations. However, the change
of support from the PLSR data to the prediction surfaces has been ignored by assuming
that support of the prediction surfaces is sufficiently represented by the set of centroids of
a grid overlaid in the study region, regardless of the size of the grid. In addition, the multiscale properties of PLSR data, such as the regional distribution of a single tree species
described by the areal proportion, are rarely explored.
In this article, we demonstrate the use of block indicator kriging as a means to model
the unknown presettlement vegetation surface, while accounting for support differences
between prediction surfaces and point-support PLSR data. We also consider area-averaged indicator data as an alternative source of information, which has the potential to improve the
quality of the prediction in the reconstruction of the presettlement vegetation surface. The
use of these areal proportion data in constructing uncertainty-quantified
probability surfaces is not free from limitations. The design of such areal units is controversial in that the proportion taken by each species defined over a set of such areal units varies
with how the units are constituted and may, consequently, affect the quality of the resulting surface (Flowerdew and Green 2001). In this article, a set of areal units is designed
to maximize within-area homogeneity for the purpose of capturing the broadscale spatial
variation of species distribution under the assumption that tree species that are exposed
to similar environmental conditions tend to have similar abundance. This assumption is arguable,
and further investigation with alternative areal units should be considered.
In the case study, we illustrated, through a comparative analysis, the contribution of areal proportion values to the prediction and the potential loss of accuracy (higher uncertainty) caused by ignoring support differences
between source data and prediction surfaces. PLSR data
collected in northeast Minnesota are used as the primary point-source data, and area-averaged
indicator values of individual tree species are used as auxiliary data. An analysis of the
coherence property shows that the benefit of incorporating auxiliary data in the prediction depends on the
quality, quantity, and spatial configuration of existing data, but the quality of predictions
is improved as long as spatial dependence is present in the data. The comparative study over
various spatial scales of prediction also clearly showed
that the indicator kriging prediction that ignores the support differences (centroid-based
kriging) results in larger errors than block kriging.
The uncertainty assessment of presettlement tree species distribution presented in this
study can be further extended by investigating (i) the spatial uncertainty associated with
individual tree species occurring at several locations, (ii) the block conditional probabilities
simulated at multiple spatial resolutions, and (iii) the uncertainty propagation in the subsequent analysis such as spatial classification or landscape pattern analysis, using indicator
stochastic simulation. Perhaps a further extension of this work would be to incorporate relevant environmental covariates, such as topographical and edaphic variables, into the spatial
model to further refine the method of determining the spatial distribution of presettlement
tree species.

Appendix 1. Indicator variogram with areal data


The variogram of areal proportions occupied by the kth tree species between any two areal
supports $w_\beta$, $w_{\beta'}$ is derived from the point support variogram model $\gamma_k(u, u + h) = \gamma_k(h)$
as:

$$\bar{\gamma}_k(w_\beta, w_{\beta'}) = \frac{1}{|w_\beta||w_{\beta'}|} \int_{u \in w_\beta} \int_{u' \in w_{\beta'}} \gamma_k(u, u') \, du \, du' \qquad \text{(A-1)}$$

where the variogram between any pair of areal units is the average point variogram
$\gamma_k(u, u')$ corresponding to all possible separation vectors formed by any two locations $u \in w_\beta$
and $u' \in w_{\beta'}$. Similarly, the variogram of the kth tree species occurrence at the $\alpha$th
survey location $u_\alpha$ with respect to its areal proportion value determined over the $\beta$th areal
unit $w_\beta$ is derived from the point variogram model $\gamma_k(u, u')$ as

$$\bar{\gamma}_k(u_\alpha, w_\beta) = \frac{1}{|w_\beta|} \int_{u' \in w_\beta} \gamma_k(u_\alpha, u') \, du' \qquad \text{(A-2)}$$

These regularized area-to-area and point-to-area variograms are stationary (for an areal
unit of any size, i.e., $|w_\beta|$, $\beta = 1, \ldots, M$) under the assumption that the stationary
point support variogram $\gamma_k(h) = \gamma_k(u, u')$ of the kth tree species occurrence exists and the
attribute value of the areal data is defined as a linear average of the point data.
This entails that the average variogram between informed point indicator-coded data
and areal average of indicator data is a function of the separation vector $h = \|u_\alpha - w_\beta\|$
between the point $u_\alpha$ and any point located within the $\beta$th areal unit $w_\beta$, that is, $\bar{\gamma}_k(u_\alpha, w_\beta) =
\bar{\gamma}_k(h)$. Refer to Journel and Huijbregts (1978), Armstrong (1998), Boucher and Kyriakidis
(2006), and Remy et al. (2009) for further discussion.
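In practice, regularized variograms of this kind are commonly approximated by discretizing each areal unit into a set of points and averaging the point variogram over all point pairs. The sketch below illustrates that idea only. The spherical model and its parameters are assumptions chosen for the example, since the fitted indicator variogram models are not restated here, and all function names are hypothetical.

```python
import numpy as np


def spherical_variogram(h, sill=1.0, range_km=5.0):
    """Spherical point-support variogram (an assumed model, used only for illustration)."""
    s = np.minimum(np.asarray(h, dtype=float) / range_km, 1.0)
    return sill * (1.5 * s - 0.5 * s**3)


def area_to_area(points_a, points_b, gamma):
    """Average point variogram over all pairs of discretization points of two areal units."""
    diff = points_a[:, None, :] - points_b[None, :, :]
    return float(gamma(np.linalg.norm(diff, axis=2)).mean())


def point_to_area(point, points_b, gamma):
    """Average point variogram between one location and the points discretizing a unit."""
    return float(gamma(np.linalg.norm(points_b - point, axis=1)).mean())


# Toy check: units discretized by one or two points (coordinates in km, made up).
unit_a = np.array([[0.0, 0.0]])
unit_b = np.array([[3.0, 0.0]])
g_aa = area_to_area(unit_a, unit_b, spherical_variogram)
g_pa = point_to_area(np.array([0.0, 0.0]),
                     np.array([[3.0, 0.0], [0.0, 10.0]]),
                     spherical_variogram)
```

A denser discretization brings the averages closer to the integrals, at a cost that grows with the product of the numbers of discretization points.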

Appendix 2. Indicator ordinary kriging system with areal data


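Data taken at different spatial scales, both point data and their linear averages over areal supports, can be considered simultaneously in the kriging system. The only constraint is that
the areal data are defined as a linear average of point support data within the areal support (Remy et al. 2009). The weights of IOK with areal proportions, $\boldsymbol{\lambda}_k(u_0)$ and $\boldsymbol{\nu}_k(u_0)$, are
obtained by solving the following $(n + M + 1)$ ordinary kriging system of equations:

$$
\begin{bmatrix}
\boldsymbol{\Gamma}_k^{uu} & \boldsymbol{\Gamma}_k^{uw} & \mathbf{1}_n \\
\boldsymbol{\Gamma}_k^{wu} & \boldsymbol{\Gamma}_k^{ww} & \mathbf{1}_M \\
\mathbf{1}_n^T & \mathbf{1}_M^T & 0
\end{bmatrix}
\begin{bmatrix}
\boldsymbol{\lambda}_k(u_0) \\
\boldsymbol{\nu}_k(u_0) \\
\mu_k(u_0)
\end{bmatrix}
=
\begin{bmatrix}
\boldsymbol{\gamma}_k^{u}(u_0) \\
\boldsymbol{\gamma}_k^{w}(u_0) \\
1
\end{bmatrix}
$$

where $\boldsymbol{\Gamma}_k^{uu} = [\gamma_k(u_\alpha, u_{\alpha'}),\ \alpha, \alpha' = 1, \ldots, n]$ denotes an $(n \times n)$ matrix of autovariogram
values between any pair of point indicator RVs corresponding to the kth tree species,
$\boldsymbol{\Gamma}_k^{uw} = [\bar{\gamma}_k(u_\alpha, w_\beta),\ \alpha = 1, \ldots, n,\ \beta = 1, \ldots, M]$ denotes an $(n \times M)$ matrix of point-to-area
variogram values between any pair of point indicator RVs and areal proportion RVs, with $\boldsymbol{\Gamma}_k^{wu}$ its transpose,
and $\boldsymbol{\Gamma}_k^{ww} = [\bar{\gamma}_k(w_\beta, w_{\beta'}),\ \beta, \beta' = 1, \ldots, M]$ denotes an $(M \times M)$ matrix of area-to-area
variogram values between any pair of areal proportion RVs. The right-hand-side vectors $\boldsymbol{\gamma}_k^{u}(u_0)$ and $\boldsymbol{\gamma}_k^{w}(u_0)$
hold the variogram values between the prediction location $u_0$ and the point and areal data, respectively. The Lagrange parameter $\mu_k(u_0)$
accounts for the unbiasedness constraint on the weights. Note that here, a single constraint
replaces the two nonbias conditions (one for the point indicator data and another for
areal proportions), that is, $\mathbf{1}_n^T \boldsymbol{\lambda}_k(u_0) + \mathbf{1}_M^T \boldsymbol{\nu}_k(u_0) = 1$. Such a single unbiasedness constraint
is allowed in the kriging system when two variables have the same meaning and the same
expectation, but different spatial structures (Journel and Huijbregts 1978, p. 325). In this
study, the areal proportion values are derived from point indicator data, which implies that
their expected values are the same but they may have different spatial continuity models.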
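Assembling and solving a system of this form can be sketched with NumPy as follows. The sketch assumes the standard variogram form of ordinary kriging with a single unbiasedness constraint, in which the right-hand side holds the variogram values between each datum and the prediction location. The function name, argument names, and all numerical values are hypothetical and chosen only to give a small solvable system.

```python
import numpy as np


def solve_iok_with_areal(G_uu, G_uw, G_ww, g_u0, g_w0):
    """Solve the (n + M + 1) ordinary kriging system for point and areal weights."""
    n, M = G_uw.shape
    A = np.zeros((n + M + 1, n + M + 1))
    A[:n, :n] = G_uu                 # point-to-point variograms
    A[:n, n:n + M] = G_uw            # point-to-area variograms
    A[n:n + M, :n] = G_uw.T          # area-to-point block (transpose)
    A[n:n + M, n:n + M] = G_ww       # area-to-area variograms
    A[:n + M, -1] = 1.0              # column of ones for the Lagrange parameter
    A[-1, :n + M] = 1.0              # single unbiasedness constraint
    b = np.concatenate([g_u0, g_w0, [1.0]])
    solution = np.linalg.solve(A, b)
    return solution[:n], solution[n:n + M], solution[-1]


# Made-up variogram values for two point data and one areal datum.
G_uu = np.array([[0.0, 2.0], [2.0, 0.0]])
G_uw = np.array([[1.2], [1.5]])
G_ww = np.array([[0.8]])
lam, nu, mu = solve_iok_with_areal(G_uu, G_uw, G_ww,
                                   g_u0=np.array([1.0, 1.0]),
                                   g_w0=np.array([0.9]))
```

The last row of the matrix enforces that the point and areal weights together sum to 1, which is the single constraint discussed above.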


References
Armstrong, M., 1998. Basic linear geostatistics. Berlin: Springer-Verlag.
Atkinson, P. and Tate, N., 2000. Spatial scale problems and geostatistical solutions: a review.
Professional Geographer, 52 (4), 607–623.
Batek, M., et al., 1999. Reconstruction of early nineteenth-century vegetation and fire regimes in the
Missouri Ozarks. Journal of Biogeography, 26 (2), 397–412.
Bolliger, J. and Mladenoff, D., 2005. Quantifying spatial classification uncertainties of the historical
Wisconsin. Ecography, 28, 141–156.
Boucher, A. and Kyriakidis, P.C., 2006. Super-resolution land cover mapping with indicator
geostatistics. Remote Sensing of Environment, 104, 264–282.
Brown, D., 1998. Classification and boundary vagueness in mapping presettlement forest types.
International Journal of Geographical Information Science, 12 (2), 105–129.
Chilès, J.P. and Delfiner, P., 1999. Geostatistics: modeling spatial uncertainty. New York: John Wiley.
Cogbill, C., Burk, J., and Motzkin, G., 2002. The forests of presettlement New England, USA:
spatial and compositional patterns based on town proprietor surveys. Journal of Biogeography,
29 (10–11), 1279–1304.
Cottam, G. and Curtis, J.T., 1956. The use of distance measures in phytosociological sampling.
Ecology, 37 (3), 451–460.
Delcourt, H. and Delcourt, P., 1974. Primeval magnolia-holly-beech climax in Louisiana. Ecology,
55, 638–644.
Delcourt, H. and Delcourt, P., 1996. Presettlement landscape heterogeneity: Evaluating grain of
resolution using general land office survey data. Landscape Ecology, 11 (6), 363–381.
Dungan, J., 2001. Scaling up and scaling down: The relevance of the support effect on remote sensing
of vegetation. In: N. Tate and P. Atkinson, eds. Modeling scale in geographical information
science. New York: Wiley, chap. 12, 221–235.
Flowerdew, R. and Green, M., 2001. Behavior of regression models under random aggregation. In:
N.J. Tate and P.M. Atkinson, eds. Modeling scale in geographical information science. London:
Wiley, chap. 5, 239–247.
Friedman, S., Reich, P., and Frelich, L., 2001. Multiple scale composition and spatial distribution
patterns of the north-eastern Minnesota presettlement forest. Journal of Ecology, 89 (4),
538–554.
Goovaerts, P., 1997. Geostatistics for natural resources evaluation. New York: Oxford University
Press.
Gotway, C. and Young, L., 2002. Combining incompatible spatial data. Journal of the American
Statistical Association, 97 (458), 632–648.
Gotway, C. and Young, L., 2004. A spatial view of the ecological inference problem. In: G. King,
O. Rosen and M. Tanner, eds. Ecological Inference. Cambridge: Cambridge University Press,
233–244.
Grimm, E.C., 1984. Fire and other factors controlling the big woods vegetation of Minnesota in the
mid-nineteenth century. Ecological Monographs, 54 (3), 291–311.
He, H., et al., 2000. GIS interpolations of witness tree records (1839–1866) for northern Wisconsin
at multiple scales. Journal of Biogeography, 27, 1031–1042.
He, H.S., et al., 2007. Mapping pre-European settlement vegetation at fine resolutions using a
hierarchical Bayesian model and GIS. Plant Ecology, 11 (6), 85–94.
Hershey, R.R., 2000. Modeling the spatial distribution of ten tree species in Pennsylvania. In: H.T.
Mowrer and R.G. Congalton, eds. Quantifying spatial uncertainty in natural resources. Chelsea,
Michigan: Ann Arbor Press, chap. 10, 119–135.
Hobbs, H. and Goebel, J., 1982. Geologic map of Minnesota: Quaternary geology. St. Paul:
Minnesota Geological Survey.
Howell, D.L. and Kucera, C.L., 1956. Composition of pre-settlement forests in three counties of
Missouri. Bulletin of the Torrey Botanical Club, 83 (3), 207–217.
Isaaks, E. and Srivastava, R., 1989. An introduction to applied geostatistics. New York: Oxford
University Press.
Journel, A.G., 1983. Nonparametric estimation of spatial distributions. Mathematical Geology, 15
(3), 445–468.
Journel, A.G., 1999. Conditioning geostatistical operations to nonlinear volume averages.
Mathematical Geology, 31 (8), 931–953.
Journel, A.G. and Huijbregts, C.J., 1978. Mining geostatistics. New York: Academic Press.


Lam, N.S.N., 1983. Spatial interpolation methods: a review. The American Cartographer, 10 (2),
129–149.
Lichstein, J.W., et al., 2002. Spatial autocorrelation and autoregressive models in ecology. Ecological
Monographs, 72 (3), 445–463.
Liu, Y., 2007. Geostatistical integration of linear coarse scale and fine scale data. Thesis (PhD).
Stanford University.
Manies, K. and Mladenoff, D., 2000. Testing methods to produce landscape-scale presettlement
vegetation maps from the U.S. public land survey records. Landscape Ecology, 15 (8), 741–754.
Minnesota DNR GIS Data Deli, The Department of Natural Resources (DNR) Data Deli 2009.
[online] Available from: http://deli.dnr.state.mn.us (accessed 28 July 2009).
Olea, R., ed., 1991. Geostatistical glossary and multilingual dictionary. New York: Oxford
University Press.
Openshaw, S. and Taylor, P., 1979. Million or so correlation coefficients: three experiments on the
modifiable areal unit problem. In: N. Wrigley, ed. Statistical applications in the spatial sciences.
London: Pion, 127–144.
Quattrochi, D.A. and Goodchild, M.F., eds., 1997. Scale in remote sensing and GIS. Boca Raton, FL:
CRC Press.
Rathbun, S. and Black, B., 2006. Modeling and spatial prediction of pre-settlement patterns of forest
distribution using witness tree data. Environmental and Ecological Statistics, 13 (4), 427–448.
Remy, N., Boucher, A., and Wu, J., 2009. Applied geostatistics with SGeMS: a user's guide.
Cambridge University Press.
Saito, H. and Goovaerts, P., 2002. Accounting for measurement error in uncertainty modeling
and decision-making using indicator kriging and P-field simulation: application to a Dioxin
contaminated site. Environmetrics, 13, 555–567.
Tobler, W.R., 1979. Smooth pycnophylactic interpolation for geographical regions. Journal of the
American Statistical Association, 74 (367), 519–530.
Turner, M., 1989. Landscape ecology: The effect of pattern on process. Annual Review of Ecology
and Systematics, 20, 171–197.
Wahba, G., 1990. Spline models for observational data. CBMS-Regional Conference Series in
Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics
(SIAM).
Wang, Y.C., 2007. Spatial patterns and vegetation-site relationships of the pre-settlement forests in
western New York, USA. Journal of Biogeography, 34, 500–513.
Wang, Y.C. and Larsen, C.P., 2006. Do coarse resolution U.S. presettlement land survey records
adequately represent the spatial pattern of individual tree species? Landscape Ecology, 21 (7),
1003–1017.
Wu, J., 2004. Effects of changing scale on landscape pattern analysis: scaling relations. Landscape
Ecology, 19 (2), 125–138.
