
Sampling for soil survey

D G Rossiter Department of Earth Systems Analysis International Institute for Geo-information Science & Earth Observation (ITC) <http://www.itc.nl/personal/rossiter> December 28, 2008

Copyright 2008 ITC. All rights reserved. Reproduction and dissemination of the work as a whole (not parts) freely permitted if this original copyright notice is included. Sale or placement on a web site where payment must be made to access this document is strictly prohibited. To adapt or translate please contact the author (http://www.itc.nl/personal/rossiter).

Soil Sampling

Topic: Sampling for soil survey


1. Sampling in routine survey
2. Sampling for detailed survey
3. Sampling for detailed survey, with prior information
4. Sampling for environmental correlation


1 Soil sampling in routine soil survey


Routine soil survey follows the Discrete Model of Spatial Variability (DMSV):
* homogeneous soil bodies mapped as polygons
* conceptually sharp boundaries
So the main aim of sampling is to characterize the soils in each map unit, i.e. legend category (the set of polygons with the same soil type).


An area-class map

State Soil Geographic Database, Schuyler County, NY (USA)



Sample is a very small proportion of the population


* A soil pit is about 1 x 2 m in surface area; a typical soil borehole (auger hole) is 10 cm in diameter, so about $\pi \times 0.05^2 \approx 0.00785\ \mathrm{m}^2$
* So in 1 ha there are $10\,000/2 = 5\,000$ potential pit sites, or $10\,000/(\pi \times 0.05^2) \approx 1\,273\,240$ potential borehole sites!
* Sampling density is usually specified as one field observation per 1 to 4 $\mathrm{cm}^2$ of map (regardless of map scale)
* Example: at 1:25 000, $1\ \mathrm{cm}^2$ of map $= 250\ \mathrm{m} \times 250\ \mathrm{m} = 62\,500\ \mathrm{m}^2 = 6.25$ ha
* So, one observation per 6.25 to 25 ha
* This is a tiny sampling fraction!
How can we make a map with such a low sampling density?
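The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the function names are invented for illustration.

```python
import math

HECTARE_M2 = 10_000.0


def potential_sites_per_ha(site_area_m2):
    """Number of non-overlapping sites of the given area that fit in 1 ha."""
    return HECTARE_M2 / site_area_m2


def ha_per_observation(scale_denominator, map_cm2_per_obs):
    """Ground area (ha) covered by one observation per given cm2 of map."""
    ground_m_per_map_cm = scale_denominator / 100.0  # metres on the ground per map cm
    return map_cm2_per_obs * ground_m_per_map_cm ** 2 / HECTARE_M2


pit_area = 1.0 * 2.0               # 1 x 2 m soil pit
auger_area = math.pi * 0.05 ** 2   # auger hole, 10 cm diameter

print(round(potential_sites_per_ha(pit_area)))
print(round(potential_sites_per_ha(auger_area)))
print(ha_per_observation(25_000, 1), ha_per_observation(25_000, 4))
```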


Representative sampling
Solution: the surveyor uses expert opinion of the soil-landscape model:
* Soils occur in specific positions because of the specific combination of soil-forming factors (Jenny equation)
* So, place observations in the most representative (typical, modal, central concept) sites, where the soil class is expected to be best expressed
  * Some observations nearby to get an idea of heterogeneity
  * Maybe some quick observations (not full samples) near boundaries to improve their location


Block diagram of a soil landscape

Wysocki, D. A., Schoeneberger, P. J., & LaGarry, H. E. (2005). Soil surveys: a window to the subsurface. Geoderma, 126(1-2), 167-180

Some landscapes to sample

Dorchester, England (GB)



Truxton, Cortland County, NY (USA)


Herikhuizerveld, Rheden (NL)


Sampling for associations


In smaller-scale maps (depending on landscape, from 1:50 000 down) we usually expect more than one soil type in each map unit. The map units are usually associations of related soils (e.g. hillslope catena). Then the surveyor observes at the central concept of each component. The proportion of components is estimated by landscape analysis.


2 Soil sampling in detailed soil survey


These are usually grid samples, to completely cover an area of interest.
Example: an area of suspected soil pollution.
The grid is then interpolated into a raster map, usually by kriging.
Two-step sampling:
1. For modelling the variogram
2. For kriging, once the variogram is known
Note: the success of kriging depends on a correct variogram model!
Note: the variogram may be known from similar studies


Sampling to model spatial dependence


* Must have several separations to estimate structure
* Especially important are some closely-separated observations, to estimate the nugget
* Can use a transect with variable spacing or a 2-D scheme (random directions, fixed separations in a hierarchy)
References:
Webster, R., Welham, S. J., Potts, J. M., & Oliver, M. A. (2006). Estimating the spatial scales of regionalized variables by nested sampling, hierarchical analysis of variance and residual maximum likelihood. Computers & Geosciences, 32(9), 1320-1333.
Lark, R. M. (2002). Optimized spatial sampling of soil for estimation of the variogram by maximum likelihood. Geoderma, 105(1-2), 49-80.


What sample size to fit a variogram model?


Stochastic simulation from an assumed random field with a known variogram suggests:
1. < 50 points: not at all reliable
2. 100 to 150 points: more or less acceptable
3. > 250 points: almost certainly reliable
More points are needed to estimate an anisotropic variogram.
This is very worrying for many environmental datasets (soil cores, vegetation plots, ...), especially from short-term fieldwork, where sample sizes of 40 to 60 are typical. Should variograms even be attempted on such small samples?


How to design the nested sample


* Widest spacing $s_1$ is the spacing between stations, which are assumed to be so far from each other as to be spatially independent
  * furthest expected dependence ...
  * ... based on the landscape ...
  * ... and expected range of the process to be modelled
* Closest spacing $s_n$ is the shortest distance whose dependence we want to know


Geometric series
* A geometric series increases terms by multiplication
* It allows us to cover a wide range of distances (possible ranges) with a few stages
* Increase spacing in geometric series: $s = \sqrt{s_1 \cdot s_n}$
* Fill in series with further geometric means


Geometric series: example


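* First series: $s_1 = 600$ m (stations), $s_5 = 6$ m (closest)
* Intermediate spacing: $s_3 = \sqrt{6\ \mathrm{m} \cdot 600\ \mathrm{m}} = 60$ m
* Series now {600 m, 60 m, 6 m}
* Fill in with the geometric means
  * $s_2 = \sqrt{600\ \mathrm{m} \cdot 60\ \mathrm{m}} \approx 190$ m
  * $s_4 = \sqrt{60\ \mathrm{m} \cdot 6\ \mathrm{m}} \approx 19$ m
* Final series {600 m, 190 m, 60 m, 19 m, 6 m}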
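The same series can be generated directly from a constant ratio between successive stages. This is a minimal sketch; the function name `geometric_spacings` is hypothetical, and filling in by repeated geometric means is equivalent to a constant ratio only when the stages divide evenly, as they do for five stages.

```python
def geometric_spacings(widest, closest, n_stages):
    """Spacings from widest to closest, each a constant ratio times the previous."""
    ratio = (closest / widest) ** (1.0 / (n_stages - 1))
    return [widest * ratio ** i for i in range(n_stages)]


spacings = geometric_spacings(600.0, 6.0, 5)
print([round(s) for s in spacings])
```

Each interior term is the geometric mean of its two neighbours, which is the fill-in rule of the example.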


Locating the sample points


* Objective: cover the landscape, while avoiding systematic or periodic features
* Method: random bearings from centres at each stage
* Stations can be along a transect if desired (no spatial dependence)
* From a centre at stage $i$, $(E_i, N_i)$, to find a point $(E_{i+1}, N_{i+1})$ at the next spacing $s_{i+1}$:
  * $\theta$ = random uniform $[0 \ldots 2\pi]$
  * $E_{i+1} = E_i + (s_{i+1} \cdot \sin\theta)$
  * $N_{i+1} = N_i + (s_{i+1} \cdot \cos\theta)$
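The random-bearing step might be coded as follows. This is a sketch; the function name, the fixed seed and the example coordinates are assumptions made only for illustration.

```python
import math
import random


def next_point(easting, northing, spacing, rng=random):
    """Place a point at a fixed spacing from a centre, along a random bearing."""
    theta = rng.uniform(0.0, 2.0 * math.pi)   # bearing in radians, from grid north
    return (easting + spacing * math.sin(theta),
            northing + spacing * math.cos(theta))


rng = random.Random(42)                 # seeded only to make the sketch repeatable
centre = (1000.0, 2000.0)               # hypothetical stage-1 station (E, N) in metres
point = next_point(centre[0], centre[1], 190.0, rng)
print(point, math.dist(centre, point))
```

Whatever the bearing, the new point lies exactly one spacing away from its centre, so every pair contributes to the intended separation class.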


Number of sample points


* Number of stations is selected to cover the area of interest
* At each stage $S_i$, the next stage $S_{i+1}$ has in principle double the samples
  * One is for all the previous centres from stages $S_1 \ldots S_{i-1}$
  * and one is for the new centre from stage $S_i$
* So the total number doubles: half old, half new centres


Unbalanced sampling
* After the first 4 stages, use an unbalanced design
* Only half the centres at $S_i$ ($i \ge 4$) are further sampled at $S_{i+1}$
* This still covers the area, but only uses half the samples at the shortest ranges
* Number of pairs is still enough to estimate short-range dependence


Number of sample points: example


* Five stages {600 m, 190 m, 60 m, 19 m, 6 m}
* Nine stations: $n_1 = 9$
* Double at stages 2 ... 4: $n_2 = 18$, $n_3 = 36$, $n_4 = 72$
* At stage 5, only use half the 72 centres, i.e. 36
* Total at stage 5: 72 + 36 = 108 (would have been 144 with balanced sampling)
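The counting rule can be written as a short function. This is a sketch; the function and parameter names are hypothetical, and it assumes the number of centres at an unbalanced stage is even.

```python
def points_per_stage(n_stations, n_stages, balanced_stages=4):
    """Cumulative number of sample points after each stage of a nested design.

    Up to `balanced_stages` every existing point gets one new neighbour
    (the total doubles); after that only half of the centres do.
    """
    counts = [n_stations]
    for stage in range(2, n_stages + 1):
        previous = counts[-1]
        if stage <= balanced_stages:
            counts.append(2 * previous)               # balanced: half old, half new
        else:
            counts.append(previous + previous // 2)   # unbalanced: half the centres
    return counts


print(points_per_stage(9, 5))                     # unbalanced after stage 4
print(points_per_stage(9, 5, balanced_stages=5))  # fully balanced, for comparison
```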


Nested ANOVA : Partition Variability by sampling level


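Linear model:

$$z_{ijk \ldots m} = \mu + A_i + B_{ij} + C_{ijk} + \cdots + Q_{ijk \ldots m} + \varepsilon_{ijk \ldots m}$$

Link with regional variable theory (semivariances): $m$ stages; $d_1$ shortest distance at $m$th stage; $d_m$ largest distance at first stage

$$\begin{aligned}
\sigma^2_m &= \gamma(d_1) \\
\sigma^2_{m-1} + \sigma^2_m &= \gamma(d_2) \\
&\;\;\vdots \\
\sigma^2_1 + \cdots + \sigma^2_m &= \gamma(d_m)
\end{aligned}$$

F-test from ANOVA table; for stage $m + 1$: $F = MS_m / MS_{m+1}$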
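The link to semivariances amounts to accumulating the variance components from the finest stage upward. A minimal sketch follows; the component values are invented purely for illustration (they are not results from any survey), and the function name is hypothetical.

```python
from itertools import accumulate


def semivariances_from_components(components):
    """Accumulate variance components (ordered stage 1 ... m) from the finest stage up.

    Returns rough semivariance estimates ordered from the shortest distance d_1
    to the largest distance d_m.
    """
    return list(accumulate(reversed(components)))


spacings = [600.0, 190.0, 60.0, 19.0, 6.0]     # stage 1 ... stage 5, in metres
components = [0.10, 0.25, 0.30, 0.20, 0.15]    # hypothetical variance components

distances = list(reversed(spacings))           # d_1 (shortest) ... d_m (largest)
semivar = semivariances_from_components(components)
for d, g in zip(distances, semivar):
    print(f"distance {d:6.1f} m  semivariance {g:.2f}")
```

Plotting these accumulated values against distance gives a first, rough experimental variogram.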


Nested ANOVA : Interpretation


* There is spatial dependence from the closest spacing until the F-ratio is not significant
* Samples from this distance are independent
* To take advantage of spatial interpolation, must sample closer than this
* Can estimate how much of the variation is accounted for at each spacing


Grid sampling for kriging


This assumes the Continuous Model of Spatial Variability (CMSV). So the soil property is modelled as a random field and the map is made by kriging.


Kriging prediction

Kriging prediction variance

Note: for a given variogram model, the prediction variance depends only on the spatial configuration of the observations, not on the data values.

Sampling designs with the CMSV: objectives


1. Maximize information
   * Cover the largest possible area at minimum cost
   * Minimize some optimization criterion
2. Minimize costs
3. (Incorporate any existing sample; see next subtopic)


What is to be optimized?
An optimization criterion is some numerical measure of the quality of the sampling design. Some possibilities:
1. Minimize the maximum kriging variance in the area: nowhere is more poorly predicted than this maximum
2. Minimize the average kriging variance over the entire area


Optimal point configuration (CMSV)


In a square area to be mapped, given a fixed number of points that can be sampled, in the case of bounded spatial dependence:
* Points should lie on some regular pattern; otherwise some points duplicate information at others (in kriging, they will share weights)
* Optimal (for both the minimal maximum and minimal average criteria): equilateral triangles (if the triangle is $1^2$, max. distance to a point $= \sqrt{7}/4 \approx 0.661$)
* Sub-optimal but close: square grid (max. distance $= \sqrt{2}/2 \approx 0.707$)
  * Grid should be slightly perturbed so samples do not line up exactly; this avoids unexpected periodic effects
(Problems: edge effects in small areas; irregular areas.)


Optimal point configuration in the presence of anisotropy


* Optimal designs are easily adjusted for anisotropy (different range of spatial dependence in two orthogonal axes)
* The regular grid may be adapted for affine or geometric anisotropy: stretch it in the direction of maximum dependence, based on the anisotropy ratio
* E.g. for a ratio of 0.5, squares become rectangles, with the spacing in the direction with the longest range twice that in the direction with the shortest range


Computing an optimal grid size


Reference: McBratney, A. B. & Webster, R. (1981). The design of optimal sampling schemes for local estimation and mapping of regionalized variables I and II. Computers and Geosciences, 7(4), 331-334 and 335-365; also in Webster & Oliver.
* Key point: in kriging, the estimation error is based only on the sample configuration and the chosen model of spatial dependence, not the actual data values
* So, if we know the spatial structure (variogram model), we can compute the maximum or average kriging variances before sampling, i.e. before we know any data values
* This is known as OSSFIM from the original articles


Error variance
Recall: the kriging variance at a point is given by:

$$\sigma^2(x_0) = \mathbf{b}^T \boldsymbol{\lambda} = 2 \sum_{i=1}^{N} \lambda_i \gamma(x_i, x_0) - \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j \gamma(x_i, x_j)$$

* This depends only on the sample distribution (what we want to optimise) and the spatial structure (modelled by the semivariogram)
* In a block this will be lowered by the within-block variance $\bar{\gamma}(B, B)$


Reducing kriging error


Once a regular sampling pattern is decided upon (triangles, rectangles, ...), the kriging variance can be decreased in two ways:
1. reduce the spacing (finer grid) to reduce semivariances; or
2. increase the block size of the prediction
These can be traded off; but usually the largest possible block size is selected, based on the minimum decision area.


Error as a function of increasing grid resolution


* Consider 4 sample points in a square
* To be estimated: one prediction point in the middle (furthest from the samples, so highest kriging variance)
* Criterion: minimize the maximum prediction error
* If the variogram is close-range, high nugget, low sill, we need a fine grid to take advantage of spatial dependence; high cost
* If the variogram is long-range, low nugget, high sill, a coarse grid will give similar results
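The four-corner case can be sketched with NumPy to show that the kriging variance is computable before any data are collected. The assumptions here are point (not block) ordinary kriging and a spherical variogram whose nugget, partial sill and ranges are chosen only for illustration; all function names are hypothetical.

```python
import numpy as np


def spherical(h, nugget, psill, a):
    """Spherical semivariogram; gamma(0) = 0, nugget + psill beyond range a."""
    h = np.asarray(h, dtype=float)
    hr = np.minimum(h / a, 1.0)
    gamma = nugget + psill * (1.5 * hr - 0.5 * hr ** 3)
    return np.where(h > 0.0, gamma, 0.0)


def ok_variance(points, target, variogram):
    """Ordinary kriging variance at `target`; no data values are needed."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))          # semivariances plus unbiasedness constraint
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(points - np.asarray(target, dtype=float), axis=1))
    solution = np.linalg.solve(A, b)     # kriging weights and Lagrange multiplier
    return float(b @ solution)


def centre_variance(spacing, variogram):
    """Kriging variance at the centre of a square of four samples."""
    s = float(spacing)
    corners = [(0.0, 0.0), (s, 0.0), (0.0, s), (s, s)]
    return ok_variance(corners, (s / 2.0, s / 2.0), variogram)


def short_range(h):
    return spherical(h, 0.1, 0.9, 600.0)    # illustrative parameters


def long_range(h):
    return spherical(h, 0.1, 0.9, 1200.0)   # illustrative parameters


for spacing in (100, 200, 300, 400):
    print(spacing, round(centre_variance(spacing, short_range), 3),
          round(centre_variance(spacing, long_range), 3))
```

With the same nugget and sill, the longer-range model gives a lower variance at every spacing, so a coarser grid reaches the same target precision.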


Kriging variances at centre point

[Figure: kriging variance at the centre point (plotted values from 0.3 to 1) as a function of grid spacing (horizontal axis "spacing", 100 to 400) and block size (vertical axis "block.size", 20 to 120). Two panels: long range variogram (1200 m) and short range variogram (600 m).]


3 Sampling with prior information


Problem: how to optimally place a limited number of observations in a study area in order to extract the maximum information at minimum cost.
We consider here the information to be a map over some study area, made by ordinary kriging from the sample points; so the assumptions of the CMSV must be met.
Reference: van Groenigen, J.-W. (2000). The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma, 97(3-4), 223-236.
Also contained in the PhD thesis: van Groenigen, J.-W. Constrained optimisation of spatial sampling. Enschede, NL: ITC.


Problems with the optimal grid


The optimal grid presented in the previous section is optimal only in restricted circumstances. There are many reasons why that approach might not apply:
* Edge effects: the study area is not infinite
* Irregularly-shaped areas, e.g. a flood plain along a river
* Off-limits or uninteresting areas, e.g. in a soils study: buildings, rock outcrops, ditches ...
* Existing samples, maybe from a preliminary survey; don't duplicate the effort!
In these cases it is impossible to compute an optimum analytically (as was done for the regular grid on an infinite plane).


Annealing
Slowly cooling a molten mixture of metals into a stable crystal structure.
* During annealing the temperature is slowly lowered
* At high temperatures, molecules move around rapidly and over long distances
* At low temperatures the system stabilizes
* Critical factor: speed with which the temperature is lowered
  * too fast: stabilizes in a sub-optimal configuration
  * too slow: waste of time


Simulated annealing
This is a numerical analogy to actual annealing:
* Some aspect of a numerical system is perturbed
* The configuration should approach an optimum
* The amount of perturbation is controlled by a "temperature"


Outline of SSA
1. Decide on an optimality criterion
2. Place the desired number of sample points anywhere in the study area (grid, random ...); compute fitness according to the optimality criterion
3. Repeat (iterate):
   (a) Select a point to move; move it a random distance and direction
   (b) If outside the study area, try again
   (c) Compute new fitness
   (d) If better, accept new plan; if worse also accept with a certain probability
4. Stop according to some stopping criterion
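The outline above can be sketched in Python. To keep the example short, the fitness used here is NOT the kriging variance but a simpler stand-in: the mean distance from evaluation-grid nodes to their nearest sample point (lower is better). The square study area, the parameter values, the cooling factor `alpha` and all function names are assumptions for illustration only.

```python
import math
import random

import numpy as np


def mean_nearest_distance(samples, grid):
    """Stand-in fitness: mean distance from each grid node to its nearest sample."""
    d = np.linalg.norm(grid[:, None, :] - samples[None, :, :], axis=2)
    return float(d.min(axis=1).mean())


def anneal(n_points=10, size=100.0, n_iter=2000, t0=1.0, alpha=0.995,
           max_move=50.0, seed=1):
    rng = random.Random(seed)
    axis = np.linspace(0.0, size, 21)
    grid = np.array([(x, y) for x in axis for y in axis])
    # Step 2: initial (random) scheme and its fitness
    samples = np.array([(rng.uniform(0.0, size), rng.uniform(0.0, size))
                        for _ in range(n_points)])
    fitness = mean_nearest_distance(samples, grid)
    initial = fitness
    best, best_fitness = samples.copy(), fitness
    temperature = t0
    for _ in range(n_iter):              # Step 4: stop after a fixed number of iterations
        i = rng.randrange(n_points)      # (a) select a point to move
        while True:
            theta = rng.uniform(0.0, 2.0 * math.pi)
            step = rng.uniform(0.0, max_move) * temperature / t0
            x = samples[i, 0] + step * math.sin(theta)
            y = samples[i, 1] + step * math.cos(theta)
            if 0.0 <= x <= size and 0.0 <= y <= size:
                break                    # (b) outside the study area: try again
        candidate = samples.copy()
        candidate[i] = (x, y)
        new_fitness = mean_nearest_distance(candidate, grid)   # (c)
        delta = new_fitness - fitness
        # (d) accept if better; if worse, accept with probability exp(-delta / T)
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            samples, fitness = candidate, new_fitness
            if fitness < best_fitness:
                best, best_fitness = samples.copy(), fitness
        temperature *= alpha             # cool down
    return initial, best_fitness, best


initial, final, scheme = anneal()
print(f"fitness: {initial:.2f} -> {final:.2f}")
```

Replacing `mean_nearest_distance` with the mean or maximum kriging variance over the grid would give the criteria discussed in these slides, at a higher computing cost per iteration.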


Example of a single step


Colour ramp is from blue (low kriging variance) to red (high). Point at lower right is moved to middle-bottom:

A large hot area (high kriging variance) is now cooler.


Temperature
The distance to move a point is controlled by the temperature; this is used to multiply some distance.

$$T_{k+1} = \alpha \cdot T_k \qquad (1)$$

where $k$ is the step number and $\alpha < 1$ is an empirical factor that reduces the temperature; we must also specify an initial temperature $T_0$.
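Applying equation (1) repeatedly gives the temperature after k steps as the initial temperature times the reduction factor raised to the power k, so the number of steps needed to reach a given low temperature can be computed in advance. A small sketch follows; the names `alpha`, `temperature` and `steps_to_reach`, and the example values, are assumptions for illustration.

```python
import math


def temperature(t0, alpha, k):
    """Temperature after k cooling steps, starting from t0."""
    return t0 * alpha ** k


def steps_to_reach(t0, alpha, t_target):
    """Smallest number of steps after which the temperature is <= t_target."""
    return math.ceil(math.log(t_target / t0) / math.log(alpha))


k = steps_to_reach(1.0, 0.95, 0.01)
print(k, temperature(1.0, 0.95, k))
```

A factor close to 1 cools slowly (many iterations, less risk of a sub-optimal configuration); a smaller factor cools quickly.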


Fitness
Several choices, all based on the kriging variance:
* Mean over the study area (MEAN OK)
  * appropriate when estimating spatial averages to a given precision
* Maximum anywhere in the study area (MAX OK)
  * appropriate when the entire area must be mapped to a given precision, e.g. to guarantee there is no health risk in a polluted area


Stopping criterion
Possibilities:
* fixed number of iterations
* reach a certain (low) temperature
* after a certain number of iterations with no change


Acceptance criterion
Metropolis criterion: the probability $P(S_0 \to S_1)$ of accepting the new scheme is:

$$P(S_0 \to S_1) = \begin{cases} 1, & \text{if } \phi(S_1) \le \phi(S_0) \\ \exp\left(\dfrac{\phi(S_0) - \phi(S_1)}{c}\right), & \text{if } \phi(S_1) > \phi(S_0) \end{cases} \qquad (2)$$

where $\phi(S_0)$ is the fitness of the current scheme $S_0$, $\phi(S_1)$ is the fitness of the proposed new scheme $S_1$, and $c$ is the temperature. For a poorer scheme this can also be written:

$$p = e^{-\Delta f / T_k} \qquad (3)$$

where $T_k$ is the current temperature and $\Delta f$ is the change in fitness due to the proposed new scheme. Note that $\Delta f$ is positive for a poorer solution, so its negative is used in the exponent.
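The acceptance rule is easy to express as a function. This sketch assumes, as with the kriging variance, that a lower fitness value is better; the function name and the example numbers are hypothetical.

```python
import math


def acceptance_probability(fitness_current, fitness_proposed, temperature):
    """Metropolis criterion: always accept an improvement; accept a poorer
    scheme with a probability that shrinks as the temperature falls."""
    if fitness_proposed <= fitness_current:
        return 1.0
    return math.exp((fitness_current - fitness_proposed) / temperature)


print(acceptance_probability(0.50, 0.45, 0.1))    # better scheme
print(acceptance_probability(0.50, 0.60, 0.1))    # poorer scheme, warm
print(acceptance_probability(0.50, 0.60, 0.01))   # poorer scheme, cold
```

Early in the run (high temperature) poorer schemes are often accepted, which lets the search escape local optima; late in the run they are almost never accepted.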


A real example
Industrial area, existing samples; more must be taken to lower the prediction variance to a target level everywhere; where to place the new samples?

Reference: van Groenigen, J. W., Stein, A., & Zuurbier, R. (1997). Optimization of environmental sampling using interactive GIS. Soil Technology, 10(2), 83-97

4 Soil sampling for environmental correlation


We want to make a set of observations of soil properties, from which to build regression models from a set of environmental covariables, e.g.
* terrain parameters (elevation, aspect ...)
* digital imagery
* climate-related layers
For example, if $z$ is some soil property:

$$z = f\left(\frac{\partial z}{\partial xy}, \mathrm{CTI}, z, \ldots\right)$$

So the soil observations must somehow represent this feature-space, as well as geographic space, efficiently.


Regression modelling
1. Simple (one dominant factor)
2. Multiple
3. (Stepwise: automatic selection of predictor set; dangerous!)
4. Standardized principal components: removes multicollinearity (inter-correlated predictors) and differences in measurement scale
Generally linear models are used; may linearize some predictors if necessary.

$$z = \beta_0 + \sum_{i=1}^{n} \beta_i q_i$$

where the $q_i$ are the $n$ predictors.

(See standard regression textbooks)
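A multiple linear model of this form can be fitted by ordinary least squares. The sketch below uses entirely synthetic data: the predictor names (slope, CTI, elevation), their ranges and the "true" coefficients are invented for illustration, not taken from any survey.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 50

# Synthetic predictors q_i at the sampled sites
slope = rng.uniform(0.0, 30.0, n_obs)
cti = rng.uniform(5.0, 15.0, n_obs)
elevation = rng.uniform(100.0, 500.0, n_obs)
Q = np.column_stack([slope, cti, elevation])

# Synthetic soil property: linear in the predictors plus small noise
true_beta = np.array([2.0, -0.05, 0.3, 0.01])   # intercept, then one per predictor
z = true_beta[0] + Q @ true_beta[1:] + rng.normal(0.0, 0.05, n_obs)

# Least-squares fit of z = beta_0 + sum_i beta_i * q_i
X = np.column_stack([np.ones(n_obs), Q])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
print(np.round(beta, 3))
```

The fit is only as good as the coverage of the predictor ranges by the sample, which is the motivation for sampling in feature space.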



Feature-space sampling schemes


The aim is to efficiently sample combinations of feature-space predictors. But:
* not all combinations are found in nature (e.g. steep slope + high TWI)
* combinations occupy different proportions of the area
Latin hypercube: Minasny, B., & McBratney, A. B. (2006). A conditioned Latin hypercube method for sampling in the presence of ancillary information. Computers & Geosciences, 32(9), 1378-1388.


Mixed sampling schemes


Try to optimize sample placement in both feature and geographic space.
Simulated annealing: Brus, D. J., & Heuvelink, G. B. M. Optimization of sample patterns for universal kriging of environmental variables. Geoderma, 138(1-2), 86-95.


Designing a sampling plan


1. Define the study area
2. Determine the objectives of the sampling
   * inferring spatial processes?
   * mapping?
   * decision support?
3. Define costs (budget) vs. benefits (precision needed)
4. Decide on any stratification by differential objectives, costs, benefits
Good luck!
