
TUTORIAL

DATA COVARIANCE MATRICES IN SEISMIC SIGNAL PROCESSING

By Rodney Lynn Kirlin
Electrical and Computer Engineering Department, University of Victoria, Victoria, British Columbia

INTRODUCTION

Seismic signals are sensed by geophones in land acquisition or by hydrophones in marine acquisition. The recorded data is voluminous, and much processing is applied to order the data and reduce it to interpretable displays. Data covariance matrix analysis has many uses in the processing of geophysical signals. We will specifically present its use in linearly transforming seismic traces into information having a much lower dimensionality and greater significance. We will demonstrate the methodology of covariance matrix analysis and relate the covariance matrix structure to the physical parameters of interest in just a few simple examples; numerous applications are given in our major reference (Kirlin and Done, 1999) and the open literature. For example, the semblance coefficient is a well-known measure of coherence, normally displayed in the velocity spectrum. However, recent uses of covariance analysis have produced 3D images of coherence that have been quite successful in data interpretation and visualization (Marfurt et al., 1998).

DATA VECTORS

To begin to discuss covariance matrix analysis we introduce analysis regions, data windows, data vectors, and covariance matrices, as shown in Figure 1, where we find 1) an analysis region spanning many traces and time samples, 2) a moving analysis window centered sequentially on every point in the region, and 3) sample vector windows taken from the analysis window. The sample vectors yield a sample covariance matrix from which attributes of the center point of the analysis window are derived. A region of 2-D data from which information is to be obtained is called an analysis region. When this process is applied in 3D, the moving analysis window is an incremental volume spanning roughly 3 to 20 traces and 10 to 30 time samples.

Figure 1: An analysis region spanning many traces and time samples; in 3D this would be a data cube. The moving analysis window centers sequentially at each point within the region. Sample vector windows within the moving window are used to create a covariance matrix from which attributes of the current center point are derived.

Figure 2 displays various shapes of analysis regions: (a) two NMO-corrected seismic gathers with windows that characterize (b) reflected energy, (c) outgoing surface wave noise, (d) backscattering noise, and (e) pump jack noise. These exemplify regions in which data vectors may have varying degrees of maximally overlapping vector windows. It must be realized that smaller windows allow more overall spatial or temporal adaptivity in the output parameter estimates, but give larger statistical variance in those estimates, because fewer vector windows of a fixed size will fit in the analysis window. The larger the vector window, the greater the spatial or temporal frequency resolution.

Figure 2: (a) Two NMO-corrected seismic gathers with windows that characterize (b) reflected energy, (c) outgoing surface wave noise, (d) backscattering noise, and (e) pump jack noise.

Within each analysis window, data vectors are composed of the samples from sub-windows of the running analysis window. Commonly the vector is M x 1 in size, covering either M time points from the same trace (down the trace), allowing only temporal analysis, or M points taken from M traces at the same time (time slice, snapshot, or "across traces"), allowing only spatial analysis. The vector window may also be two dimensional, such as N x M, resulting in a length-NM vector and allowing spatio-temporal analysis.


The mapping of vector-windowed data points to the vector elements is arbitrary. For example, a 2 x 5 vector window surrounding the data points x(i,j), i = 10 to 11 and j = -2 to 2, where i is the time index and j is the trace index, similar to that shown in Figure 1, would become the 2M = 10 by 1 vector

x = (x(10,-2), x(10,-1), ..., x(10,2), x(11,-2), ..., x(11,2))^T
  = (x_1 x_2 x_3 ... x_10)^T,

where [.]^T indicates transpose. Subsequently we will use [.]^H to indicate complex conjugate transpose. In Figure 1 the 2 x 5 vector window creates the 10 x 1 vector x_i.
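As a concrete illustration of this mapping, the short numpy sketch below is ours, not the article's; the array shapes and index choices are assumptions made to match the 2 x 5 window above.

import numpy as np

# Toy 2-D data: rows are time samples i, columns are traces j.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 11))       # 50 time samples, 11 traces

# A 2 x 5 vector window: time rows i = 10..11 and 5 adjacent traces
# (these columns play the role of j = -2..2 in the text).
window = data[10:12, 3:8]                  # shape (2, 5)

# Row-by-row flattening reproduces the ordering
# x = (x(10,-2), ..., x(10,2), x(11,-2), ..., x(11,2))^T.
x = window.reshape(-1, 1)                  # the 10 x 1 vector
print(x.shape)                             # (10, 1)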

The Sample Data Covariance Matrix

Within the analysis window, and if the mean of x is 0, the averaged sum of vector outer products x_i x_i^H, i = 1, 2, ..., L, gives the sample covariance matrix C_x. The vectors need not always come from a time-space window as discussed above; a collection of L such vectors from within a data window may be averaged in the outer product to give the sample covariance matrix

C_x = (1/L) Σ_{i=1}^{L} x_i x_i^H = (1/L) X X^H,   (1.1)

where we have written the vectors x_i into distinct columns of a data matrix X. When the vector mean is not zero, the mean must first be subtracted from x before forming the outer products; otherwise the result is the sample correlation matrix. Because in practice seismic data are zero mean, we generally have no need to estimate or remove any mean.

The rank of the sample covariance matrix C_x is the same as the number of independent vectors x from which it was created, up to a maximum of M. The rank of a covariance matrix R is the same as its number of eigenvalues greater than zero.
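To make (1.1) and the rank remark concrete, here is a minimal sketch (ours, with arbitrary sizes): L maximally overlapping down-trace windows become the columns of X, C_x = XX^H/L is formed, and the effectively nonzero eigenvalues are counted.

import numpy as np

rng = np.random.default_rng(1)
trace = rng.standard_normal(100)           # one synthetic seismic trace
M, L = 8, 5                                # vector window length, number of windows

# L maximally overlapping length-M vector windows form the columns of X (M x L).
X = np.column_stack([trace[i:i + M] for i in range(L)])

# Sample covariance matrix of (1.1); the data is real, so ^H reduces to ^T.
Cx = X @ X.T / L                           # M x M

# Rank = number of eigenvalues greater than zero; here at most L < M of them.
eigvals = np.linalg.eigvalsh(Cx)
print(np.sum(eigvals > 1e-10 * eigvals.max()))   # prints 5 (= L)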

Rationale for Sample Covariance Analysis

The sample covariance matrix arises in many areas of science, engineering, and statistics: it is needed in multivariate data analysis, pattern recognition, least-squares problems, hypothesis testing, parameter estimation, etc.
For example, if we draw one such length-M vector x from an N(m, R) distribution (normal Gaussian multivariate vectors x having mean m and covariance R), the probability density function of the samples x is (Eaton, 1983)

f(x) = (2π)^{-M/2} |R|^{-1/2} exp[ -(1/2)(x - m)^T R^{-1} (x - m) ].   (1.2)

Because data is often Gaussian or nearly so for various physical reasons (for example, computed Fourier components are nearly Gaussian in almost all cases), it is satisfying that only the mean vector and covariance matrix are required to describe the data distribution. Because we don't know the covariance matrix, it may be estimated with the sample covariance. Suppose we have L independent samples of x, indexed with i, each with density f(x_i) = N(m, R). The joint density of these L vector samples is their product:

f(x_1, x_2, ..., x_L) = Π_{i=1}^{L} f(x_i).   (1.3)

We refer the reader to (Kirlin and Done, 1999) for a more detailed presentation of the statistics of the sample mean and the sample covariance matrix, the normal distribution when x is complex, and various robust methods for enhancing the estimate of the covariance matrix. Hereafter we generally assume m = 0.
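A small simulation (ours; the covariance R below is arbitrary) shows the estimate at work: as L grows, the sample covariance of draws from N(m, R) approaches R.

import numpy as np

rng = np.random.default_rng(2)
M = 4
A = rng.standard_normal((M, M))
R = A @ A.T + M * np.eye(M)                # an arbitrary valid covariance matrix
m = np.zeros(M)                            # hereafter m = 0, as in the text

for L in (10, 100, 10000):
    X = rng.multivariate_normal(m, R, size=L).T   # M x L matrix of samples
    Cx = X @ X.T / L                              # sample covariance, (1.1)
    print(L, np.linalg.norm(Cx - R) / np.linalg.norm(R))  # error shrinks with L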
Eigenstructure and Least-Squares Fit of a Random Vector

To review eigenstructure and simultaneously demonstrate one of its uses, first consider that we have samples x_k, k = 0, 1, ..., L-1, from a distribution of zero-mean real vectors, each M x 1. Suppose we wish to find the best vector v of unit length such that the projection onto v of any vector x chosen at random will be closest to x in the mean-squared-error sense. That is, we need to find v such that E{(x - x̂)²} is minimized, where x̂ = cv and c = v^T x is a scalar, while constraining v^T v = 1. The solution of this constrained minimization shows that v is the eigenvector associated with the largest eigenvalue λ of R_x,

R_x v = λ v;   (1.4)

that is, the largest value λ_1 and its associated eigenvector v_1 satisfying

R_x v_1 = λ_1 v_1   (1.5)

give minimum E{(x - x̂)²}. Note that the scale factor on v is c = v^T x. When the L samples x_k are drawn from an infinite set, the sample covariance C_x replaces R_x. There are M eigenvalues and M associated eigenvectors v that satisfy equation (1.4). We will hereafter assume that the eigenvectors are ordered such that

λ_1 ≥ λ_2 ≥ ... ≥ λ_M.   (1.6)

The covariance matrix or the sample covariance matrix may be expanded or factored into its eigenstructure forms:

R_x = Σ_{i=1}^{M} λ_i v_i v_i^H = V Λ V^H,   (1.7)

where V is the matrix of eigenvectors, V = (v_1 v_2 ... v_M), and Λ is a diagonal matrix of eigenvalues, i.e., Λ = diag(λ_1, λ_2, ..., λ_M). Expression (1.7) contains both the eigenvector expansion and the factored form VΛV^H of the covariance matrix. Noise and a finite number of sample vectors cause C_x to have eigenstructure that is almost never identical to that of the true covariance matrix R_x, but the eigenstructure of C_x is asymptotically the same as the SNR or the number of samples L approaches infinity. In practice the effects of the differences do appear, as we will see, and we will distinguish the two when necessary.

The eigenvectors are orthonormal, meaning that

v_i^H v_j = δ_ij;   (1.8)

that is, each has unit energy and any two are orthogonal.

If L < M, the vectors from which C_x is created will produce no more than r = L eigenvalues that are greater than 0. The rank r of a matrix is the number of independent vectors that may have created it through outer-product summation such as given in (1.1). When L > M, there can be at most M independent vectors and r ≤ M. Note that the eigenvectors v_i are generally different from the vectors x_i.

Rectangular matrices may be created with similar sums of r outer products of pairs of vectors. Singular value decomposition allows determination of such vector outer products and their associated scalar weights.
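The numerical sketch below (ours) verifies (1.6)-(1.8) for a random sample covariance: ordered eigenvalues, the factored form VΛV^H of (1.7), and orthonormal eigenvectors.

import numpy as np

rng = np.random.default_rng(3)
M, L = 6, 50
X = rng.standard_normal((M, L))
Cx = X @ X.T / L

# Symmetric eigendecomposition, reordered so that λ1 >= λ2 >= ... >= λM (1.6).
lam, V = np.linalg.eigh(Cx)
lam, V = lam[::-1], V[:, ::-1]

print(np.allclose(V @ np.diag(lam) @ V.T, Cx))   # (1.7): Cx = V Λ V^H -> True
print(np.allclose(V.T @ V, np.eye(M)))           # (1.8): orthonormality -> True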
Singular-Value Decomposition and the Karhunen-Loeve Transform

Singular-value decomposition (SVD) allows the factorization of any M x P, P < M, rectangular matrix X into orthogonal components: the orthonormal columns of U and rows of V^H, where U is M x P and V is P x P, together with a P x P diagonal matrix Λ of singular values:

X = U Λ V^H,  Λ = diag(λ_1, λ_2, ..., λ_P).   (1.9)

It may be shown (Kirlin and Done, 1999) that U and V can be found using the partitioned matrix notations

X^H X = V Λ² V^H = (V_1 V_2) diag(Λ_1², 0) (V_1 V_2)^H,   (1.10)

X X^H = U Λ² U^H = (U_1 U_2) diag(Λ_1², 0) (U_1 U_2)^H,   (1.11)

where the subscript 1 denotes the singular vectors associated with the r nonzero singular values. Equations (1.10) and (1.11) comprise the SVD of X. The P columns of U and V are respectively the left and right hand singular vectors u_i and v_i, and the λ_i are the singular values of X.

The remaining λ_i, i = r + 1, ..., P, are zero, indicating zero energy contribution from the higher dimensions. The vectors in U and V are the eigenvectors of XX^H ~ C_x and X^H X respectively; but after finding the r nonzero eigenvalues of X^H X and their associated eigenvectors, the columns of V_1, the singular vectors U_1, which are the eigenvectors of XX^H ~ C_x, can more easily be found from the equation

U_1 = X V_1 Λ_1^{-1}.   (1.12)

The process demonstrates that the SVD provides a great computational advantage for finding the significant eigenvectors of XX^H ~ C_x when M > P, because we are not generally interested in finding the eigenvectors whose eigenvalues are zero. The squares of the singular values λ_i are the eigenvalues of both X^H X and XX^H. Another use of the SVD is that it allows any of the columns x_i of X to be written as a linear combination of the singular vectors u_j:

x_i = Σ_{j=1}^{P} λ_j [V^H]_{ji} u_j.   (1.13)
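The computational shortcut of (1.12) is easy to demonstrate. In the sketch below (ours), the eigenvectors of the small P x P matrix X^H X are lifted to the left singular vectors of X, and the squared singular values match the eigenvalues.

import numpy as np

rng = np.random.default_rng(4)
M, P = 10, 4                               # M > P, as in the text
X = rng.standard_normal((M, P))

U, lam, Vh = np.linalg.svd(X, full_matrices=False)

# Squares of the singular values are the eigenvalues of X^H X (and XX^H).
w, V1 = np.linalg.eigh(X.T @ X)            # small P x P eigenproblem
w, V1 = w[::-1], V1[:, ::-1]               # descending order
print(np.allclose(lam**2, w))              # True

# (1.12): U1 = X V1 Λ1^{-1}, with λ_i = sqrt(w_i); agrees with U up to signs.
U1 = X @ V1 @ np.diag(1.0 / np.sqrt(w))
print(np.allclose(np.abs(U1.T @ U), np.eye(P), atol=1e-8))  # True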

The transformation U^H on any x constitutes the Karhunen-Loeve Transform (KLT), and the vector U^H x contains the principal components of x. Principal components are seismic attributes themselves and can often display meaningful information. If the data matrix X has rank P and only the first r singular vectors are used in its approximation, then the rank-r approximation of X is given through

X_r = U_r Λ_r V_r^H,   (1.14)

where U_r, Λ_r, and V_r are composed of the singular vectors from U and V associated with the r largest-magnitude singular values of Λ. Now we consider the following example, which indicates a process that utilizes the concept of signal subspace and noise subspace.
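As a hedged sketch of the KLT and of (1.14) (ours; the synthetic data and the rank are arbitrary), the code below projects each column onto the singular vectors and rebuilds a rank-r approximation.

import numpy as np

rng = np.random.default_rng(5)
M, P, r = 20, 8, 2
# Synthetic data dominated by two patterns, plus mild noise.
X = (np.outer(np.sin(0.3 * np.arange(M)), rng.standard_normal(P))
     + np.outer(np.cos(0.3 * np.arange(M)), rng.standard_normal(P))
     + 0.1 * rng.standard_normal((M, P)))

U, lam, Vh = np.linalg.svd(X, full_matrices=False)

pc = U.T @ X                                # principal components of each column (KLT)
Xr = U[:, :r] @ np.diag(lam[:r]) @ Vh[:r, :]    # rank-r approximation (1.14)

print((lam[:r]**2).sum() / (lam**2).sum())  # fraction of energy kept (near 1)
print(np.linalg.norm(X - Xr) / np.linalg.norm(X))   # small residual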

Example 1

Suppose neighboring stacked data has been ideally flattened as shown in Figure 3. Thus ideally each length-M vector x_i is identical to the others except for an independent additive noise and interference vector n_i. That is,

x_i = s + n_i,  i = 1, 2, ..., L,   (1.15)

where s is a signal reflection vector and n_i is a zero-mean, spatially and temporally white random noise vector, independent from sample to sample and trace to trace. We now show that the first singular vector u_1 of the data matrix X, whose columns are the x_i, is proportional to s. From X we find the singular vectors and singular values, with XX^H = U_1 Λ_1² U_1^H. However, XX^H is composed of signal matrix S = (s s ... s) = s(1 1 ... 1) and noise matrix N = (n_1 n_2 ... n_L) parts, giving

XX^H = SS^H + SN^H + NS^H + NN^H.   (1.16)

The matrix SS^H = L ss^H is clearly a rank-one matrix. The statistical mean of the cross terms SN^H + NS^H = 2Re{SN^H} is zero, while E{NN^H} = Lσ²I, where σ² is the variance of the noise on each trace. Thus, as the number of traces L increases,

(1/L) XX^H → ss^H + σ²I = R.   (1.17)

It is easily shown from (1.17) that s is an eigenvector of R:

R s = (ss^H + σ²I) s = (E_s + σ²) s,   (1.18)

where E_s = s^H s is the energy in the signal trace. We note that the first eigenvector v_1 of R is the least-squares fit to the set of x_i and that λ_1 = E_s + σ² is the associated and largest eigenvalue. It can be shown that the eigenvalues are

λ_1 = E_s + σ²,  λ_i = σ² for i = 2, ..., M.   (1.19)

The question of how close the first eigenvector v_1 of the sample XX^H is to s/(s^H s)^{1/2} is answered in the literature and discussed in Chapter 4 of (Kirlin and Done, 1999).

We point out that the rank r = 1 of SS^H is the dimension of the signal subspace, and that all vectors in the signal subspace of the data X are a scaling of just one eigenvector, v_1. The other M - 1 dimensions of the data, described by the other eigenvectors, carry only noise. The above seismic example typifies a situation in which the major eigenvector of XX^H, or equivalently the left singular vector u_1 of X, is proportional to the signal that is identical in a set of horizon-flattened traces.
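A quick simulation of example 1 (ours; the wavelet and noise level are arbitrary) confirms that u_1 lines up with the normalized s and that the eigenvalues follow (1.19).

import numpy as np

rng = np.random.default_rng(6)
M, L, sigma = 30, 12, 0.3
t = np.arange(M)
s = np.exp(-(t - 15.0)**2 / 18.0) * np.cos(0.8 * t)   # wavelet-like signal vector

# x_i = s + n_i as in (1.15); the L traces are the columns of X.
X = s[:, None] + sigma * rng.standard_normal((M, L))

U, lam, Vh = np.linalg.svd(X, full_matrices=False)
print(abs(U[:, 0] @ (s / np.linalg.norm(s))))    # near 1: u1 ~ normalized s

# Eigenvalues of XX^H / L: one near Es + sigma^2, the rest near sigma^2 (1.19).
print((lam**2 / L).round(3))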

Figure 3: Five nearly flattened traces having the same signal content but some independent noise. Each trace has samples that become elements of a vector sample, x_i = s + n_i. Example 1 uses parallel data vectors taken down traces. Example 2 uses parallel data vectors taken across traces (time slices).

Example 2

In the second example vectors are taken across traces as shown in Figure 3. There are M traces of length P, so that the vectors are of length M and there are at least P sample vectors. Note that the created vectors are again column vectors even though we have taken them across traces. As before, we first assume that the traces have been well flattened to some true event, so that each of the P data vectors x_i is composed of the same random constant s(t_i) plus a spatially and temporally white noise vector, i.e.

x_i = s(t_i) 1 + n_i,  i = 1, 2, ..., P,   (1.20)

where 1 = (1 1 ... 1)^T of length M. With all the x_i composing the columns of X, we now find the eigenstructure of XX^H, an M x M matrix, by writing

XX^H = Tr[D_s D_s^H] 1 1^T + NN^H + cross terms,   (1.21)

where D_s = diag(s(t_1), s(t_2), ..., s(t_P)), possibly complex valued, and the ith column of N is (n_1(t_i), n_2(t_i), ..., n_M(t_i))^T, for i = 1, 2, ..., P. We again see that with large P, P^{-1} XX^H (M x M) approaches σ_s² 1 1^T + σ²I = R, and the eigenvalues of R are as in example 1:

λ_1 = E_s + σ²,  λ_i = σ² for i = 2, ..., M.   (1.22)

The only difference is that in (1.22) the expected signal energy in a vector is specifically given by E_s = Mσ_s², where σ_s² is the variance of the signal in the traces. Also in this example the major eigenvector is v_1 = (1/√M)(1, 1, ..., 1)^T, rather than a scaling of s as in example 1.
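A companion simulation of example 2 (again ours, with arbitrary parameters): across-trace vectors x_i = s(t_i)1 + n_i give a leading eigenvector close to (1/√M)1 and eigenvalues close to (1.22).

import numpy as np

rng = np.random.default_rng(7)
M, P, sig_s, sigma = 8, 400, 1.0, 0.3       # M traces, P time samples

s_t = sig_s * rng.standard_normal(P)        # the random constants s(t_i)
ones = np.ones(M)
X = np.outer(ones, s_t) + sigma * rng.standard_normal((M, P))   # columns obey (1.20)

lam, V = np.linalg.eigh(X @ X.T / P)
v1 = V[:, -1]                               # eigenvector of the largest eigenvalue
print(abs(v1 @ ones) / np.sqrt(M))          # near 1: v1 ~ (1/sqrt(M)) 1
print(lam[::-1].round(2))                   # ~ (M*sig_s^2 + sigma^2, sigma^2, ...)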

Now note that the SVD of the down-trace data vectors laid out as in example 1 can simultaneously find the eigenstructures for both example 1 and example 2. That is, if the time-sample length of a trace were greater than the number of traces, X = UΛV^H would yield the eigenvectors U for example 1 and the eigenvectors V for example 2, and only the first singular value and its associated left and right singular vectors need to be found. The first singular vector u_1 is ideally the normalized s of example 1, and the first singular vector v_1 is ideally proportional to 1, as in example 2.
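The one-SVD observation is also easy to check numerically; in the sketch below (ours), the first left singular vector recovers the normalized s and the first right singular vector is proportional to 1.

import numpy as np

rng = np.random.default_rng(8)
M, L, sigma = 40, 10, 0.2                   # time-sample length > number of traces
t = np.arange(M)
s = np.sin(0.5 * t) * np.exp(-(t - 20.0)**2 / 60.0)

X = s[:, None] @ np.ones((1, L)) + sigma * rng.standard_normal((M, L))

U, lam, Vh = np.linalg.svd(X, full_matrices=False)
print(abs(U[:, 0] @ (s / np.linalg.norm(s))))     # near 1: u1 ~ normalized s
print(abs(Vh[0] @ np.ones(L)) / np.sqrt(L))       # near 1: v1 ~ (1/sqrt(L)) 1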
A Coherence Measure

From example 2 we have seen that if a signal is perfectly flattened, then the covariance analysis of XX^H will give us a first eigenvector v_1 that is (1/√M)1. The associated eigenvalue is λ_1 = E_s + σ², but all the other eigenvalues have value σ². The ratio of the first eigenvalue to the sum of all is

λ_1 / Σ_{i=1}^{M} λ_i = (E_s + σ²) / (E_s + Mσ²).   (1.23)



Keeping in mind that the sum of all eigenvalues is the same as the trace (Tr) of the matrix, we can rewrite the denominator of the first right-hand expression in (1.23) as Tr[C_x]. Further, if we utilize the knowledge that the reflections have been properly flattened, then we know that v_1 = (1/√M)1, so we can find the eigenvalue λ_1 using λ_1 = v_1^H C_x v_1 = 1^T C_x 1 / M. Thus we have in matrix form a coherence measure

λ_1 / Tr[C_x] = 1^T C_x 1 / (M Tr[C_x]).   (1.24)

We have shown (Kirlin, 1992) that this is identical to the conventional semblance algorithm (Neidell and Taner, 1971), wherein the data matrix X at each two-way travel time is sequentially produced from values in a suite of trial rms velocities around which a time window is formed (see Figure 4). When the correct trial velocity is used, the data is properly flattened and corresponds to the situation in example 2; otherwise the first eigenvector is not (1/√M)1, and the numerator in (1.24) decreases in value. The numerator can be interpreted as correlating a presumed eigenvector (1/√M)1 with the eigenvectors of the covariance matrix. When there is a perfect match with v_1 of the windowed data covariance matrix, the result is maximized. Another interpretation is that the true signal dimensions become more than one (rank r > 1) when the event is not correctly flattened; the signal energy is spread out into those dimensions, causing the associated eigenvalues to be more like the first, so that the eigenvalue ratio or coherence measure in (1.23) must decrease. We note that the measure in (1.23) can never go to zero, but it can be as small as 1/M. Modifications (Kirlin, 1992) can make it range over [0, 1].
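In code, (1.24) is a one-liner once C_x is in hand. The sketch below (ours, with arbitrary window sizes) evaluates it for a flattened window and confirms numerically the identity with classical semblance shown in (Kirlin, 1992).

import numpy as np

rng = np.random.default_rng(9)
M, P, sigma = 6, 25, 0.4                    # M traces, a P-sample time window

s_t = rng.standard_normal(P)                # a flattened event
X = np.outer(np.ones(M), s_t) + sigma * rng.standard_normal((M, P))

# Coherence via (1.24): 1^T Cx 1 / (M Tr[Cx]).
Cx = X @ X.T / P
ones = np.ones(M)
coh = (ones @ Cx @ ones) / (M * np.trace(Cx))

# Classical semblance (Neidell and Taner, 1971) on the same window.
sem = (X.sum(axis=0)**2).sum() / (M * (X**2).sum())

print(coh, sem, np.isclose(coh, sem))       # identical, per (Kirlin, 1992)

Sweeping this quantity over a suite of trial rms velocities reproduces the velocity spectrum described above.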

Figure 4: Time-windowed traces from which the data X is obtained for use in the semblance algorithm or its equivalent eigenvalue version. X varies with the trial rms velocities.
An example output of an eigenvalue-based coherence measure is shown in Figure 5. The eigenvalue ratio, as in the last expression of (1.24), is easy to find and forms the basis of the simplest and fastest, and in some major respects the most effective, coherence measure for use in the coherence cube (Marfurt et al., 1998). The reason coherence is effective is that at most points of reflection, neighboring traces are very similar and have high coherence. It is exactly the points where neighboring traces are quite different that are of considerable interest to the interpreter.

Figure 5: Example of a coherence slice at a horizon of interest. Darker elements indicate lowest coherence and denote faults or other lateral discontinuities.

References

Eaton, M.L., 1983, Multivariate statistics: John Wiley & Sons, Inc.

Kirlin, R.L., and Done, W.D., Eds., 1999, Covariance analysis for seismic signal processing: Society of Exploration Geophysicists, Tulsa.

Kirlin, R.L., 1992, The relationship between semblance and eigenstructure velocity estimators: Geophysics, 57, 1027-1033.

Marfurt, K.J., Kirlin, R.L., Farmer, S.L., and Bahorich, M.S., 1998, 3-D seismic attributes using a semblance-based coherency algorithm: Geophysics, 63, 1150-1165.

Neidell, N.S., and Taner, M.T., 1971, Semblance and other coherency measures for multichannel data: Geophysics, 36, 482-497.

Biography

R. Lynn Kirlin received his B.Sc. (1962) and M.Sc. (1963) from the University of Wyoming and a Ph.D. (1968) from Utah State. His industrial experience includes advanced space communications systems at Martin-Marietta and Boeing from 1963-1966, data communications and computer peripherals at Datel in 1969, and signal and image processing applications software at Floating Point Systems in 1979. He was with the EE Department at the University of Wyoming from 1969 to 1986 prior to joining the ECE Department at the University of Victoria, where he has continued research and contract work in many areas of applications of statistical signal processing, with most of his concentration on seismic and array signal processing, collaborating with Amoco since 1983.

Material adapted from "Covariance Analysis for Seismic Signal Processing". © Society of Exploration Geophysicists. Used with permission.