Neural networks can be viewed as applications that map one space, the input space, into some output space. In order to simulate the desired mapping the network has to go through a learning process consisting of an iterative change of the internal parameters, through the presentation of many input patterns and their corresponding output patterns. The training process is accomplished if the error between the computed output and the desired output pattern is minimal for all examples in the training set. The network will then simulate the desired mapping on the restricted domain of the training examples. We describe an experiment where a neural network is designed to accept a synthetic common shot gather (i.e., a set of seismograms obtained from a single source) as its input pattern and to compute the corresponding one-dimensional large-scale velocity model as its output. The subsurface models are built up of eight layers with constant layer thickness over a homogeneous half-space; 450 examples are used to train the network. After the training process the network never computes a subsurface model which perfectly fits the desired one, but the approximation of the network is sufficient to take this model as a starting model for further seismic imaging algorithms. The trained network computes satisfactory velocity profiles for 80% of the new seismic gathers not included in the training set. Although the network gives results that are stable when the input is contaminated with white noise, the network is not robust against strong, i.e., correlated, noise. This application proves that neural networks are able to solve nontrivial inverse problems.
requires that the arrival times are picked with high accuracy, which is impossible in the presence of noise or complex subsurface structures. The exact arrival is often hidden in the noise, and signals can overlap.

Other methods, based on the use of full waveforms, like velocity analysis, can be used to determine the background velocity model. Velocity analysis starts with a reasonable estimation of the velocity model, obtained through information from borehole logs and additional geological knowledge. The theoretical arrival times of the seismic signals are computed for this model and superimposed on the observed seismic section. If the theoretically predicted quasi-hyperbolas match the observed ones in the seismic record, the velocity model is correct; otherwise the velocity model has to be changed by trial and error, until a sufficient fit between the theoretical and observed hyperbolas is achieved. This analysis does not always give an accurate velocity estimation of the subsurface. More sophisticated techniques, for instance, the iterative migration velocity analysis [Al-Yahya, 1989], also require human intervention at each step, and the subjective interpretation may lead to errors in the background velocity.

While the inverse problem poses difficulties, the forward problem of computing synthetic seismograms from given depth velocity profiles is numerically simple. It is possible to generate a large number of Earth models and to compute their synthetic seismograms. These seismic section-Earth model pairs can constitute a training set for neural networks.

We will describe an experiment where the neural network is set up to accept as input a common shot gather (i.e., a set of seismograms obtained from a single source) and to compute as output an Earth model consisting of eight layers over a half-space. As any arbitrary (one-dimensional) subsurface model can be approximated by a stack of (sufficiently small) layers with constant thickness, we assume in our application that the layer thickness of the eight layers is constant and known.

After the training process the network is not only able to retrieve the desired velocity values from the learned examples, but also to estimate the correct velocity profile for new seismic sections, not present in the training set, with an 80% rate of success. This application proves that neural networks can be applied to data sets with a large number of input parameters and can estimate the desired output values with satisfactory accuracy.

We were not able to decipher the internal behavior of the network and the way it analyzes seismic data. It is not clear whether the neural network picks travel times, computes differential seismograms, etc. Much more detailed analysis should be carried out before any serious attempt to answer that question is made.

In the present context it is difficult to compare neural networks to classical analysis techniques in terms of speed and accuracy. Velocity analysis is mostly done by hand, while trained neural networks are stand-alone systems and do not allow any interference of an interpreter.

INTRODUCTION TO MULTILAYERED NEURAL NETWORKS

Neural networks are dynamic systems of a large number of connected simple processing units, called neurons (see Figure 1). In our approach, we work with multilayered, partially connected, neural networks. This means that the neurons are arranged in layers such that two units in the same layer cannot be connected and that there are only connections between neurons of successive layers. However, there may be an arbitrary number of intermediate or hidden layers, each containing an arbitrary number of neurons (Figure 1). Each neuron of our network can be considered an operator, receiving real numbers as input and transforming them into one output value. The output is transmitted by the links that connect the neurons. On each link a real number, the weight, is defined. Before an output value is transmitted, it is multiplied by the corresponding weight. Thus the weight reflects the strength of the individual connections [Hebb, 1949]. Modifying the weight values by repeated application of learning rules allows the network to approximate the function mapping the input patterns on the desired output patterns.

Hornik et al. [1989] rigorously established that a feedforward neural network, built up of one hidden layer with an arbitrary output function and an output layer with a linear output function, is capable of approximating any ordinary (Borel measurable) function. The accuracy of the approximation is determined by the number of hidden neurons. Their results remain valid if we replace the linear output function in the output layer by a sigmoidal output function [Lapedes and Farber, 1988]. This is mostly done when the network is used for classification tasks. In this case the output values of the neurons in the last layer are 1 (or close to 1) if a feature is present in the input pattern or 0 (or close to 0) if not.

Although we do not apply neural networks to classification problems, but to retrieve continuous output values, a sigmoidal function is used in the output layer. If the desired output values are scaled to be in the quasi-linear range of the sigmoidal function, experiments performed do not show significant differences between neural networks with a linear or a sigmoidal output function (M. M. Poulton, personal communication, 1993). In our tests we have observed that the training process of a network with linear output neurons is less robust.

We want to retrieve seismic velocities, which are bounded. One of the smallest velocities is found in the weathered layer, about 100 m/s, and velocities can increase for rock salt to about 6500 m/s. The use of the sigmoidal function bounds the output values and eliminates the possibility of computing nonphysical solutions. However, we have to scale the desired output values to be approximately in the linear range of the sigmoidal function.

In our simulations, we fix the output values of neurons in the input layer by the input pattern itself. All input values of the neurons in the following layers are computed by summing up all incoming values, which are all output values of the previous layer multiplied by the weighting factor defined on the corresponding connections. The output value of a neuron is then computed by applying a sigmoidal threshold function to the input value (see Figure 2). In this way information flows forward from the first or input layer through the hidden layer(s) to the last or output layer. This forward propagation of an input pattern is described in detail in Appendix A.

Once the architecture and forward propagation rules are defined, it is necessary to train the network. The network has to compute from an input pattern presented to the input layer an output pattern which is close to the desired one. The general idea of supervised learning is that the user has a large number (say E) of training patterns, which means pairs
RÖTH AND TARANTOLA: NEURAL NETWORKS AND INVERSION OF SEISMIC DATA 6755
O_{i}^1 = I_{i}^1

I_{i}^2 = Σ_j W_{ij}^1 O_{j}^1

O_{i}^2 = g(I_{i}^2)

I_{i}^3 = Σ_j W_{ij}^2 O_{j}^2

O_{i}^3 = g(I_{i}^3)

Fig. 1. Multilayered, partially connected, neural network. The first layer (on the top) is the input layer and the last layer (on the bottom) is the output layer. In between there is one hidden layer. On the right-hand side, the forward propagation rules for this particular network are shown. The inputs for the input layer I_{i}^1 are fixed by the input pattern vector IP. I_{i}^{1,2,3} denote the input for a neuron i in the input, hidden, and output layer, respectively. The output value for a neuron i in the input layer O_{i}^1 equals the corresponding input value, while the output values O_{i}^2 and O_{i}^3 for neurons in the hidden or in the output layer, respectively, are computed by applying a sigmoidal function g(·) to the input value. W_{ij}^{1,2} denote the weight values between the neurons i and j in the previous layer, between the input to hidden and hidden to output layer, respectively.
of input patterns and their corresponding desired output patterns.

The training process can be formulated as an optimization problem, consisting of finding a set of weights W which minimize the error between the computed and desired output patterns for all the examples presented to the network. In other words, the goal is to train the network to perform a mapping from the input space, containing the input pattern vectors as elements, to the output space, containing the output pattern vectors as elements. This mapping corresponds to the solution of the inverse problem.

Let O_{iα}^L be the output value, for the αth example, of the ith neuron of the output layer L, and let OP_{iα} be the desired component of the output pattern for this neuron. The misfit value M, depending only on the weights W, is defined to be

M(W) = Σ_{α=1}^{E} Σ_{i=1}^{N(L)} (O_{iα}^L - OP_{iα})^2 , (1)
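The forward rules of Figure 1 together with a misfit of the form (1) can be sketched in a few lines of NumPy. The layer sizes, weight scales, and data below are illustrative placeholders, not the values of the experiment described later.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    """Sigmoidal output function."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(ip, w1, w2):
    """Propagate one input pattern: input -> hidden -> output layer."""
    o1 = ip                 # O_i^1 = I_i^1 (linear input layer)
    i2 = w1 @ o1            # I_i^2 = sum_j W_ij^1 O_j^1
    o2 = g(i2)              # O_i^2 = g(I_i^2)
    i3 = w2 @ o2            # I_i^3 = sum_j W_ij^2 O_j^2
    return g(i3)            # O_i^3 = g(I_i^3)

def misfit(weights, inputs, targets):
    """Equation (1): summed squared error over all E examples."""
    w1, w2 = weights
    return sum(np.sum((forward(ip, w1, w2) - op) ** 2)
               for ip, op in zip(inputs, targets))

# Toy sizes (assumed, for illustration only).
n_in, n_hid, n_out, n_ex = 12, 5, 3, 4
w1 = rng.normal(scale=0.1, size=(n_hid, n_in))
w2 = rng.normal(scale=0.1, size=(n_out, n_hid))
inputs = [rng.normal(size=n_in) for _ in range(n_ex)]
targets = [rng.uniform(size=n_out) for _ in range(n_ex)]
print(misfit((w1, w2), inputs, targets))
```

Training then amounts to minimizing this misfit over the weight matrices w1 and w2.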
we can, at least in theory, construct neural networks capable of complex interpretations.

RETRIEVING THE BACKGROUND VELOCITY FROM SEISMIC DATA

In previous applications we have shown that neural networks can be trained to associate arrival time data to a velocity model of the subsurface. We also applied neural networks to retrieve an entire one-dimensional (1-D) depth-velocity profile from a zero offset (single trace) seismogram [Röth and Tarantola, 1991]. Both cases, as interesting as they may be to study the behavior of neural networks, are not realistic approximations of seismic field data. Real seismic data are neither an assembly of travel time values nor only recorded on one receiver. Field data are densely time-sampled seismograms recorded at different locations (offsets) relative to the source position. Information about the subsurface is not only in the arrival times and the amplitudes of the signal but also in the curvature of the hyperbolas over all traces.

Figure 3 shows four typical examples of (1-D) Earth models and the corresponding seismograms. We want to train a neural network that is set up to accept as input a seismic section shown at the left of the figure and gives, as output, a depth-velocity model shown at the right of the figure. Our velocity models consist of eight layers over a half-space. As all layers have a fixed thickness, each model is defined by nine values (the velocity values of the eight layers and that of the half-space).

Fig. 3. Typical (1-D) Earth models: (left) synthetic seismic sections (time in s) and (right) the corresponding desired (solid) and learned velocity models (depth in m).

The Data Sets

The training set. We generate a large number of Earth models, all with eight flat layers over a half-space. All layers have 200 m of thickness. The velocity of the first layer is generated pseudorandomly with a velocity of 1500 m/s ± 150 m/s (box-car distribution). Once the velocity v_ℓ of the ℓth layer is given, the velocity v_{ℓ+1} in the (ℓ+1)st layer is generated pseudorandomly as

v_{ℓ+1} = (v_ℓ + 190 m/s) ± 380 m/s , (2)

where, again, the plus or minus sign refers to a box-car distribution. This yields an average increase of velocity of 190 m/s per layer but allows local negative velocity jumps. This rule takes into account that in general velocity increases with depth but locally there might exist low-velocity zones.

In all our velocity plots, the velocity is normalized with respect to a maximal value of 4000 m/s. This normalization is necessary as the choice of the sigmoidal output function only permits values between zero and one.

Once a laterally invariant velocity model is defined, synthetic seismograms are computed.

The generalization sets. If the error is small for a large number of examples, the trained network can be used to give reliable interpolations. As for the training set, we generated two more generalization sets by adding 10% and 30% white noise to the noiseless data set. This allows us to make a rough measure of the stability of the results.

Training Process of the Neural Network

Each seismic section consists of 20 traces, sampled at 271 points each. Thus each section consists of 20 × 271 = 5420 points in time. The input layer of our network will then have 5420 neurons, while the output layer will have 9 neurons.

From a practical point of view, the number of hidden neurons is essentially used to control the number of weights in the network. If the number of weights is larger than the number of examples of the training set, the optimization problem is underconstrained. Then, the network will precisely retrieve the desired output values of the training set, but will not compute reasonable output patterns for input patterns excluded from the training set. If the number of weights is too small, the network has not enough free parameters to describe the desired mapping and the network cannot retrieve the desired output patterns of the training set.

After some trial and error, we have chosen one hidden layer with 25 neurons. This leads to the neural network represented in Figure 5. In our network, all the neurons of each layer are connected with all the neurons of the layer below, which makes a total of 5420 × 25 + 25 × 9 = 135,725 connections.

Let v_{iα} be the ith desired velocity value corresponding to the αth example, and let O_{iα} be the output value calculated by the network in the output layer (ℓ = 3). To train the network, we minimize the least squares expression (depending on the weights W)

M(W) = Σ_{α=1}^{E} Σ_{i=1}^{9} (O_{iα} - v_{iα})^2 . (3)
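The training-model generator described above (first-layer velocity box-car distributed in 1500 ± 150 m/s, then the recursion (2)) can be sketched as follows; the random seed and the use of NumPy's generator are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_velocity_model():
    """One model: eight 200-m layers plus a half-space (nine velocities)."""
    v = [rng.uniform(1500.0 - 150.0, 1500.0 + 150.0)]  # first layer, box-car
    for _ in range(8):
        # v_{l+1} = (v_l + 190 m/s) +/- 380 m/s, box-car distributed;
        # allows occasional local velocity inversions.
        v.append(rng.uniform(v[-1] + 190.0 - 380.0,
                             v[-1] + 190.0 + 380.0))
    return np.array(v)

models = np.array([random_velocity_model() for _ in range(450)])
print(models.shape)         # one row of nine velocities per model
print(models.mean(axis=0))  # velocities increase by ~190 m/s per layer on average
```

For presentation to the network, these velocities would then be normalized by the maximal value of 4000 m/s so that targets fall between zero and one.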
Fig. 4. Four typical synthetic seismic sections, (left) corrupted with 10% noise (time in s) and (right) their corresponding desired models (solid line, depth in m) used to train the neural networks. Two networks were trained, one using 325 examples such as the four shown in this figure, and the other using 450 examples. The dotted and dashed lines at the right are the outputs given by the network trained with 325 and 450 examples, respectively. The network is able to extract the necessary information out of the noise and to retrieve quite reliable velocity profiles. As in the noise-free example, the network trained with 450 examples performs only marginally better than the network trained with 325 examples.
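A noise-corrupted set like the one in Figure 4 can be sketched as below; the convention that the noise percentage is measured against the RMS amplitude of the noiseless section is our assumption, as the text does not spell out its definition.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_white_noise(section, percent):
    """Add zero-mean white noise scaled to a percentage of the section's RMS."""
    rms = np.sqrt(np.mean(section ** 2))
    noise = rng.normal(scale=percent / 100.0 * rms, size=section.shape)
    return section + noise

clean = rng.normal(size=(20, 271))   # stand-in for a 20-trace, 271-sample section
noisy10 = add_white_noise(clean, 10.0)
noisy30 = add_white_noise(clean, 30.0)
```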
Fig. 5. The neural network used in our experiment (connections omitted). The input layer has 20 × 271 = 5420 neurons, which corresponds to the number of samples in the seismic section. The output layer has 9 neurons (the nine velocities building up the model). A hidden layer has been chosen with 25 neurons. All the neurons of each layer are connected with all the neurons of the layer below, which makes a total of 5420 × 25 + 25 × 9 = 135,725 connections.

We successively trained 19 networks by upgrading the training set with 25 new examples each time. After each weight update, we presented 50 noiseless examples of the generalization set to the network and computed the corresponding misfit value defined by equation (3). Thus we have two misfit curves for the neural network, the first one indicating the error committed by the neural network for the examples in the training set, and a second curve describing the error on the 50 examples of the generalization set. These two curves are shown in Figure 6 for networks trained with 100, 225, 325, and 450 examples, respectively.

Figure 7 shows the optimum misfit for the generalization set of 50 models for networks trained with training sets of different size, containing 25, 50, 75, ..., 450, and 475 examples, respectively. The misfits are shown for the 50 models from noiseless seismic sections, then with 5% and 10% white noise.

As we can see, fairly good interpolations are obtained when the training sets consist of approximately 200 examples, and increasing the size of the training set does not further increase the interpolation capability of the network. The locally high values are probably due to convergence problems (local optima).

We do not know what the behavior of our network would have been if we had increased the number of training examples as suggested by a detailed mathematical investigation [Baum and Haussler, 1989] giving the upper and lower bounds on the number of training patterns versus network size, in order to obtain a trained network which correctly generalizes a certain fraction of new input patterns.

Fig. 6. Dependence of the misfit value on the iteration number for networks trained with 100, 225, 325, and 450 examples, respectively. The solid curve corresponds to the misfit value of the actual training set, while the dotted curve shows the misfit value for a "generalization set" of examples not used in the training. All seismograms here are noise-free. Notice that the misfit value for the generalization set of the network trained with 225 examples has a minimum at iteration 150. Further iterations only improve the performance on the training set and the network is overtrained. The optimization algorithm has to be stopped at this iteration.
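The stopping rule illustrated by Figure 6, halting at the iteration where the generalization misfit is minimal, can be sketched as follows; the misfit curves here are synthetic stand-ins for a real training run.

```python
def best_stopping_iteration(generalization_misfits):
    """Index of the minimum generalization misfit (the iteration to stop at)."""
    return min(range(len(generalization_misfits)),
               key=generalization_misfits.__getitem__)

# Toy curves: the training misfit keeps falling, while the generalization
# misfit reaches a minimum and then rises again (overtraining).
train = [1.0 / (t + 1) for t in range(300)]
gen = [1.0 / (t + 1) + (t / 150.0 - 1.0) ** 2 for t in range(300)]
stop = best_stopping_iteration(gen)
print(stop)
```

In practice the weights saved at that iteration, not the final ones, would be kept as the trained network.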
Fig. 7. Minimum misfit values for the generalization set of 50 models for networks trained with training sets of different size, containing 25, 50, 75, ..., 450, and 475 examples, respectively. The misfits are shown for the 50 models without noise, then with 5% and 10% white noise.
The proof is done under the assumption that all neurons have a linear output function or a Heaviside function. The lower bound on the number of examples, for networks with one hidden layer, is of the order of n/C, where n is the number of weights in the network and 1 - C the fraction of correctly generalized input patterns; e.g., if 90% (C = 0.1) of the generalization patterns have to be correctly interpreted, the number of training patterns has to be 10 times higher than the number of weights. If the lower bounds are not respected, any learning algorithm will often fail to find a set of weights that successfully analyzes input patterns excluded from the training set. Similar bounds should hold for networks with sigmoidal output functions.

In our case this means that 1,300,000 seismic sections would have to be used to train our network to get 90% correct velocity profiles for data sets not present in the training set. We have experimentally shown that a network which performs reasonably well on new data sets can be obtained using significantly fewer examples than theoretically required. One common shot gather is defined by 5420 numerical values. Using 450 examples gives a training set of about 2,000,000 values, i.e., more than 10 times the number of weights to adjust, and the problem is overconstrained.

Results After the Training Process

Retrieving Learned Seismograms. The left of Figure 3 shows four typical seismic sections used in the training process, and the right of the figure shows, in solid lines, the corresponding ("true") Earth models. The dotted and dashed lines correspond to the output of a network, for these four examples, when trained with 325 and 450 examples, respectively. Although the networks are not able to retrieve the Earth models exactly, the obtained accuracy will suffice for most practical applications. Essentially, this figure shows that a neural network like the one represented in Figure 5 is able to invert seismic sections.

The network performs better for the uppermost layers, but the results are meaningful for all layers. It has some difficulty in retrieving velocity decreases. This is related to the fact that not many velocity inversions were present in the training set for this specific case.

Figure 4 shows the results obtained when the network is trained with noise-corrupted seismograms (10% white noise). Apparently, the network is still able to extract the necessary information even in the presence of noise and to retrieve quite reliable velocity profiles. As in the noise-free example, the network trained with 450 examples performs only marginally better than the network trained with 325 examples.

Generalization capacity of the network. In the above section we have seen that the networks trained with 325 and 450 examples are able to provide reasonable interpretations for the seismic sections of the training set. Furthermore, Figure 7 indicates that these networks perform well with respect to the interpolation of new depth-velocity profiles. We can now present all 150 examples of the generalization set to the trained networks and compare the computed output patterns to the desired ones.

Figure 8 illustrates some of the seismograms of the generalization set and their corresponding desired velocity profiles. The computed output of the network trained with 450 noise-free seismograms is superimposed. The comparison of the computed and the desired velocity profiles shows that the network is able to propose reasonably good interpretations for the new seismograms. Some examples in the generalization set give wrong results.

The trained neural network gives a better interpolation of velocities for the upper part of the model than for deeper layers. It has some difficulty detecting negative velocity jumps. The network apparently has some tendency to bias the interpolated output towards the mean Earth model of the training set. This means that the network takes into account the general rule that velocity increases with depth, given by equation (2), and fails to recognize low-velocity zones. Models number 6 and 7 in Figure 8 illustrate this behavior. The velocity profile of model number 6 has a low-velocity zone between 600 m and 1800 m in depth. There is nearly no velocity increase in this region. The computed output of the network cannot fit the desired model. The net-
work always computes a positive velocity jump from layer to layer, following the overall rule that velocity increases with depth. This gives an overestimation of the desired velocities. Model 7 is characterized by a low-velocity zone between 600 m and 1400 m in depth. The computed output of the network indicates a (small) positive velocity jump, also for the velocities in the low-velocity zone. The velocity of the half-space, only constrained through the amplitude values of the last hyperbola, is, in most models, wrong.

The network computes a satisfactory estimate of the Earth model for about 120 of the 150 examples. Among the 120 there are 50 models less accurately retrieved but
Fig. 8. Seismograms of the generalization set (left, time in s) and their corresponding desired vs. generalized velocity models (right, depth in m).
where the computed model follows the trend of the desired model. Finally, there are 30 among 150 models which are wrong. This yields, for the generalization set, an overall score of 50% correctly computed models, 30% models that follow the trend but show clear differences, and 20% erroneous models.

Let us now see what happens when, after training the network with noise-free seismograms, we present seismograms with some white noise added. In this case the network is not only confronted with the task of interpreting new seismograms but also must "filter out" different perturbations. Figure 9 shows some of the output patterns of the network trained with 450 noise-free examples when we offer as input seismograms corrupted with 10% white noise.

The network is not violently unstable in the presence of noise and it seems that it has some aptitude for removing unwanted noise. The errors of the network are, of course, larger than those for the generalization with noise-free data, but they are often still acceptable. Forty-five percent of the models are retrieved with a satisfactory accuracy, 27% show significant deviations but follow the trend, and 28% are wrong.

When we present the 150 seismograms with 30% noise, the network is no longer able to correctly interpret such noisy data. The network appears to give models that are never too far off but that do not approximate well the correct ones.
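A stability check of this kind can be sketched as follows; since the trained weights of the experiment are not available, the network here is randomly initialized and the drift numbers are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(ip, w1, w2):
    """Input -> hidden -> output, sigmoidal layers."""
    return g(w2 @ g(w1 @ ip))

# Paper-sized layers (5420 inputs, 25 hidden, 9 outputs), random weights.
n_in, n_hid, n_out = 5420, 25, 9
w1 = rng.normal(scale=0.01, size=(n_hid, n_in))
w2 = rng.normal(scale=0.1, size=(n_out, n_hid))

section = rng.normal(size=n_in)
rms = np.sqrt(np.mean(section ** 2))
clean_out = forward(section, w1, w2)

# How far does the output drift when 10% and 30% white noise is added?
drifts = []
for percent in (10, 30):
    noisy = section + rng.normal(scale=percent / 100.0 * rms, size=n_in)
    drifts.append(np.max(np.abs(forward(noisy, w1, w2) - clean_out)))
print(drifts)
```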
Fig. 9. (continued) Seismograms 55-58 of the generalization set (left, time in s) and their desired vs. generalized models (right, depth in m).
present in the data offered as input. Our seismic section consists of 20 traces with 8 reflections on each of them. Thus the seismic section contains 8 × 20 = 160 wavelets.

Switches of polarity of arrivals, up to 10% of the wavelets and randomly placed, or a modification of some of the arrival times, of the order of ± 0.04 s, of about the same number of wavelets, do not have much influence on the computed output.

The susceptibility to "missing" information is satisfactory: when zeroing out up to four traces, randomly chosen among the 20 of the original input data, the computed output is still acceptable. Zeroing five or more adjacent traces anywhere in the seismic sections leads to unpredictable results. Sometimes the computed model fits the desired one, sometimes not.

More systematic perturbations, as for instance when the
Fig. 10. (Left) Synthetic seismograms with 30% white noise (time in s) and (right) their corresponding desired models (solid line, depth in m) of the generalization set of 150 examples. Superimposed are the interpolated models computed by the network trained with 450 noiseless seismograms. The signal is swamped in the noise and the network is no longer able to recognize the original signal. The computed velocity profiles do not closely approximate the desired models.
polarity of the first and the third hyperbola is switchedover each containing an arbitrary number of neurons. Figure 1
all traces, or if the arrival times of pulsesreflected from one showsan exampleof suchtype of neural networks. Informa-
layer are perturbated, also gives unpredictable computed tion flows forward from the first or input layer through the
Earth models. We have not made any attempt to quan- hidden layer(s) to the last or output layer.
tify this behavior and to find out which degreeof correlated Defining a partially connected, multilayered neural net-
perturbation can be compensatedby the neural network. work by the number of layers L and the number of neurons
in eachlayer N(œ), (œ= 1,..., L), we canformulatethe rules
CONCLUSIONS to downpropagate an input pattern, from the input, through
the hidden, to the output layer.
We have trained networks with a relatively small training
Consider a set of E input pattern vectors IP•, and their
set to perform a nontrivial interpretation task. Our network
corresponding output pattern vectors OPt, c• - 1,... ,E.
is able to invert both, low noise and noise-free seismic sec-
An input pattern vector IP• is presentedto the input layer.
tions, when it has been trained with an adequate training
Thismeansthat theinputvalueI/t• of a neuroni in theinput
set. The results are nearly identical for the two cases. When
layer (œ- 1) is equal to the correspondingith component
more noise is present in the generalization set than in the
of the input pattern vector of the selectedexample c•. Thus
training set, the network performs fairly well for low and we can write
uncorrelated noise, and unpredictably for correlated noise.
The results shown here are preliminary and a detailed
Ii•o= IPi,, i= l,...,N(1) . (4)
investigation still needsto be done. Further researchshould
concentrate on possible presentations of seismic data, for This requiresthat the number of input neuronsN(1) is
instance, its r-p transform, as input patterns to a neural equal to the number of components of the input pattern.
network, and other network designs. The output function of the neurons in the first layer is the
linearfunctionandwe obtainthe outputvaluesO• for all
APPENDIX A' MULTILAYERED NEURAL NETWORKS neurons i of the input layer
The main effectof the additionalparameterO (or thresh- The rules to downpropagatethe ath input vector of the
old value), is to translatethe output functionhorizontally. training set (consistingof E examples)are given in Ap-
The amount of this shift is the value of O. Thus all neurons pendix A and will not be repeatedhere. The upper right
can individuallytranslatetheir output function,in orderto index, indicating the consideredlayer, is suppressed,in or-
increase or decreasethe output value independently of the der not to overcomplicate our notation. But we will always
weights. indicate in which layer the indices have to be taken.
A practical way to optimize the threshold for each neu- The goal is to minimize the error between the computed
ron is to considerit as a weight on the connectionrunning output of the network at the output layer and the desired
from an additional neuron, with constant output of +1 , to target output (OP,), corresponding
to the giveninput pat-
all other neurons. This means that the threshold for each tern for all E examples. Thus we chooseto minimize the
neuronis optimized togetherwith the weights,and a second misfit function
algorithm to determine them is not needed.
The goal is to minimize the error between the output computed by the network at the output layer, O_i^α, and the desired target output OP_i^α, associated with a given input pattern, for all the examples in the training set. Thus we may choose to minimize the misfit function

    M(W) = Σ_α Σ_i σ_i^α ( O_i^α − OP_i^α )² ,   (9)

where the σ_i^α are variable weights given to different data (in our examples they are all equal to 1). The variable α runs over the number of examples and i over the number of output neurons, i.e., the length of the target output vector. The problem so defined becomes an optimization problem, and standard techniques can be applied to find an optimal set of weights W.

The choice of a least squares criterion for our misfit function defined in equation (9), instead of the more robust least absolute criterion, simplifies the gradient algorithm. The option to use the least squares criterion in our algorithm has no significant influence on the results, as we are only dealing with synthetic examples with little noise.
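A direct transcription of the misfit function of equation (9), with all data weights σ_i^α set to 1 as in the examples of the paper (the numerical values below are illustrative):

```python
def misfit(computed, desired, sigma=None):
    """Least squares misfit of equation (9):
    M(W) = sum_alpha sum_i sigma_i^alpha (O_i^alpha - OP_i^alpha)^2."""
    M = 0.0
    for a, (O, OP) in enumerate(zip(computed, desired)):
        for i, (o, op) in enumerate(zip(O, OP)):
            w = 1.0 if sigma is None else sigma[a][i]   # all sigma = 1 here
            M += w * (o - op) ** 2
    return M

computed = [[0.2, 0.8], [0.5, 0.1]]   # network outputs, two examples
desired  = [[0.0, 1.0], [0.5, 0.0]]   # target outputs
M = misfit(computed, desired)          # 0.04 + 0.04 + 0 + 0.01, about 0.09
```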
Let this gradient (at iteration t) be denoted by

    Γ_ij(t) = ∂M(W(t)) / ∂W_ij(t) ,   (10)

where i and j denote any weight in our network, i.e., any weight between the input and the hidden layer or the hidden and the output layer. A steepest descent algorithm updates the weights according to

    W_ij(t+1) = W_ij(t) − ε(t) Γ_ij(t) ,   (11)

where ε(t) is an arbitrary factor, as large as possible, but small enough to ensure that the value of the misfit will actually decrease. We call it the steplength.

It is known that the successive application of a gradient algorithm can be interpreted as a "backpropagation" of the errors through the network.

The rules to downpropagate the αth input vector of the training set (consisting of E examples) are given in Appendix A and will not be repeated here. The upper right index, indicating the considered layer, is suppressed, in order not to overcomplicate our notation. But we will always indicate in which layer the indices have to be taken.

The goal is to minimize the error between the computed output of the network at the output layer and the desired target output OP_i^α, corresponding to the given input pattern, for all E examples. Thus we choose to minimize the misfit function

    M(W) = Σ_{α=1}^{E} Σ_i ( O_i^α − OP_i^α )² ,   (12)

where i runs over the number of output neurons, i.e., the length of the target output vector.

We now have to compute the gradients of the misfit function, equation (12), with respect to the weights. The trick is that we determine the gradients of weights connecting neurons in the layer ℓ < L and the neighboring layer ℓ+1 ≤ L after we have computed the gradients for all weights running between the layers "below," i.e., all gradients from the layer L, the output layer, to the layer ℓ+1. We will decompose the computation in two steps. In a first step we will compute the gradients for weights running between the output layer L and the hidden layer L−1. In a second step we will compute the gradients for the weights from the hidden layer L−1 up to the input layer of the network.

Let us determine the gradient for a weight W_ik connecting a neuron i of the output layer to a neuron k in the hidden layer L−1:

    ∂M(W)/∂W_ik = 2 Σ_{α=1}^{E} Σ_j ( O_j^α − OP_j^α ) ∂O_j^α/∂W_ik ,   (13)

where j runs over the neurons of the output layer L. The decomposition of the second term, according to the chain rule, in the sum of equation (13) gives

    ∂O_j^α/∂W_ik = ( ∂O_j^α/∂I_j^α ) ( ∂I_j^α/∂W_ik ) .   (14)

As the output is a function of the input value, we define

    g′(I_j^α) ≡ ∂O_j^α/∂I_j^α ,   (15)

and replacing the explicit expression for the input value (equation (3)) in equation (14) gives
RÖTH AND TARANTOLA: NEURAL NETWORKS AND INVERSION OF SEISMIC DATA    6767
    ∂O_j^α/∂W_ik = g′(I_j^α) ∂/∂W_ik ( Σ_l W_jl O_l^α ) ,   (16)

where the index l runs over the neurons in the (L−1)st layer. Equation (16) simplifies to

    ∂O_j^α/∂W_ik = g′(I_j^α) Σ_l δ_ji δ_lk O_l^α ,   (17)

where δ is the Kronecker symbol. Thus we have as the final expression

    ∂O_j^α/∂W_ik = g′(I_j^α) δ_ji O_k^α .   (18)

Substituting equation (18) into equation (13) gives

    ∂M(W)/∂W_ik = 2 Σ_{α=1}^{E} ( O_i^α − OP_i^α ) g′(I_i^α) O_k^α .   (19)

Defining

    e_i^α ≡ ( O_i^α − OP_i^α ) g′(I_i^α) ,   (20)

this becomes

    ∂M(W)/∂W_ik = 2 Σ_{α=1}^{E} e_i^α O_k^α .   (21)

For the weights running between the hidden layers we compute the gradient in a similar way. The gradient for a weight W_op connecting the neuron o in the (L−1)st layer with the neuron p in the (L−2)nd layer is

    ∂M(W)/∂W_op = 2 Σ_{α=1}^{E} Σ_j ( O_j^α − OP_j^α ) ∂O_j^α/∂W_op ,   (22)

where j runs over the neurons of the output layer L. The decomposition of the second term in the sum of equation (22), by the chain rule, gives

    ∂O_j^α/∂W_op = ( ∂O_j^α/∂I_j^α ) Σ_k ( ∂I_j^α/∂O_k^α ) ( ∂O_k^α/∂I_k^α ) ( ∂I_k^α/∂W_op ) .   (23)

Working out the individual derivatives in the same way as above (equations (24) to (26)) yields

    ∂O_j^α/∂W_op = g′(I_j^α) W_jo g′(I_o^α) O_p^α .   (27)

Substituting equation (27) into equation (22) gives the expression

    ∂M(W)/∂W_op = 2 Σ_{α=1}^{E} Σ_j ( O_j^α − OP_j^α ) g′(I_j^α) W_jo g′(I_o^α) O_p^α ,   (28)

where j runs over the neurons of the output layer L. This can be rewritten using the definition of e in equation (20). With this definition, we can verify the recursive formula to compute the gradients Γ_ij for the weights connecting a neuron i with a neuron j:

    ∂M(W)/∂W_ij = 2 Σ_{α=1}^{E} e_i^α O_j^α ,   (31)

where

    e_i^α = ( O_i^α − OP_i^α ) g′(I_i^α) ,   (32)

if i is a neuron in the last layer, and

    e_i^α = g′(I_i^α) Σ_j W_ji e_j^α ,   (33)

if i is a neuron in one of the hidden layers ℓ < L.
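The recursion of equations (31) to (33) can be sketched for a small two-layer network. The sigmoid output function (for which g′(I) = g(I)(1 − g(I))), the layer sizes, and the weight values are illustrative assumptions; the analytic gradient is checked against a finite difference of the misfit:

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))     # sigmoid; g'(I) = g(I)(1 - g(I))

def forward(x, W1, W2):
    I1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    O1 = [g(v) for v in I1]
    I2 = [sum(w * o for w, o in zip(row, O1)) for row in W2]
    O2 = [g(v) for v in I2]
    return I1, O1, I2, O2

def gradients(x, target, W1, W2):
    """Backpropagation via equations (31)-(33) for one example."""
    I1, O1, I2, O2 = forward(x, W1, W2)
    # output layer: e_i = (O_i - OP_i) g'(I_i)                  (32)
    e2 = [(o - t) * g(v) * (1 - g(v)) for o, t, v in zip(O2, target, I2)]
    # hidden layer: e_i = g'(I_i) sum_j W_ji e_j                (33)
    e1 = [g(v) * (1 - g(v)) * sum(W2[j][i] * e2[j] for j in range(len(e2)))
          for i, v in enumerate(I1)]
    # dM/dW_ij = 2 e_i O_j                                      (31)
    gW2 = [[2 * e * o for o in O1] for e in e2]
    gW1 = [[2 * e * xi for xi in x] for e in e1]
    return gW1, gW2

def misfit(x, target, W1, W2):
    O2 = forward(x, W1, W2)[3]
    return sum((o - t) ** 2 for o, t in zip(O2, target))

x, target = [0.5, -0.2], [1.0]
W1 = [[0.3, 0.1], [-0.4, 0.2]]
W2 = [[0.25, -0.15]]
gW1, gW2 = gradients(x, target, W1, W2)

# check one component against a finite difference of M
h = 1e-6
W2p = [[W2[0][0] + h, W2[0][1]]]
num = (misfit(x, target, W1, W2p) - misfit(x, target, W1, W2)) / h
assert abs(num - gW2[0][0]) < 1e-5
```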
The steplength ε(t) is adapted during the iterations:

    ε(t+1) = ε(t) + a    if M(W(t)) < M(W(t−1)) ,
    ε(t+1) = ε(t)/b      otherwise ,   (35)

where M(W(t)) is the misfit value for the set of weights at iteration t, and a and b are suitably chosen constants. Numerical tests in our applications have shown that the initial steplength at iteration zero is of the order of 4 × 10⁻³ and a can be set to 1 × 10⁻³, while b is set to 2. This kind of adaptive scheme can be made more effective by allowing a different steplength for the individual weights (see, for example, Le Cun et al. [1991]).
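The adaptive steplength rule of equation (35), with the constants quoted in the text (initial ε of 4 × 10⁻³, a = 10⁻³, b = 2), can be sketched as follows; the function name is illustrative:

```python
def adapt_steplength(eps, M_now, M_prev, a=1e-3, b=2.0):
    """Adaptive steplength of equation (35): grow additively while the
    misfit decreases, shrink by a factor b otherwise.
    The constants a and b follow the values quoted in the text."""
    if M_now < M_prev:
        return eps + a
    return eps / b

eps = 4e-3                                            # initial steplength
eps = adapt_steplength(eps, M_now=0.8, M_prev=1.0)    # misfit dropped: grow
assert abs(eps - 5e-3) < 1e-12
eps = adapt_steplength(eps, M_now=0.9, M_prev=0.8)    # misfit rose: halve
assert abs(eps - 2.5e-3) < 1e-12
```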
Acknowledgments. This research could not have been done without the sponsors of the Inversion Project (Amoco, Aramco, Arco, CGG, Conoco, Delaney Enterprises, Digital Equipment, Elf, Exxon, IFP, Inverse Theory & Applications, Mobil, Schlumberger, Shell, Statoil, Texaco, Thinking Machines, Total and Unocal). Research funded in part by the French Ministry of National Education (MEN), providing time for preliminary tests on a Connection Machine CM2, and Cray Research France, who gave us free and unlimited access to their Y-MP computer, where most of the computations were performed. One of us (G.R.) particularly thanks Elf-Aquitaine for a grant to support this research.

REFERENCES

Al-Yahya, K., Velocity analysis by iterative profile migration, Geophysics, 54, 718-729, 1989.
Angeniol, B., Applications industrielles des réseaux de neurones, in Neural Networks, Proceedings of the International Conference "Les Entretiens de Lyon," pp. 65-69, Springer-Verlag, New York, 1990.
Baum, E., and D. Haussler, What size net gives valid generalization?, in Advances in Neural Information Processing Systems, vol. 1, edited by D. Touretzky, pp. 81-90, Morgan Kaufmann, San Mateo, 1989.
Hebb, D. O., The Organization of Behavior, John Wiley, New York, 1949.
Hornik, K., M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2, 359-366, 1989.
Huang, K., W. H. Liu, and I. C. Chang, Hopfield models of neural networks for detection of bright spots, SEG Expanded Abstracts, pp. 444-446, SEG Publications, Tulsa, 1989.
Lapedes, A., and R. Farber, How neural nets work, in Proceedings of the 1987 IEEE Denver Conference on Neural Networks, Neural Information Processing Systems, edited by D. Z. Anderson, American Institute of Physics, New York, 1988.
Le Cun, Y., Modèles connexionistes de l'apprentissage, Thèse de doctorat, Univ. Pierre et Marie Curie, Paris, 1987.
Le Cun, Y., I. Kanter, and S. A. Solla, Second order properties of error surfaces: Learning time and generalization, in Advances in Neural Information Processing Systems, vol. 3, edited by R. Lippmann, J. Moody, and D. Touretzky, pp. 918-924, Morgan Kaufmann, San Mateo, 1991.
Liu, X., P. Xue, and Y. Li, Neural network method for tracing seismic events, SEG Expanded Abstracts, pp. 716-718, SEG Publications, Tulsa, 1989.
McCormack, M., D. Zaucha, and D. Dushek, First-break refraction event picking and seismic data trace editing using neural networks, Geophysics, 58, 67-78, 1993.
Murat, M. E., and J. R. Rudman, Automated first arrival picking: A neural network approach, Geophys. Prospect., 40, 587-604, 1992.
Poulton, M. M., C. E. Sternberg, and C. E. Glass, Location of subsurface targets in geophysical data using neural networks, Geophysics, 57, 1534-1544, 1992.
Röth, G., and A. Tarantola, Use of neural networks for the inversion of seismic data, SEG Expanded Abstracts, vol. 1, pp. 302-305, SEG Publications, Tulsa, 1991.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, 323, 533-536, 1986.
Veezhinathan, J., D. Wagner, and J. Ehlers, First break picking using a neural network, in Expert Systems in Exploration, edited by F. Aminzadeh and M. Simaan, SEG Publications, Tulsa, 1991.
Wang, L. X., and J. M. Mendel, Adaptive minimum prediction-error deconvolution and source wavelet estimation using Hopfield neural networks, Geophysics, 57, 670-679, 1992.

G. Röth and A. Tarantola, Institut de Physique du Globe de Paris, 4 place Jussieu, F-75252 Paris Cedex 05, France.

(Received October 23, 1992; revised April 27, 1993; accepted June 9, 1993.)