Neural networks can be viewed as applications that map one space, the input space, into some output space. In order to simulate the desired mapping the network has to go through a learning process consisting of an iterative change of the internal parameters, through the presentation of many input patterns and their corresponding output patterns. The training process is accomplished if the error between the computed output and the desired output pattern is minimal for all examples in the training set. The network will then simulate the desired mapping on the restricted domain of the training examples. We describe an experiment where a neural network is designed to accept a synthetic common shot gather (i.e., a set of seismograms obtained from a single source) as its input pattern and to compute the corresponding one-dimensional large-scale velocity model as its output. The subsurface models are built up of eight layers with constant layer thickness over a homogeneous half-space; 450 examples are used to train the network. After the training process the network never computes a subsurface model which perfectly fits the desired one, but the approximation of the network is sufficient to take this model as a starting model for further seismic imaging algorithms. The trained network computes satisfactory velocity profiles for 80% of the new seismic gathers not included in the training set. Although the network gives results that are stable when the input is contaminated with white noise, the network is not robust against strong, i.e., correlated, noise. This application proves that neural networks are able to solve nontrivial inverse problems.
requires that the arrival times are picked with high accuracy, which is impossible in the presence of noise or complex subsurface structures. The exact arrival is often hidden in the noise, and signals can overlap.

Other methods, based on the use of full waveforms, like velocity analysis, can be used to determine the background velocity model. Velocity analysis starts with a reasonable estimation of the velocity model, obtained through information from borehole logs and additional geological knowledge. The theoretical arrival times of the seismic signals are computed for this model and superimposed on the observed seismic section. If the theoretically predicted quasi-hyperbolas match the observed ones in the seismic record, the velocity model is correct; otherwise the velocity model has to be changed by trial and error, until a sufficient fit between the theoretical and observed hyperbolas is achieved. This analysis does not always give an accurate velocity estimation of the subsurface. More sophisticated techniques, for instance, the iterative migration velocity analysis [Al-Yahya, 1989], also require human intervention at each step, and the subjective interpretation may lead to errors in the background velocity.

While the inverse problem poses difficulties, the forward problem of computing synthetic seismograms from given depth velocity profiles is numerically simple. It is possible to generate a large number of Earth models and to compute their synthetic seismograms. These seismic section-Earth model pairs can constitute a training set for neural networks.

We will describe an experiment where the neural network is set up to accept as input a common shot gather (i.e., a set of seismograms obtained from a single source) and to compute as output an Earth model consisting of eight layers over a half-space. As any arbitrary (one-dimensional) subsurface model can be approximated by a stack of (sufficiently small) layers with constant thickness, we assume in our application that the layer thickness of the eight layers is constant and known.

After the training process the network is not only able to retrieve the desired velocity values from the learned examples, but also to estimate the correct velocity profile for new seismic sections, not present in the training set, with an 80% rate of success. This application proves that neural networks can be applied to data sets with a large number of input parameters and can estimate the desired output values with satisfactory accuracy.

We were not able to decipher the internal behavior of the network and the way it analyzes seismic data. It is not clear whether the neural network picks travel times, computes differential seismograms, etc. Much more detailed analysis should be carried out before any serious attempt to answer that question is made.

In the present context it is difficult to compare neural networks to classical analysis techniques in terms of speed and accuracy. Velocity analysis is mostly done by hand, while trained neural networks are stand-alone systems and do not allow any interference of an interpreter.

INTRODUCTION TO MULTILAYERED NEURAL NETWORKS

Neural networks are dynamic systems of a large number of connected simple processing units, called neurons (see Figure 1). In our approach, we work with multilayered, partially connected, neural networks. This means that the neurons are arranged in layers such that two units in the same layer cannot be connected and that there are only connections between neurons of successive layers. However, there may be an arbitrary number of intermediate or hidden layers, each containing an arbitrary number of neurons (Figure 1). Each neuron of our network can be considered an operator, receiving real numbers as input and transforming them into one output value. The output is transmitted by the links that connect the neurons. On each link a real number, the weight, is defined. Before an output value is transmitted, it is multiplied by the corresponding weight. Thus the weight reflects the strength of the individual connections [Hebb, 1949]. Modifying the weight values by repeated application of learning rules allows the network to approximate the function mapping the input patterns on the desired output patterns.

Hornik et al. [1989] rigorously established that a feedforward neural network, built up of one hidden layer with an arbitrary output function and an output layer with a linear output function, is capable of approximating any ordinary (Borel measurable) function. The accuracy of the approximation is determined by the number of hidden neurons. Their results remain valid if we replace the linear output function in the output layer by a sigmoidal output function [Lapedes and Farber, 1988]. This is mostly done when the network is used for classification tasks. In this case the output values of the neurons in the last layer are 1 (or close to 1) if a feature is present in the input pattern or 0 (or close to 0) if not.

Although we do not apply neural networks to classification problems, but to retrieve continuous output values, a sigmoidal function is used in the output layer. If the desired output values are scaled to be in the quasi-linear range of the sigmoidal function, experiments performed do not show significant differences between neural networks with a linear or a sigmoidal output function (M. M. Poulton, personal communication, 1993). In our tests we have observed that the training process of a network with linear output neurons is less robust.

We want to retrieve seismic velocities, which are bounded. One of the smallest velocities is found in the weathered layer, about 100 m/s, and velocities can increase for rock salt to about 6500 m/s. The use of the sigmoidal function bounds the output values and eliminates the possibility of computing nonphysical solutions. However, we have to scale the desired output values to be approximately in the linear range of the sigmoidal function.

In our simulations, we fix the output values of neurons in the input layer by the input pattern itself. All input values of the neurons in the following layers are computed by summing up all incoming values, which are all output values of the previous layer multiplied by the weighting factor defined on the corresponding connections. The output value of a neuron is then computed by applying a sigmoidal threshold function to the input value (see Figure 2). In this way information flows forward from the first or input layer through the hidden layer(s) to the last or output layer. This forward propagation of an input pattern is described in detail in Appendix A.

Once the architecture and forward propagation rules are defined, it is necessary to train the network. The network has to compute from an input pattern presented to the input layer an output pattern which is close to the desired one. The general idea of supervised learning is that the user has a large number (say E) of training patterns, which means pairs
RÖTH AND TARANTOLA: NEURAL NETWORKS AND INVERSION OF SEISMIC DATA 6755
O_{i}^1 = I_{i}^1

I_{i}^2 = Σ_j W_{ij}^1 O_{j}^1

O_{i}^2 = g(I_{i}^2)

I_{i}^3 = Σ_j W_{ij}^2 O_{j}^2

O_{i}^3 = g(I_{i}^3)

Fig. 1. Multilayered, partially connected, neural network. The first layer (on the top) is the input layer and the last layer (on the bottom) is the output layer. In between there is one hidden layer. On the right-hand side, the forward propagation rules for this particular network are shown. The inputs for the input layer I_{i}^1 are fixed by the input pattern vector IP. I_{i}^{1,2,3} denote the input for a neuron i in the input, hidden, and output layer, respectively. The output value for a neuron i in the input layer O_{i}^1 equals the corresponding input value, while the output values O_{i}^2 and O_{i}^3 for neurons in the hidden or in the output layer, respectively, are computed by applying a sigmoidal function g(·) to the input value. W_{ij}^{1,2} denote the weight values between the neurons i and j in the previous layer, between the input to hidden and hidden to output layer, respectively.
of input patterns and their corresponding desired output patterns.

The training process can be formulated as an optimization problem, consisting of finding a set of weights W which minimize the error between the computed and desired output patterns for all the examples presented to the network. In other words, the goal is to train the network to perform a mapping from the input space, containing the input pattern vectors as elements, to the output space, containing the output pattern vectors as elements. This mapping corresponds to the solution of the inverse problem.

Let O_{iα}^L be the output value, for the αth example, of the ith neuron of the output layer L, and let OP_{iα} be the desired component of the output pattern for this neuron. The misfit value M, depending only on the weights W, is defined to be

M(W) = Σ_{α=1}^{E} Σ_{i=1}^{N(L)} (O_{iα}^L - OP_{iα})^2 , (1)
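The forward rules of Figure 1 together with a misfit of the form (1) can be sketched in a few lines of NumPy. The layer sizes, weight scales, and data below are illustrative placeholders, not the values of the experiment described later.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    """Sigmoidal output function."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(ip, w1, w2):
    """Propagate one input pattern: input -> hidden -> output layer."""
    o1 = ip                 # O_i^1 = I_i^1 (linear input layer)
    i2 = w1 @ o1            # I_i^2 = sum_j W_ij^1 O_j^1
    o2 = g(i2)              # O_i^2 = g(I_i^2)
    i3 = w2 @ o2            # I_i^3 = sum_j W_ij^2 O_j^2
    return g(i3)            # O_i^3 = g(I_i^3)

def misfit(weights, inputs, targets):
    """Equation (1): summed squared error over all E examples."""
    w1, w2 = weights
    return sum(np.sum((forward(ip, w1, w2) - op) ** 2)
               for ip, op in zip(inputs, targets))

# Toy sizes (assumed, for illustration only).
n_in, n_hid, n_out, n_ex = 12, 5, 3, 4
w1 = rng.normal(scale=0.1, size=(n_hid, n_in))
w2 = rng.normal(scale=0.1, size=(n_out, n_hid))
inputs = [rng.normal(size=n_in) for _ in range(n_ex)]
targets = [rng.uniform(size=n_out) for _ in range(n_ex)]
print(misfit((w1, w2), inputs, targets))
```

Training then amounts to minimizing this misfit over the weight matrices w1 and w2.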
we can, at least in theory, construct neural networks capable of complex interpretations.

RETRIEVING THE BACKGROUND VELOCITY FROM SEISMIC DATA

In previous applications we have shown that neural networks can be trained to associate arrival time data to a velocity model of the subsurface. We also applied neural networks to retrieve an entire one-dimensional (1-D) depth-velocity profile from a zero offset (single trace) seismogram [Röth and Tarantola, 1991]. Both cases, as interesting as they may be to study the behavior of neural networks, are not realistic approximations of seismic field data. Real seismic data are neither an assembly of travel time values nor only recorded on one receiver. Field data are densely time-sampled seismograms recorded at different locations (offsets) relative to the source position. Information about the subsurface is not only in the arrival times and the amplitudes of the signal but also in the curvature of the hyperbolas over all traces.

Figure 3 shows four typical examples of (1-D) Earth models and the corresponding seismograms. We want to train a neural network that is set up to accept as input a seismic section shown at the left of the figure and gives, as output, a depth-velocity model shown at the right of the figure. Our velocity models consist of eight layers over a half-space. As all layers have a fixed thickness, each model is defined by nine values (the velocity values of the eight layers and that of the half-space).

Fig. 3. Typical (1-D) Earth models: (left) synthetic seismic sections (time in s) and (right) the corresponding desired (solid) and learned velocity models (depth in m).

The Data Sets

The training set. We generate a large number of Earth models, all with eight flat layers over a half-space. All layers have 200 m of thickness. The velocity of the first layer is generated pseudorandomly with a velocity of 1500 m/s ± 150 m/s (box-car distribution). Once the velocity v_ℓ of the ℓth layer is given, the velocity v_{ℓ+1} in the (ℓ+1)st layer is generated pseudorandomly as

v_{ℓ+1} = (v_ℓ + 190 m/s) ± 380 m/s , (2)

where, again, the plus or minus sign refers to a box-car distribution. This yields an average increase of velocity of 190 m/s per layer but allows local negative velocity jumps. This rule takes into account that in general velocity increases with depth but locally there might exist low-velocity zones.

In all our velocity plots, the velocity is normalized with respect to a maximal value of 4000 m/s. This normalization is necessary as the choice of the sigmoidal output function only permits values between zero and one.

Once a laterally invariant velocity model is defined, synthetic seismograms are computed.

The generalization sets. If the error is small for a large number of examples, the trained network can be used to give reliable interpolations. As for the training set, we generated two more generalization sets by adding 10% and 30% white noise to the noiseless data set. This allows us to make a rough measure of the stability of the results.

Training Process of the Neural Network

Each seismic section consists of 20 traces, sampled at 271 points each. Thus each section consists of 20 × 271 = 5420 points in time. The input layer of our network will then have 5420 neurons, while the output layer will have 9 neurons.

From a practical point of view, the number of hidden neurons is essentially used to control the number of weights in the network. If the number of weights is larger than the number of examples of the training set, the optimization problem is underconstrained. Then, the network will precisely retrieve the desired output values of the training set, but will not compute reasonable output patterns for input patterns excluded from the training set. If the number of weights is too small, the network has not enough free parameters to describe the desired mapping and the network cannot retrieve the desired output patterns of the training set.

After some trial and error, we have chosen one hidden layer with 25 neurons. This leads to the neural network represented in Figure 5. In our network, all the neurons of each layer are connected with all the neurons of the layer below, which makes a total of 5420 × 25 + 25 × 9 = 135,725 connections.

Let v_{iα} be the ith desired velocity value corresponding to the αth example, and let O_{iα} be the output value calculated by the network in the output layer (ℓ = 3). To train the network, we minimize the least squares expression (depending on the weights W)

M(W) = Σ_{α=1}^{E} Σ_{i=1}^{9} (O_{iα} - v_{iα})^2 . (3)
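The training-model generator described above (first-layer velocity box-car distributed in 1500 ± 150 m/s, then the recursion (2)) can be sketched as follows; the random seed and the use of NumPy's generator are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_velocity_model():
    """One model: eight 200-m layers plus a half-space (nine velocities)."""
    v = [rng.uniform(1500.0 - 150.0, 1500.0 + 150.0)]  # first layer, box-car
    for _ in range(8):
        # v_{l+1} = (v_l + 190 m/s) +/- 380 m/s, box-car distributed;
        # allows occasional local velocity inversions.
        v.append(rng.uniform(v[-1] + 190.0 - 380.0,
                             v[-1] + 190.0 + 380.0))
    return np.array(v)

models = np.array([random_velocity_model() for _ in range(450)])
print(models.shape)         # one row of nine velocities per model
print(models.mean(axis=0))  # velocities increase by ~190 m/s per layer on average
```

For presentation to the network, these velocities would then be normalized by the maximal value of 4000 m/s so that targets fall between zero and one.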
Fig. 4. Four typical synthetic seismic sections, (left) corrupted with 10% noise (time in s) and (right) their corresponding desired models (solid line, depth in m) used to train the neural networks. Two networks were trained, one using 325 examples such as the four shown in this figure, and the other using 450 examples. The dotted and dashed lines at the right are the outputs given by the network trained with 325 and 450 examples, respectively. The network is able to extract the necessary information out of the noise and to retrieve quite reliable velocity profiles. As in the noise-free example, the network trained with 450 examples performs only marginally better than the network trained with 325 examples.
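A noise-corrupted set like the one in Figure 4 can be sketched as below; the convention that the noise percentage is measured against the RMS amplitude of the noiseless section is our assumption, as the text does not spell out its definition.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_white_noise(section, percent):
    """Add zero-mean white noise scaled to a percentage of the section's RMS."""
    rms = np.sqrt(np.mean(section ** 2))
    noise = rng.normal(scale=percent / 100.0 * rms, size=section.shape)
    return section + noise

clean = rng.normal(size=(20, 271))   # stand-in for a 20-trace, 271-sample section
noisy10 = add_white_noise(clean, 10.0)
noisy30 = add_white_noise(clean, 30.0)
```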
Fig. 5. The neural network used in our experiment (connections omitted). The input layer has 20 × 271 = 5420 neurons, which corresponds to the number of samples in the seismic section. The output layer has 9 neurons (the nine velocities building up the model). A hidden layer has been chosen with 25 neurons. All the neurons of each layer are connected with all the neurons of the layer below, which makes a total of 5420 × 25 + 25 × 9 = 135,725 connections.

We successively trained 19 networks by upgrading the training set with 25 new examples each time. After each weight update, we presented 50 noiseless examples of the generalization set to the network and computed the corresponding misfit value defined by equation (3). Thus we have two misfit curves for the neural network, the first one indicating the error committed by the neural network for the examples in the training set, and a second curve describing the error on the 50 examples of the generalization set. These two curves are shown in Figure 6 for networks trained with 100, 225, 325, and 450 examples, respectively.

Figure 7 shows the optimum misfit for the generalization set of 50 models for networks trained with training sets of different size, containing 25, 50, 75, ..., 450, and 475 examples, respectively. The misfits are shown for the 50 models from noiseless seismic sections, then with 5% and 10% white noise.

As we can see, fairly good interpolations are obtained when the training sets consist of approximately 200 examples, and increasing the size of the training set does not further increase the interpolation capability of the network. The locally high values are probably due to convergence problems (local optima).

We do not know what the behavior of our network would have been if we had increased the number of training examples as suggested by a detailed mathematical investigation [Baum and Haussler, 1989] giving the upper and lower bounds on the number of training patterns versus network size, in order to obtain a trained network which correctly generalizes a certain fraction of new input patterns.

Fig. 6. Dependence of the misfit value on the iteration number for networks trained with 100, 225, 325, and 450 examples, respectively. The solid curve corresponds to the misfit value of the actual training set, while the dotted curve shows the misfit value for a "generalization set" of examples not used in the training. All seismograms here are noise-free. Notice that the misfit value for the generalization set of the network trained with 225 examples has a minimum at iteration 150. Further iterations only improve the performance on the training set and the network is overtrained. The optimization algorithm has to be stopped at this iteration.
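The stopping rule illustrated by Figure 6, halting at the iteration where the generalization misfit is minimal, can be sketched as follows; the misfit curves here are synthetic stand-ins for a real training run.

```python
def best_stopping_iteration(generalization_misfits):
    """Index of the minimum generalization misfit (the iteration to stop at)."""
    return min(range(len(generalization_misfits)),
               key=generalization_misfits.__getitem__)

# Toy curves: the training misfit keeps falling, while the generalization
# misfit reaches a minimum and then rises again (overtraining).
train = [1.0 / (t + 1) for t in range(300)]
gen = [1.0 / (t + 1) + (t / 150.0 - 1.0) ** 2 for t in range(300)]
stop = best_stopping_iteration(gen)
print(stop)
```

In practice the weights saved at that iteration, not the final ones, would be kept as the trained network.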
Fig. 7. Minimum misfit values for the generalization set of 50 models for networks trained with training sets of different size, containing 25, 50, 75, ..., 450, and 475 examples, respectively. The misfits are shown for the 50 models without noise, then with 5% and 10% white noise.
The proof is done under the assumption that all neurons have a linear output function or a Heaviside function. The lower bound on the number of examples, for networks with one hidden layer, is of the order of n/C, where n is the number of weights in the network and 1 - C the fraction of correctly generalized input patterns; e.g., if 90% (C = 0.1) of the generalization patterns have to be correctly interpreted, the number of training patterns has to be 10 times higher than the number of weights. If the lower bounds are not respected, any learning algorithm will often fail to find a set of weights that successfully analyzes input patterns excluded from the training set. Similar bounds should hold for networks with sigmoidal output functions.

In our case this means that 1,300,000 seismic sections would have to be used to train our network to get 90% correct velocity profiles for data sets not present in the training set. We have experimentally shown that a network which performs reasonably well on new data sets can be obtained using significantly fewer examples than theoretically required. One common shot gather is defined by 5420 numerical values. Using 450 examples gives a training set of about 2,000,000 values, i.e., more than 10 times the number of weights to adjust, and the problem is overconstrained.

Results After the Training Process

Retrieving Learned Seismograms. The left of Figure 3 shows four typical seismic sections used in the training process, and the right of the figure shows, in solid lines, the corresponding ("true") Earth models. The dotted and dashed lines correspond to the output of a network, for these four examples, when trained with 325 and 450 examples, respectively. Although the networks are not able to retrieve the Earth models exactly, the obtained accuracy will suffice for most practical applications. Essentially, this figure shows that a neural network like the one represented in Figure 5 is able to invert seismic sections.

The network performs better for the uppermost layers, but the results are meaningful for all layers. It has some difficulty in retrieving velocity decreases. This is related to the fact that not many velocity inversions were present in the training set for this specific case.

Figure 4 shows the results obtained when the network is trained with noise-corrupted seismograms (10% white noise). Apparently, the network is still able to extract the necessary information even in the presence of noise and to retrieve quite reliable velocity profiles. As in the noise-free example, the network trained with 450 examples performs only marginally better than the network trained with 325 examples.

Generalization capacity of the network. In the above section we have seen that the networks trained with 325 and 450 examples are able to provide reasonable interpretations for the seismic sections of the training set. Furthermore, Figure 7 indicates that these networks perform well with respect to the interpolation of new depth-velocity profiles. We can now present all 150 examples of the generalization set to the trained networks and compare the computed output patterns to the desired ones.

Figure 8 illustrates some of the seismograms of the generalization set and their corresponding desired velocity profiles. The computed output of the network trained with 450 noise-free seismograms is superimposed. The comparison of the computed and the desired velocity profiles shows that the network is able to propose reasonably good interpretations for the new seismograms. Some examples in the generalization set give wrong results.

The trained neural network gives a better interpolation of velocities for the upper part of the model than for deeper layers. It has some difficulty detecting negative velocity jumps. The network apparently has some tendency to bias the interpolated output towards the mean Earth model of the training set. This means that the network takes into account the general rule that velocity increases with depth, given by equation (2), and fails to recognize low-velocity zones. Models number 6 and 7 in Figure 8 illustrate this behavior. The velocity profile of model number 6 has a low-velocity zone between 600 m and 1800 m in depth. There is nearly no velocity increase in this region. The computed output of the network cannot fit the desired model. The net-
work always computes a positive velocity jump from layer to layer, following the overall rule that velocity increases with depth. This gives an overestimation of the desired velocities. Model 7 is characterized by a low-velocity zone between 600 m and 1400 m in depth. The computed output of the network indicates a (small) positive velocity jump, also for the velocities in the low-velocity zone. The velocity of the half-space, only constrained through the amplitude values of the last hyperbola, is, in most models, wrong.

The network computes a satisfactory estimate of the Earth model for about 120 of the 150 examples. Among the 120 there are 50 models less accurately retrieved but
Fig. 8. Seismograms of the generalization set (left, time in s) and their corresponding desired vs. generalized velocity models (right, depth in m).
where the computed model follows the trend of the desired model. Finally, there are 30 among 150 models which are wrong. This yields, for the generalization set, an overall score of 50% correctly computed models, 30% models that follow the trend but show clear differences, and 20% erroneous models.

Let us now see what happens when, after training the network with noise-free seismograms, we present seismograms with some white noise added. In this case the network is not only confronted with the task of interpreting new seismograms but also must "filter out" different perturbations. Figure 9 shows some of the output patterns of the network trained with 450 noise-free examples when we offer as input seismograms corrupted with 10% white noise.

The network is not violently unstable in the presence of noise and it seems that it has some aptitude for removing unwanted noise. The errors of the network are, of course, larger than those for the generalization with noise-free data, but they are often still acceptable. Forty-five percent of the models are retrieved with a satisfactory accuracy, 27% show significant deviations but follow the trend, and 28% are wrong.

When we present the 150 seismograms with 30% noise, the network is no longer able to correctly interpret such noisy data. The network appears to give models that are never too far off but that do not approximate well the correct ones.
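A stability check of this kind can be sketched as follows; since the trained weights of the experiment are not available, the network here is randomly initialized and the drift numbers are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(ip, w1, w2):
    """Input -> hidden -> output, sigmoidal layers."""
    return g(w2 @ g(w1 @ ip))

# Paper-sized layers (5420 inputs, 25 hidden, 9 outputs), random weights.
n_in, n_hid, n_out = 5420, 25, 9
w1 = rng.normal(scale=0.01, size=(n_hid, n_in))
w2 = rng.normal(scale=0.1, size=(n_out, n_hid))

section = rng.normal(size=n_in)
rms = np.sqrt(np.mean(section ** 2))
clean_out = forward(section, w1, w2)

# How far does the output drift when 10% and 30% white noise is added?
drifts = []
for percent in (10, 30):
    noisy = section + rng.normal(scale=percent / 100.0 * rms, size=n_in)
    drifts.append(np.max(np.abs(forward(noisy, w1, w2) - clean_out)))
print(drifts)
```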
Fig. 9. (continued) Seismograms 55-58 of the generalization set (left, time in s) and their desired vs. generalized models (right, depth in m).
present in the data offered as input. Our seismic section consists of 20 traces with 8 reflections on each of them. Thus the seismic section contains 8 × 20 = 160 wavelets.

Switches of polarity of arrivals, up to 10% of the wavelets and randomly placed, or a modification of some of the arrival times, of the order of ± 0.04 s, of about the same number of wavelets, do not have much influence on the computed output.

The susceptibility to "missing" information is satisfactory: when zeroing out up to four traces, randomly chosen among the 20 of the original input data, the computed output is still acceptable. Zeroing five or more adjacent traces anywhere in the seismic sections leads to unpredictable results. Sometimes the computed model fits the desired one, sometimes not.

More systematic perturbations, as for instance when the
Fig. 10. (Left) Synthetic seismograms with 30% white noise (time in s) and (right) their corresponding desired models (solid line, depth in m) of the generalization set of 150 examples. Superimposed are the interpolated models computed by the network trained with 450 noiseless seismograms. The signal is swamped in the noise and the network is no longer able to recognize the original signal. The computed velocity profiles do not closely approximate the desired models.
polarity of the first and the third hyperbola is switchedover each containing an arbitrary number of neurons. Figure 1
all traces, or if the arrival times of pulsesreflected from one showsan exampleof suchtype of neural networks. Informa-
layer are perturbated, also gives unpredictable computed tion flows forward from the first or input layer through the
Earth models. We have not made any attempt to quan- hidden layer(s) to the last or output layer.
tify this behavior and to find out which degreeof correlated Defining a partially connected, multilayered neural net-
perturbation can be compensatedby the neural network. work by the number of layers L and the number of neurons
in eachlayer N(œ), (œ= 1,..., L), we canformulatethe rules
CONCLUSIONS to downpropagate an input pattern, from the input, through
the hidden, to the output layer.
We have trained networks with a relatively small training
Consider a set of E input pattern vectors IP•, and their
set to perform a nontrivial interpretation task. Our network
corresponding output pattern vectors OPt, c• - 1,... ,E.
is able to invert both, low noise and noise-free seismic sec-
An input pattern vector IP• is presentedto the input layer.
tions, when it has been trained with an adequate training
Thismeansthat theinputvalueI/t• of a neuroni in theinput
set. The results are nearly identical for the two cases. When
layer (œ- 1) is equal to the correspondingith component
more noise is present in the generalization set than in the
of the input pattern vector of the selectedexample c•. Thus
training set, the network performs fairly well for low and we can write
uncorrelated noise, and unpredictably for correlated noise.
The results shown here are preliminary and a detailed
Ii•o= IPi,, i= l,...,N(1) . (4)
investigation still needsto be done. Further researchshould
concentrate on possible presentations of seismic data, for This requiresthat the number of input neuronsN(1) is
instance, its r-p transform, as input patterns to a neural equal to the number of components of the input pattern.
network, and other network designs. The output function of the neurons in the first layer is the
linearfunctionandwe obtainthe outputvaluesO• for all
APPENDIX A' MULTILAYERED NEURAL NETWORKS neurons i of the input layer
The main effectof the additionalparameterO (or thresh- The rules to downpropagatethe ath input vector of the
old value), is to translatethe output functionhorizontally. training set (consistingof E examples)are given in Ap-
The amount of this shift is the value of O. Thus all neurons pendix A and will not be repeatedhere. The upper right
can individuallytranslatetheir output function,in orderto index, indicating the consideredlayer, is suppressed,in or-
increase or decreasethe output value independently of the der not to overcomplicate our notation. But we will always
weights. indicate in which layer the indices have to be taken.
A practical way to optimize the threshold for each neu- The goal is to minimize the error between the computed
ron is to considerit as a weight on the connectionrunning output of the network at the output layer and the desired
from an additional neuron, with constant output of +1 , to target output (OP,), corresponding
to the giveninput pat-
all other neurons. This means that the threshold for each tern for all E examples. Thus we chooseto minimize the
neuronis optimized togetherwith the weights,and a second misfit function
algorithm to determine them is not needed.
The goal is to minimize the error between the output computed by the network at the output layer, O_i^α, and the desired target output OP_i^α, associated with a given input pattern, for all the examples in the training set. Thus we may choose to minimize the misfit function

    M(W) = Σ_α Σ_i σ_i^α ( O_i^α − OP_i^α )² ,   (9)

where the σ_i^α are variable weights given to different data (in our examples they are all equal to 1). The variable α runs over the number of examples and i over the number of output neurons, i.e., the length of the target output vector. The problem so defined becomes an optimization problem, and standard techniques can be applied to find an optimal set of weights W.

The choice of a least squares criterion for our misfit function defined in equation (9), instead of the more robust least absolute criterion, simplifies the gradient algorithm. The option to use the least squares criterion in our algorithm has no significant influence on the results, as we are only dealing with synthetic examples with little noise.
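A direct transcription of the misfit function of equation (9), with all data weights σ_i^α set to 1 as in the examples of the paper (the numerical values below are illustrative):

```python
def misfit(computed, desired, sigma=None):
    """Least squares misfit of equation (9):
    M(W) = sum_alpha sum_i sigma_i^alpha (O_i^alpha - OP_i^alpha)^2."""
    M = 0.0
    for a, (O, OP) in enumerate(zip(computed, desired)):
        for i, (o, op) in enumerate(zip(O, OP)):
            w = 1.0 if sigma is None else sigma[a][i]   # all sigma = 1 here
            M += w * (o - op) ** 2
    return M

computed = [[0.2, 0.8], [0.5, 0.1]]   # network outputs, two examples
desired  = [[0.0, 1.0], [0.5, 0.0]]   # target outputs
M = misfit(computed, desired)          # 0.04 + 0.04 + 0 + 0.01, about 0.09
```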
Let this gradient (at iteration t) be denoted by

    Γ_ij(t) = ∂M(W(t)) / ∂W_ij(t) ,   (10)

where i and j denote any weight in our network, i.e., any weight between the input and the hidden layer or the hidden and the output layer. A steepest descent algorithm updates the weights according to

    W_ij(t+1) = W_ij(t) − ε(t) Γ_ij(t) ,   (11)

where ε(t) is an arbitrary factor, as large as possible, but small enough to ensure that the value of the misfit will actually decrease. We call it the steplength.

It is known that the successive application of a gradient algorithm can be interpreted as a "backpropagation" of the errors through the network.

The rules to downpropagate the αth input vector of the training set (consisting of E examples) are given in Appendix A and will not be repeated here. The upper right index, indicating the considered layer, is suppressed, in order not to overcomplicate our notation. But we will always indicate in which layer the indices have to be taken.

The goal is to minimize the error between the computed output of the network at the output layer and the desired target output OP_i^α, corresponding to the given input pattern, for all E examples. Thus we choose to minimize the misfit function

    M(W) = Σ_{α=1}^{E} Σ_i ( O_i^α − OP_i^α )² ,   (12)

where i runs over the number of output neurons, i.e., the length of the target output vector.

We now have to compute the gradients of the misfit function, equation (12), with respect to the weights. The trick is that we determine the gradients of weights connecting neurons in the layer ℓ < L and the neighboring layer ℓ+1 ≤ L after we have computed the gradients for all weights running between the layers "below," i.e., all gradients from the layer L, the output layer, to the layer ℓ+1. We will decompose the computation in two steps. In a first step we will compute the gradients for weights running between the output layer L and the hidden layer L−1. In a second step we will compute the gradients for the weights from the hidden layer L−1 up to the input layer of the network.

Let us determine the gradient for a weight W_ik connecting a neuron i of the output layer to a neuron k in the hidden layer L−1:

    ∂M(W)/∂W_ik = 2 Σ_{α=1}^{E} Σ_j ( O_j^α − OP_j^α ) ∂O_j^α/∂W_ik ,   (13)

where j runs over the neurons of the output layer L. The decomposition of the second term, according to the chain rule, in the sum of equation (13) gives

    ∂O_j^α/∂W_ik = ( ∂O_j^α/∂I_j^α ) ( ∂I_j^α/∂W_ik ) .   (14)

As the output is a function of the input value, we define

    g′(I_j^α) ≡ ∂O_j^α/∂I_j^α ,   (15)

and replacing the explicit expression for the input value (equation (3)) in equation (14) gives
RÖTH AND TARANTOLA: NEURAL NETWORKS AND INVERSION OF SEISMIC DATA    6767
    ∂O_j^α/∂W_ik = g′(I_j^α) ∂/∂W_ik ( Σ_l W_jl O_l^α ) ,   (16)

where the index l runs over the neurons in the (L−1)st layer. Equation (16) simplifies to

    ∂O_j^α/∂W_ik = g′(I_j^α) Σ_l δ_ji δ_lk O_l^α ,   (17)

where δ is the Kronecker symbol. Thus we have as the final expression

    ∂O_j^α/∂W_ik = g′(I_j^α) δ_ji O_k^α .   (18)

Substituting equation (18) into equation (13) gives

    ∂M(W)/∂W_ik = 2 Σ_{α=1}^{E} ( O_i^α − OP_i^α ) g′(I_i^α) O_k^α .   (19)

Defining

    e_i^α ≡ ( O_i^α − OP_i^α ) g′(I_i^α) ,   (20)

this becomes

    ∂M(W)/∂W_ik = 2 Σ_{α=1}^{E} e_i^α O_k^α .   (21)

For the weights running between the hidden layers we compute the gradient in a similar way. The gradient for a weight W_op connecting the neuron o in the (L−1)st layer with the neuron p in the (L−2)nd layer is

    ∂M(W)/∂W_op = 2 Σ_{α=1}^{E} Σ_j ( O_j^α − OP_j^α ) ∂O_j^α/∂W_op ,   (22)

where j runs over the neurons of the output layer L. The decomposition of the second term in the sum of equation (22), by the chain rule, gives

    ∂O_j^α/∂W_op = ( ∂O_j^α/∂I_j^α ) Σ_k ( ∂I_j^α/∂O_k^α ) ( ∂O_k^α/∂I_k^α ) ( ∂I_k^α/∂W_op ) .   (23)

Working out the individual derivatives in the same way as above (equations (24) to (26)) yields

    ∂O_j^α/∂W_op = g′(I_j^α) W_jo g′(I_o^α) O_p^α .   (27)

Substituting equation (27) into equation (22) gives the expression

    ∂M(W)/∂W_op = 2 Σ_{α=1}^{E} Σ_j ( O_j^α − OP_j^α ) g′(I_j^α) W_jo g′(I_o^α) O_p^α ,   (28)

where j runs over the neurons of the output layer L. This can be rewritten using the definition of e in equation (20). With this definition, we can verify the recursive formula to compute the gradients Γ_ij for the weights connecting a neuron i with a neuron j:

    ∂M(W)/∂W_ij = 2 Σ_{α=1}^{E} e_i^α O_j^α ,   (31)

where

    e_i^α = ( O_i^α − OP_i^α ) g′(I_i^α) ,   (32)

if i is a neuron in the last layer, and

    e_i^α = g′(I_i^α) Σ_j W_ji e_j^α ,   (33)

if i is a neuron in one of the hidden layers ℓ < L.
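The recursion of equations (31) to (33) can be sketched for a small two-layer network. The sigmoid output function (for which g′(I) = g(I)(1 − g(I))), the layer sizes, and the weight values are illustrative assumptions; the analytic gradient is checked against a finite difference of the misfit:

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))     # sigmoid; g'(I) = g(I)(1 - g(I))

def forward(x, W1, W2):
    I1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    O1 = [g(v) for v in I1]
    I2 = [sum(w * o for w, o in zip(row, O1)) for row in W2]
    O2 = [g(v) for v in I2]
    return I1, O1, I2, O2

def gradients(x, target, W1, W2):
    """Backpropagation via equations (31)-(33) for one example."""
    I1, O1, I2, O2 = forward(x, W1, W2)
    # output layer: e_i = (O_i - OP_i) g'(I_i)                  (32)
    e2 = [(o - t) * g(v) * (1 - g(v)) for o, t, v in zip(O2, target, I2)]
    # hidden layer: e_i = g'(I_i) sum_j W_ji e_j                (33)
    e1 = [g(v) * (1 - g(v)) * sum(W2[j][i] * e2[j] for j in range(len(e2)))
          for i, v in enumerate(I1)]
    # dM/dW_ij = 2 e_i O_j                                      (31)
    gW2 = [[2 * e * o for o in O1] for e in e2]
    gW1 = [[2 * e * xi for xi in x] for e in e1]
    return gW1, gW2

def misfit(x, target, W1, W2):
    O2 = forward(x, W1, W2)[3]
    return sum((o - t) ** 2 for o, t in zip(O2, target))

x, target = [0.5, -0.2], [1.0]
W1 = [[0.3, 0.1], [-0.4, 0.2]]
W2 = [[0.25, -0.15]]
gW1, gW2 = gradients(x, target, W1, W2)

# check one component against a finite difference of M
h = 1e-6
W2p = [[W2[0][0] + h, W2[0][1]]]
num = (misfit(x, target, W1, W2p) - misfit(x, target, W1, W2)) / h
assert abs(num - gW2[0][0]) < 1e-5
```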
The steplength ε(t) is adapted during the iterations:

    ε(t+1) = ε(t) + a    if M(W(t)) < M(W(t−1)) ,
    ε(t+1) = ε(t)/b      otherwise ,   (35)

where M(W(t)) is the misfit value for the set of weights at iteration t, and a and b are suitably chosen constants. Numerical tests in our applications have shown that the initial steplength at iteration zero is of the order of 4 × 10⁻³ and a can be set to 1 × 10⁻³, while b is set to 2. This kind of adaptive scheme can be made more effective by allowing a different steplength for the individual weights (see, for example, Le Cun et al. [1991]).
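The adaptive steplength rule of equation (35), with the constants quoted in the text (initial ε of 4 × 10⁻³, a = 10⁻³, b = 2), can be sketched as follows; the function name is illustrative:

```python
def adapt_steplength(eps, M_now, M_prev, a=1e-3, b=2.0):
    """Adaptive steplength of equation (35): grow additively while the
    misfit decreases, shrink by a factor b otherwise.
    The constants a and b follow the values quoted in the text."""
    if M_now < M_prev:
        return eps + a
    return eps / b

eps = 4e-3                                            # initial steplength
eps = adapt_steplength(eps, M_now=0.8, M_prev=1.0)    # misfit dropped: grow
assert abs(eps - 5e-3) < 1e-12
eps = adapt_steplength(eps, M_now=0.9, M_prev=0.8)    # misfit rose: halve
assert abs(eps - 2.5e-3) < 1e-12
```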
Acknowledgments. This research could not have been done without the sponsors of the Inversion Project (Amoco, Aramco, Arco, CGG, Conoco, Delaney Enterprises, Digital Equipment, Elf, Exxon, IFP, Inverse Theory & Applications, Mobil, Schlumberger, Shell, Statoil, Texaco, Thinking Machines, Total and Unocal). Research funded in part by the French Ministry of National Education (MEN), providing time for preliminary tests on a Connection Machine CM2, and Cray Research France, who gave us free and unlimited access to their Y-MP computer, where most of the computations were performed. One of us (G.R.) particularly thanks Elf-Aquitaine for a grant to support this research.

REFERENCES

Al-Yahya, K., Velocity analysis by iterative profile migration, Geophysics, 54, 718-729, 1989.
Angeniol, B., Applications industrielles des réseaux de neurones, in Neural Networks, Proceedings of the International Conference "Les Entretiens de Lyon," pp. 65-69, Springer-Verlag, New York, 1990.
Baum, E., and D. Haussler, What size net gives valid generalization?, in Advances in Neural Information Processing Systems, vol. 1, edited by D. Touretzky, pp. 81-90, Morgan Kaufmann, San Mateo, 1989.
Hebb, D. O., The Organization of Behavior, John Wiley, New York, 1949.
Hornik, K., M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2, 359-366, 1989.
Huang, K., W. H. Liu, and I. C. Chang, Hopfield models of neural networks for detection of bright spots, SEG Expanded Abstracts, pp. 444-446, SEG Publications, Tulsa, 1989.
Lapedes, A., and R. Farber, How neural nets work, in Proceedings of the 1987 IEEE Denver Conference on Neural Networks, Neural Information Processing Systems, edited by D. Z. Anderson, American Institute of Physics, New York, 1988.
Le Cun, Y., Modèles connexionistes de l'apprentissage, Thèse de doctorat, Univ. Pierre et Marie Curie, Paris, 1987.
Le Cun, Y., I. Kanter, and S. A. Solla, Second order properties of error surfaces: Learning time and generalization, in Advances in Neural Information Processing Systems, vol. 3, edited by R. Lippmann, J. Moody, and D. Touretzky, pp. 918-924, Morgan Kaufmann, San Mateo, 1991.
Liu, X., P. Xue, and Y. Li, Neural network method for tracing seismic events, SEG Expanded Abstracts, pp. 716-718, SEG Publications, Tulsa, 1989.
McCormack, M., D. Zaucha, and D. Dushek, First-break refraction event picking and seismic data trace editing using neural networks, Geophysics, 58, 67-78, 1993.
Murat, M. E., and J. R. Rudman, Automated first arrival picking: A neural network approach, Geophys. Prospect., 40, 587-604, 1992.
Poulton, M. M., C. E. Sternberg, and C. E. Glass, Location of subsurface targets in geophysical data using neural networks, Geophysics, 57, 1534-1544, 1992.
Röth, G., and A. Tarantola, Use of neural networks for the inversion of seismic data, SEG Expanded Abstracts, vol. 1, pp. 302-305, SEG Publications, Tulsa, 1991.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, 323, 533-536, 1986.
Veezhinathan, J., D. Wagner, and J. Ehlers, First break picking using a neural network, in Expert Systems in Exploration, edited by F. Aminzadeh and M. Simaan, SEG Publications, Tulsa, 1991.
Wang, L. X., and J. M. Mendel, Adaptive minimum prediction-error deconvolution and source wavelet estimation using Hopfield neural networks, Geophysics, 57, 670-679, 1992.

G. Röth and A. Tarantola, Institut de Physique du Globe de Paris, 4 place Jussieu, F-75252 Paris Cedex 05, France.

(Received October 23, 1992; revised April 27, 1993; accepted June 9, 1993.)