
Defect Prediction with Neural Networks

Robert L. Stites, Bryan Ward*, and Robert V. Walters**
*IKONIX   **STR Corp. Inc.

The industrial and scientific world abounds with problems that are poorly understood or for which apparent anomalous conditions exist. Artificial Neural Networks are utilized with conventional techniques to extract salient features and relationships which are non-linear in nature. Defect causality in a large continuous flow chemical process is investigated. Significant gains in the prediction of defects over traditional statistical methods are achieved.

INTRODUCTION

A chemical processing plant experienced unexplained fluctuations in defect measurements for a multi-stage continuous liquid flow system with significant mixing time lags. Repeated attempts to utilize traditional statistical methods were largely unsuccessful in the determination of any causal relationships. The ultimate system objective is the control of quality in order to maximize the economic potential of the plant process. A precursor to the development of any control system is the development of a model of the plant operating characteristics. The focus of the effort described herein is twofold: to develop a model capable of more accurate prediction of defect measurements, and to identify causal relationships with any method of prediction.

A model can be developed either from known first principles or derived from empirical data. In this particular case, the latter method was required. At a later time and with an adequate model, corrective actions may be taken and a better control system for the process can be developed. The central effort is to ascertain if it is feasible for an artificial neural network to learn to associate a given defect rate with only the observed conditions in the plant. A neural network is developed which produces better than an order of magnitude improvement over traditional statistical methods for prediction of the defect rates. Resulting predictions do not precisely match the actual statistics but are capable of explaining a significant portion of the defects. The neural network provides a view into a unique combination of the use of production factors, physical sensors, and human factors in the prediction of defect rates.

The process is continuous with high capacitance flow: raw materials are entered into the system, material flows through several stages, and exits as finished product on a continuous basis. The total lag through the system is on the order of a day. This system operates around the clock, three hundred sixty-five days per year.

Data are available from a data acquisition and control system, quality control records, and handwritten maintenance records. Lapses in recording of data and modification of the data prior to recording are known to be problems. A description of aspects of the chemical process and the recording techniques is provided. Minimal detailed information on the process itself has been provided to the neural network development team. These descriptions are critical to the understanding of the difficulties in developing any method of prediction.



Continuous Flow Chemical Process

The process includes a number of stages involving mixing and processing at each stage, each with large capacitances. In a continuous flow process, raw materials continuously enter into the system and the finished product exits on a continuous basis. No batches are involved. This system operates twenty-four hours a day, three hundred sixty-five days per year.

Figure 1 - Generalized schematic of the continuous flow chemical process: raw materials pass through mixing Stage 1, Stage 2, ... Stage N and exit as finished product.

Approximate time offsets of the sensors relative to the quality control sample location are estimated based upon limited measurements. Due to the continuous mixing of the product during the various stages, actual arrival times of the product at the quality control sample point follow some undetermined distribution. This can best be illustrated by considering a situation in which a large number of colored balls are simultaneously added to the process at the same point or location in the process. Depending on the number of mixing stages involved, they will become spread out over the remaining length of the process. If the time of arrival of the balls is noted at the quality control station, it can be seen that all of the balls will not arrive at the same time but will follow some distribution. Since no information was available about the distribution of the lag times of the product through the process, singular arrival times are utilized for alignment.

Data Acquisition and Recording System

The acquisition system acquires data at a rate many times the basic recording rate for control purposes. The data are averaged over the interval of one recording time unit, with the average value recorded with a time tag associated with the end of the interval. An immediate effect of this averaging over the recording interval is that the Nyquist frequency is a variable between two and four recording time units. In addition, the non-linear filter effect of the averaging operation leaves uncertain the power levels associated with the higher frequencies. Loss of data from the acquisition system occurs frequently. At times the loss is for all data for a particular time interval, and at others it may be only for individual sensors. Time alignment of the sensor data to the quality control defect data indicates that a substantial number of the data do not have a complete set of sensor data to be utilized as input.

Due to the proprietary nature of the process involved, essentially only minimal information about the process is made available to the ANN team. This prohibited the use of known relationships in the design of the network. Available data recorded by the acquisition portion of the control system, records of quality control measurements, and various operating parameters for the plant are provided.

Quality Control

Quality control measurements are based upon a number of factors, with one measurement selected as most representative of overall quality for this effort. This quantitative laboratory measurement requires an extended period of time to evaluate and is sampled at regular intervals of two time units. All samples are time tagged for alignment with sensor data.

Maintenance Operations

Routine maintenance operations are performed regularly, with no start-up or shutdown sequences involved during the recorded data acquisition period. Time of maintenance occurrence is poorly, if at all, recorded. Most of the maintenance functions are estimated to be of short duration, on the order of one quarter to one half recording time unit. It is considered unlikely that the impact of maintenance is reflected directly in any of the sensor data.

Sensors

Sensors for the existing control system are located throughout the process. Over four hundred sensors are provided for all standard types of measurements: temperature, humidity, pressure, etc. Redundancy and computational constraints are used to reduce the number of sensors to just over 100.

Human Factors

Information about team organization, shift assignments, and quality control assignments is provided. Initial information from the traditional statistical efforts indicated that these human factors had been eliminated as having any causal effect on the defect rate. A major finding of the study is the critical importance of these human factors.


Data are recorded over a long period onto magnetic media and in written form. The sensor data are time aligned to the defect quality measurements and the handwritten data are added. Data dropouts are noted with special out-of-bounds values. Due to the extremely large number of data dropouts, it is determined that all data should be treated as discrete data samples rather than as a time series in training the network.

Standard statistical values are ascertained for all of the sensors and the defect data. Evaluation of the sensor and quality data utilizes auto and cross covariance, Fourier analysis, and spectral correlation. A number of the sensors are eliminated as redundant information, and some specific filtering of the data is indicated.

Similarities exist between the sensors and the quality data in the frequency domain. Spectral evaluation of both the defect data and the input data reveals that a number of periodicities exist in the data. In addition, three dominant frequency groupings permeate all of the sensor and quality data, in differing combinations. These periods of 227, 24.98, and 11.77 time units are not precise but rather fuzzy in nature, centered at these values. The most dominant, 24.98 time units, is illustrated in figure 2. The power level of this period permeates almost all of the sensors and the defect measurements. The cepstrum also contains a dominant periodicity at the same interval. Interestingly, the relative significance of these frequencies is diminished or enhanced if different segments of the data set are used.

Figure 2 - Power spectral estimate. Note the peak at the 24.98 unit frequency.

Filtering

The sensor and the defect data are low pass filtered, using an eight pole Butterworth filter, to eliminate the high frequency content. Prior to filtering, each missing datum is represented by the value interpolated between the two adjacent known samples. While some ringing is observed in the output of the filter at these missing portions of the data, they are re-marked as missing data and the remaining data are minimally impacted.

Weigend, et al [90], among others, note that the network will first attempt to fit the strongest features and then later attempt to fit the noise. The filtering of the data should hopefully increase the rate of convergence to a solution by the network.
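The text specifies the dropout handling and the eight pole Butterworth low-pass filter but not the cutoff, sampling rate, or dropout marker; the minimal Python/SciPy sketch below fills those in with assumed values and uses a zero-phase filter, which is an assumption as well.

```python
# Minimal sketch of the preprocessing described above; the out-of-bounds
# marker, cutoff, and segment length are assumptions, not from the paper.
import numpy as np
from scipy.signal import butter, filtfilt, welch

MISSING = -9999.0  # assumed out-of-bounds value used to mark data dropouts

def preprocess(x, cutoff, fs, order=8):
    """Interpolate dropouts, apply an eight pole Butterworth low-pass filter,
    then re-mark the originally missing samples as missing."""
    x = np.asarray(x, dtype=float)
    gaps = x == MISSING
    idx = np.arange(len(x))
    filled = x.copy()
    # Each missing sample becomes the value interpolated between the
    # adjacent known samples, as described in the text.
    filled[gaps] = np.interp(idx[gaps], idx[~gaps], x[~gaps])
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    y = filtfilt(b, a, filled)
    y[gaps] = MISSING  # ringing near the gaps is re-marked as missing
    return y

# Power spectral estimate of the kind shown in figure 2 (parameters assumed).
freqs, psd = welch(preprocess(np.random.randn(4096), cutoff=0.1, fs=1.0),
                   fs=1.0, nperseg=512)
```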

NEURAL NETWORK DESIGN

Time and funding constraints, in conjunction with the complexity of the problem, suggest that the use of a back error propagation approach is the best candidate artificial neural network paradigm to apply to the problem. Knowledge in an artificial neural network is represented in two basic ways: structure and interconnection strength.

Feed Forward Network

Based on experience with other problems of similar complexity, a network of the general structure depicted in figure 3 is utilized. The network, with two hidden layers, is not fully interconnected. Groupings of the network nodes into subnetworks are based primarily upon known spatial, temporal, and functional relationships within the network. Functional relationships are based upon what little is known about the stage in the process and the units of particular sensors. Additional process factors such as shift and quality control personnel are also included and grouped into subnetworks.
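One way to picture the "not fully interconnected" structure is as a connectivity mask over the first weight layer, with each input group wired only to its own block of hidden units. The sketch below is illustrative only: the totals (113 inputs, 22 first-hidden-layer units) come from Table I, but the group boundaries are invented.

```python
# Hedged sketch of grouped (subnetwork) connectivity via a 0/1 weight mask.
import numpy as np

def group_mask(input_groups, hidden_slices, n_inputs, n_hidden):
    """Allow weights only between an input group and its own hidden block."""
    mask = np.zeros((n_hidden, n_inputs))
    for (i0, i1), (h0, h1) in zip(input_groups, hidden_slices):
        mask[h0:h1, i0:i1] = 1.0
    return mask

# Example: three input groups feeding disjoint blocks of hidden layer 1.
mask = group_mask([(0, 40), (40, 90), (90, 113)],
                  [(0, 8), (8, 16), (16, 22)],
                  n_inputs=113, n_hidden=22)
W1 = mask * np.random.uniform(-0.1, 0.1, size=mask.shape)  # masked weights
```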

The neuronal, or processing, element utilized in this network is one in which the inputs are the outputs of other neuronal elements, each multiplied by a weight, $w_{ij}$, which is unique for each connection. These inputs are summed and the sum processed by a function. The output of a neuronal element can be expressed as

    $o_{pi} = f_i\Bigl(\sum_j w_{ij}\, o_{pj}\Bigr)$    (1)

where $f_i$ is a sigmoidal function, in this case the logistic function

    $f(a) = \dfrac{1}{1 + e^{-a}}$    (2)
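Equations (1) and (2) can be rendered directly; the function names below are ours, not the paper's.

```python
# Minimal sketch of the processing element in equations (1)-(2).
import numpy as np

def logistic(a):
    """Equation (2): the logistic activation."""
    return 1.0 / (1.0 + np.exp(-a))

def neuron_output(weights, prior_outputs):
    """Equation (1): weighted sum of the prior layer's outputs, squashed."""
    return logistic(np.dot(weights, prior_outputs))
```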


Figure 3 - Example artificial neural network illustrating clusters of input units (Group 1 inputs, Group 2 inputs, and individual inputs) connected to multiple hidden layers (hidden layer 1, hidden layer 2) and the output layer.

NETWORK TRAINING

The data are divided into two segments, the first utilized for training of the network and the second for testing. The data are presented to the network and a comparison of the defect prediction from the network is made with the observed quality control measurement. The difference is then used to propagate the error back through the various elements and layers of the network. An iterative improvement in the network structure is planned through evaluation of the performance of the network and some reconfiguration or elimination of processing elements and interconnections.

Interconnection strengths are modified during the learning process in order to minimize the sum of the squared error during the learning period. Weights are modified after each pattern presentation rather than at the completion of one complete epoch. Experience also suggests that as the network appears to approach its convergence limit, some random perturbation in the training sequence may allow a further reduction in the error.
Table I - Number of neurons in network.

    Neural Network Structure
    Input Units        113
    Hidden Layer 1      22
    Hidden Layer 2       8
    Output Layer         1

Back Error Propagation

Perhaps the most widely utilized neural network paradigm is that of Back Error Propagation, first used by Werbos [74] and later published by Le Cun [85], Parker [85], and Rumelhart, et al. [86]. For the purposes of this effort, the nomenclature of Rumelhart [86] is used. The basic objective is to minimize the sum of the squared error (target, $t_{pi}$, less the output, $o_{pi}$) over a series of pattern, $p$, presentations:

    $E = \sum_p \tfrac{1}{2} \sum_{i=1} (t_{pi} - o_{pi})^2$    (3)
The error is then distributed across the network in a manner which allows the interconnection weights, $w_{ij}$, to be modified according to the following rule at time $n+1$:

    $\Delta w_{ij}(n+1) = \eta\, \delta_{pi}\, o_{pj} + \alpha\, \Delta w_{ij}(n)$    (4)

where, using the function in equation 2,

    $\delta_{pi} = (t_{pi} - o_{pi})\, o_{pi}\, (1 - o_{pi})$    (5)

represents the amount of error attributable to an output unit, and

    $\delta_{pi} = o_{pi}\, (1 - o_{pi}) \sum_k \delta_{pk}\, w_{ki}$    (6)

for any hidden layer units. Connections to a bias, or constant, unit and the weight for that connection are not explicitly shown. Bias unit connections are treated in the same manner as a connection to any other unit.
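For a single pattern, equations (4)-(6) can be sketched as below. This is only an illustration: it uses one hidden layer for brevity (the actual network has two), invented variable names, and the starting values of the learning rate and momentum from equations (7)-(8).

```python
# Per-pattern back error propagation update with momentum, eqs. (4)-(6).
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_pattern(x, t, W1, W2, dW1_prev, dW2_prev, eta=0.5, alpha=0.7):
    """One weight update after a single pattern presentation."""
    h = logistic(W1 @ x)                         # hidden outputs, eqs. (1)-(2)
    o = logistic(W2 @ h)                         # output unit(s)
    delta_o = (t - o) * o * (1 - o)              # eq. (5), output units
    delta_h = h * (1 - h) * (W2.T @ delta_o)     # eq. (6), hidden units
    dW2 = eta * np.outer(delta_o, h) + alpha * dW2_prev   # eq. (4)
    dW1 = eta * np.outer(delta_h, x) + alpha * dW1_prev
    return W1 + dW1, W2 + dW2, dW1, dW2
```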


Figure 4 - Basic neuron element illustrating the dependence on the weighted sum of the outputs from a prior layer.

The momentum term, $\alpha$, provides a smoothing of the weight modification over an entire epoch. Note that the weights are actually modified after the presentation of each pattern, $p$. The learning rate, $\eta$, is used to determine the amount that each weight is modified. For this problem, it was determined that

    $\alpha = 0.7$    (7)

    $\eta = 0.5$    (8)

were reasonable starting values. The momentum term and learning rate were both reduced as the squared error appeared to approach a minimum.

Pattern Presentation

The order of presentation of the patterns can sometimes result in the network learning the order. In this case, however, this problem did not occur. A sequential order was used, except that as the squared error appeared to approach an asymptote, one or two random updates would be used. This perturbation to the network would briefly increase the squared error but would allow the network to rapidly settle to a lower squared error than that initially observed.

Convergence Criteria

Convergence of the network is monitored in two terms. The mean squared error is monitored to determine that no change, above a given threshold, has occurred during an epoch. The second term monitored is the Proportion of Variance. It is desirable to ensure that this term does not begin to decrease steadily over several epochs while at the same time the mean square error is still decreasing.
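The presentation schedule and plateau perturbation described above can be sketched as follows; the per-pattern update is stood in for by a caller-supplied function (assumed to return that pattern's squared error), and the plateau threshold is an assumption.

```python
# Sequential presentation with one or two random extra updates near a plateau.
import numpy as np

def run_epochs(patterns, targets, update_on_pattern, n_epochs, plateau_tol=1e-4):
    """update_on_pattern(x, t) applies eqs. (4)-(6) and returns squared error."""
    prev_mse = np.inf
    rng = np.random.default_rng(0)
    for epoch in range(n_epochs):
        errs = [update_on_pattern(x, t) for x, t in zip(patterns, targets)]
        mse = float(np.mean(errs))
        if prev_mse - mse < plateau_tol:     # squared error near an asymptote:
            for i in rng.integers(0, len(patterns), size=2):
                update_on_pattern(patterns[i], targets[i])   # random perturbation
        prev_mse = mse
```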

Metric

Several metrics are useful in the evaluation of the performance of the network during training and testing. The most frequently utilized metric for backpropagation is the sum of the mean squared error (eq. 3) and some variance function. It has been noted by many researchers that, in some cases, the reduction of the mean square error is not sufficient to adequately control the learning of the neural network. In this particular case the coefficient of determination, expressed as a percentage of the variance accounted for (POV), is found to be the most useful:

    $\mathrm{POV} = r^2 \times 100$    (9)

where $r^2$, the coefficient of determination, is the square of the correlation coefficient between the target and the network output,

    $r^2 = \left( \dfrac{\mathrm{cov}(t, o)}{\sigma_t\, \sigma_o} \right)^2$    (10)

As the term proportion of variance would suggest, this metric is a measure of that portion of the variance from the mean of the target signal that is explained by the output of the neural network. This metric is also available, through the correlation coefficient, for the results of the traditional statistical approach pursued previously.

Training Duration

The training period for the network was 19,600 presentations, with the training set consisting of 1250 examples. At the completion of each epoch, the predictions and targets are used to develop the Proportion of Variance value for the network at that point.
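A direct rendering of equations (9)-(10), computed here as the squared Pearson correlation between targets and predictions, expressed as a percentage; the function name is ours.

```python
# Proportion of Variance (POV) metric, eqs. (9)-(10).
import numpy as np

def pov(targets, predictions):
    t = np.asarray(targets, dtype=float)
    o = np.asarray(predictions, dtype=float)
    r = np.corrcoef(t, o)[0, 1]    # correlation coefficient
    return 100.0 * r ** 2          # coefficient of determination * 100
```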

NETWORK VARIATION USING RECENT OBSERVED DEFECTS

Lapedes and Farber [87] and Weigend, et al [90] have demonstrated that a neural network is capable of extraction of complex relationships from data in a noisy environment through the use of past observations.


Figure 5 - Neural Network Prediction of Defect Rate for Test Data #2, POV = 27.2%.

Figure 6 - Neural Network Target Defect Rate for Test Data #2.

Figure 7 - Neural Network Prediction of Defect Rate with one prior Observation, POV = 81.2%.

Figure 8 - Neural Network Target Defect Rate for Training Data.

Although much of the noise had already been removed by the filter discussed in an earlier section, a decision was made to observe the performance of the network when prior observations of the defect rate were utilized as input. These past observations were in addition to those inputs already in use by the network. Trials with from one to five of the most recent observations of defect rates were utilized as inputs. The highest POV values were obtained by conducting the training in two stages. First, the training of the network described in the body of the paper is completed. The outputs of this network are then utilized, along with the prior observed defects, as inputs to a smaller network. The results of this network variation are included in Table III for completeness, with some discussion of their implications in the following paragraphs.
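The input construction for the second stage of this variation might look like the sketch below: the first network's prediction plus the k most recent observed defect rates form each row fed to the smaller network. The function name and layout are assumptions; only the idea of combining the stage-one output with prior defects comes from the text.

```python
# Build second-stage inputs: [stage-1 output, d(t-1), ..., d(t-k)].
import numpy as np

def lagged_inputs(stage1_pred, defects, k):
    d = np.asarray(defects, dtype=float)
    p = np.asarray(stage1_pred, dtype=float)
    rows = []
    for t in range(k, len(d)):
        # Most recent observation first, then progressively older ones.
        rows.append(np.concatenate(([p[t]], d[t - k:t][::-1])))
    return np.array(rows)
```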


RESULTS

The neural network produced predictions of the defect rate for the chemical process that are ten to twenty times better than traditional techniques. This estimate is based upon the metric of the POV, or the coefficient of determination. Independent estimates of the POV by traditional techniques had only been performed for the training data set and were not performed for the test data set. The known and developed results are portrayed in table II.

Table II - Results.

    Proportion of Variance        Training Data    Test Data
    Traditional Statistical       2.0%             (not performed)
    Artificial Neural Network     47.6%            43.2%, 27.2%

Generalization in the network is apparent by the high POV values attained on the test data. The test data first evaluated yielded the higher of the values listed in the table. This result was so high that another section of data was selected for testing. That data is the lower listed.

The results with the network utilizing the prior observed defects were better than expected. The use of a single prior observation as input should yield a higher POV value, as this value, in the absence of noise and any other inputs, will usually be closer to the target value than the mean.

Table III - Results of using prior observed defects as input.

    Proportion of Variance Using Prior Observations    Training Data
    One Prior Observation                              81.2%
    Two Prior Observations                             82.1%
    Three Prior Observations                           83.6%
    Four Prior Observations                            86.7%
    Five Prior Observations                            97.6%

CAUSAL INFERENCE

The predictive results are significant, but perhaps even more profound is the result of the efforts to determine causality in the process. A review of the relative weights in the interconnections between the processing elements allows some evaluation of the relative importance of particular inputs. This work was performed by manual inspection of the weight structure at given output levels from the network. Although this is tedious and subject to error, it was immediately apparent that approximately seventy-five percent of the impact on the prediction was the result of about one quarter of the inputs. Keeping in mind that only about twenty-five to fifty percent of the variance from the mean is explained in the artificial neural network model, the dominant portion of cause is attributed to human factors, i.e. personnel, shift, day of shift, etc.

Considerations

It is interesting to note that, while the networks with one through four prior defect observations yielded POVs in the eighty percentile range, the network which utilized five prior observations achieved a dramatic additional ten percent increase. This would seem to indicate that some relationship had been discovered by the network with the addition of the fifth prior observation. The strength of the weights associated with that input would seem to bear out this conclusion. Further investigation of this result is anticipated.

Why do traditional techniques not achieve the same results? Several factors are apparent in the data. The first is the high degree of non-linearity that exists within the system. Examination of the weight structure for the bias and input connections, as a function over the input range, suggests that the non-linear range of the sigmoidal function is utilized. The second is the discontinuous nature of the solution space of many of the factors involved. This can be observed in the case of a shift team scheduled in proximity to another specific team, but not to any other. This is, in essence, analogous to the exclusive OR problem.


Can the results with the artificial neural network be improved? There are a number of improvements that can be noted immediately by improving data quality and integrity. First, the sampling of the input data can be improved. An increased sampling rate, improved reliability of recording, and replacement of the averaging process will all improve the quality of the input data. If a filter is to be used on the sensor data, it should be carefully selected. Calibrations of sensors and quality control procedures may also eliminate many of the fluctuations which occur between various personnel combinations. The design of the neural network architecture (neurons, connections, and weights) can be improved by the use of demonstrated relationships to reduce the components in the structure. The improved data quality may allow the use of the data without filtering the source data. Techniques such as those which utilize prior defect observations may provide additional benefits.

CONCLUSION
Artificial Neural Networks can be a powerful tool in the identification of salient features in processes where traditional techniques do not perform well on their own. The artificial neural network is not a rejection of those techniques but rather an enhancement of the tool set available to the investigator, whether in the industrial or the biotechnology arena.

REFERENCES

Bavarian, Behnam, "Introduction to Neural Networks for Automatic Control," IEEE Control Systems Magazine, April 1988.

Guez, Allen, Eilbert, and Kam, "Neural Network Architecture for Control," IEEE Control Systems Magazine, April 1988.

Lapedes, A. S., and Farber, R. M., "Nonlinear signal processing using neural networks: prediction and system modeling," TR LA-UR-87-2662, Los Alamos National Laboratory, 1987.

Le Cun, Y., "Une procedure d'apprentissage pour reseau a seuil assymetrique," in Proc. Cognitiva 85, Paris, June 1985, pp. 599-604.

Parker, D. B., "Learning-logic," TR-47, MIT Center for Computational Research in Economics and Management Science, Cambridge, MA, 1985.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J., "Learning internal representations by error propagation," in Rumelhart, D. and McClelland, J., eds., Parallel Distributed Processing, Vol. 1, Cambridge, MA: MIT Press, 1986.

Weigend, A. S., Huberman, B. A., and Rumelhart, D. E., "Predicting the Future: A Connectionist Approach," submitted to the International Journal of Neural Systems, April 1990.

Werbos, P. J., "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Ph.D. dissertation, Comm. on Appl. Math., Harvard University, Cambridge, MA, Nov. 1974.
