Robert L. Stites, Bryan Ward*, and Robert V. Walters** *IKONIX *STR Corp. Inc and q
The industrial
and scientific
world
case,
that are poorly understood or for which apparent anomalous conditions exist. Artificial Neural Networks are utilized with conventional techniques to extract salient features and relationships which are non-linear in nature. Defect causality in a large continuous flow chemical process is investigated. Significant gains in the prediction of defects over traditional statistical methods are achieved.
INTRODUCTION
A chemical processing plant experienced unexplained fluctuations in defect measurements for a multi-stage mixing continuous liquid flow system with significant time lags. Repeated attempts to utilize traditional statistical methods were largely unsuccessful in the determination of any causal relationships. The ultimate system objective is the control of quality in order to maximize the economic potential of the plant process. A precursor to the development of any control system is the development of a model of the plant operating characteristics. The focus of the effort described herein is twofold: develop a model capable of more accurate prediction of defect measurements, and identification of causal relationships within the process. A model can be developed from either known first principles ... adequate model, corrective actions may be taken and a better control system for the process can be developed.

The process is continuous flow: raw materials are entered into the system, material flows through several stages, each with high capacitance, and exits as finished product on a continuous basis. The total lag through the system is on the order of a day. This system operates around the clock, three hundred sixty-five days per year.

Data are available from a data acquisition system, quality control records, and handwritten maintenance records. Lapses in recording of data and modification of the data prior to recording are known to be problems. A description of a number of aspects of the chemical process and the recording techniques is provided. Minimal detailed information on the process itself has been provided to the neural network development team. These descriptions are critical to the understanding of the difficulties in developing ...

The central artificial ... an order of magnitude ... defect rates than traditional statistical methods. Resulting predictions do not precisely match the actual statistics but are capable of explaining a significant portion of the defects. The neural network provides a view into a unique combination of the use of production factors, physical sensors, and human factors in the prediction of defect rates.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1991 ACM 0-89791-432-5/91/0005/0199 $1.50
Continuous
The process includes a number of stages involving mixing and processing at each stage, each with large capacitances. In a continuous process, raw materials continuously enter into the system and the finished product exits on a continuous basis. No batches are involved. This system operates twenty-four hours a day, three hundred sixty-five days a year.

Figure 1 - Generalized schematic of the continuous process. (Raw Materials, Mix Stage 1, Stage 2, ..., Stage N.)

quality control sample location are estimated based upon limited measurements. Due to the continuous mixing of the product during the various stages, actual arrival times of the product at the quality control sample point follow some undetermined distribution. This can be best illustrated by considering a situation in which a large number of colored balls are simultaneously added to the process at the same point or location in the process. Depending on the number of mixing stages involved, they will become spread out over the remaining length of the process. If the time of arrival of the balls is noted at the quality control station, it can be seen that all of the balls will not arrive at the same time but will follow some distribution. Since no information was available about the distribution of the lag times of the product through the process, singular arrival times are utilized for alignment.
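The colored-ball illustration can be sketched numerically. The following is a minimal simulation, not the authors' method: it assumes, purely for illustration, that each mixing stage behaves like an ideally mixed tank, so that the time a ball spends in a stage is exponentially distributed. The function name `simulate_arrival_times` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_arrival_times(n_balls, n_stages, mean_stage_time, seed=0):
    """Arrival times at the sample point for balls all added at t = 0.

    Assumption (not from the paper): every stage is an ideally mixed tank,
    so the residence time of a ball in one stage is exponential.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean_stage_time
    return [
        sum(rng.expovariate(rate) for _ in range(n_stages))
        for _ in range(n_balls)
    ]


arrivals = simulate_arrival_times(n_balls=2000, n_stages=5, mean_stage_time=1.0)

# A single ("singular") lag keeps only the central value of the distribution.
mean_lag = statistics.mean(arrivals)
# The spread is the information that a single lag throws away.
spread = statistics.pstdev(arrivals)
```

Under these assumptions the balls arrive around the sum of the stage times but with a wide spread, which is why aligning on one arrival time is only an approximation.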
Data Acquisition
The acquisition the basic recording
the average value recorded with a time tag associated with the end of the interval. An immediate effect of this averaging over the recording interval is that the Nyquist frequency is a variable between two and four recording time units. In addition, the non-linear filter effect of the average
flow chemical
only
about the process is made available to the use of Available in the design of the network.
operation leaves uncertain the power levels associated with the higher frequencies. Loss of data from the acquisition system occurs frequently. At times the loss is for all data for a particular time interval and at others it may be only for individual sensors. Time alignment of the sensor data to the quality control defect data indicates that a substantial number of the data do not have a complete set of sensor data as input.
by the acquisition
of the control
system, records of quality control measurements, and various operating parameters for the plant are provided.
Quality Control
Quality control measurements are based upon a number of factors with one measurement selected as most representative of overall quality for this effort. This quantitative laboratory measurement requires an extended period of time to evaluate and is sampled at regular intervals of two time units. All samples are time tagged for alignment with sensor data.
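As a rough illustration of aligning time-tagged quality samples with sensor data through a single lag, the sketch below pairs each quality sample with the sensor reading taken one lag earlier and skips pairs whose sensor value is missing. The function `align_to_quality_samples`, the dictionary layout, and the sample values are hypothetical and are not taken from the paper.

```python
def align_to_quality_samples(sensor_series, quality_series, lag):
    """Pair each quality sample at time t with the sensor reading at t - lag.

    sensor_series:  {time: reading or None}, None marking a data dropout
    quality_series: {time: defect measurement}
    lag:            one assumed transit time from sensor to sample point
    """
    pairs = []
    for t, defect in sorted(quality_series.items()):
        reading = sensor_series.get(t - lag)
        if reading is not None:  # drop pairs with a missing sensor value
            pairs.append((t, reading, defect))
    return pairs


# Quality samples every two time units; one sensor dropout at time 2.
sensor = {0: 1.0, 1: 1.2, 2: None, 3: 1.1, 4: 0.9}
quality = {2: 0.30, 4: 0.25, 6: 0.40}
aligned = align_to_quality_samples(sensor, quality, lag=2)
```

In this example the quality sample at time 4 is dropped because the matching sensor reading at time 2 is missing, which mirrors the incomplete input patterns described above.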
Maintenance Operations
Routine maintenance operations are performed regularly with no start-up or shutdown sequences involved during the recorded data acquisition period. Time of maintenance occurrence is poorly, if at all, recorded. Most of the maintenance functions are estimated to be of short duration, on the order of one quarter to one half recording time unit. It is considered unlikely that the impact of maintenance is reflected directly in any of the sensor data.
Sensors
Sensors for the existing control system are located throughout the process. Over four hundred sensors are provided for all standard types of measurements: temperature, humidity, pressure, etc. Redundancy and computational constraints are used to reduce the number of sensors to just over 100. Approximate time offsets of the sensors relative to the
Human Factors
Information about team organization, shift assignments, and quality control assignments is provided. Initial information from the traditional statistical efforts indicates that these human factors have been eliminated as having any causal effect on the defect rate. A major finding of the study is the critical importance of these human factors.
Data are recorded over a long period onto magnetic media and in written form. The sensor data is time aligned to the defect quality measurements and the handwritten data is added. Data dropouts are noted with special out of bounds values. Due to the extremely large number of data dropouts, it is determined to use the data as individual samples rather than as a time series in training the network. Standard statistical values are ascertained on all of the sensors and the defect data. Evaluation of the sensor and quality data utilizing auto and cross covariance, Fourier analysis, and spectral correlation is used. A number of the sensors are eliminated as redundant information, and some specific filtering of the data is indicated.

Similarities exist between sensors and the quality data in the frequency domain. Spectral evaluation of both the defect data and the input data reveals that a number of periodicities exist in the data. In addition, three dominant frequency groupings permeate all of the sensor and quality data, in differing combinations. These frequencies of 227, 24.98, and 11.77 time units are not precise but rather fuzzy in nature, centered at these frequencies. The most dominant, 24.98 time units, is illustrated in figure 2. The power level of this period permeates almost all of the sensors and the defect measurements. The cepstrum also contains a dominant periodicity at the same interval. Interestingly, the relative significance of these frequencies is diminished or enhanced if different ...

Filtering
The sensor and the defect data are low pass filtered, using an eight pole Butterworth filter, to eliminate the high frequency content. Prior to filtering, missing data is represented by the interpolated value between the two adjacent known samples. While some ringing is observed in the output of the filter at these missing portions of the data, they are re-marked as missing data and the remaining data is minimally impacted.

Weigend, et al [90], among others, note that the network will first attempt to fit the strongest features and then later attempt to fit the noise. The filtering of the data should hopefully increase the rate of convergence to a solution by the network.

NEURAL NETWORK DESIGN
Time and funding constraints in conjunction with the complexity of the problem suggest that the use of a back error propagation approach is the best candidate artificial neural network paradigm to apply to the problem. Knowledge in an artificial neural network is represented in two basic ... strength.

The network with two hidden layers ... 3 is utilized. Groupings of the network nodes are based upon known spatial, temporal, ... relationships within the network elements. ... relationships are based primarily upon what little ... particular sensors. Additional process factors such as shift and quality control personnel are also included and grouped into subnetworks.

The neuronal, or processing, element utilized in this network is one in which the inputs are the product of the output of some other neuronal element multiplied by a weight, Wij, which is unique for each connection.
These inputs are summed and the sum processed by a function. The output of a neuronal element can be expressed as

$$o_{pi} = f\Big(\sum_j w_{ij}\, o_{pj}\Big) \qquad (1)$$

where $f$ is the sigmoidal function,

$$f(a) = \frac{1}{1 + e^{-a}} \qquad (2)$$

Connections to a bias, or constant, unit and the weight for that connection are not explicitly shown. Bias unit connections are treated in the same manner as a connection to any other unit.

[Figure 3: ... of neurons in network. Labels: Input Layer; Hidden Layer 1; Group 1 Inputs; Group 2 Inputs; Individual Inputs.]

Neural Network Structure: Input Units 113

NETWORK TRAINING
The data are divided into two segments, the first utilized for training of the network and the second for testing. The data are presented to the network and a comparison of the defect prediction from the network is made with the observed quality control measurement. The difference is then used to propagate the error back through the various elements and layers of the network. An iterative improvement in the network structure is planned through evaluation of the performance of the network and some reconfiguration or elimination of processing elements and interconnections.

Interconnection strengths are modified ... by Le Cun ... of Rumelhart ... during the learning period. Weights are modified after each pattern presentation rather than at the completion of one complete epoch. Experience also suggests that as the network appears to approach its convergence limit, some random perturbation in the training sequence may allow a further reduction in the error.

The objective is to minimize the sum of the squared error (target, $t_{pi}$, less the output, $o_{pi}$) over a series of pattern, p, presentations.

$$E = \sum_p \sum_i (t_{pi} - o_{pi})^2 \qquad (3)$$

... across the network in a manner which allows the interconnection weights, Wij, to be modified according to the following rule at time n+1:

$$\Delta w_{ij}(n+1) = \alpha\, \Delta w_{ij}(n) + \eta\, \delta_{pi}\, o_{pj} \qquad (4)$$

where, with $f$ as in equation 2,

$$\delta_{pi} = (t_{pi} - o_{pi})\, o_{pi}\, (1 - o_{pi}) \qquad (5)$$

represents the amount of error attributable for an output unit, and ... for any hidden layer units.
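The neuron output, the error term for an output unit, and the per-pattern weight change with momentum can be sketched for a single unit as follows. This is an illustrative sketch, not the authors' implementation. The logistic form of the function and the hidden-unit error term are the standard generalized delta rule and are assumed here, since the corresponding equations are not fully legible in the source. The function names are hypothetical; the defaults 0.7 and 0.5 are the starting momentum and learning rate the paper reports.

```python
import math


def sigmoid(a):
    """Logistic function (assumed form of the neuron's function f)."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron_output(weights, inputs):
    """Output of one unit: f of the weighted sum of prior-layer outputs."""
    return sigmoid(sum(w * o for w, o in zip(weights, inputs)))


def output_delta(target, output):
    """Error term for an output unit: (t - o) * o * (1 - o)."""
    return (target - output) * output * (1.0 - output)


def hidden_delta(output, downstream_deltas, downstream_weights):
    """Error term for a hidden unit (standard form, assumed here)."""
    back = sum(d * w for d, w in zip(downstream_deltas, downstream_weights))
    return output * (1.0 - output) * back


def weight_changes(prev_changes, delta, inputs, alpha=0.7, eta=0.5):
    """New change per weight: alpha * previous change + eta * delta * input."""
    return [alpha * dw + eta * delta * o for dw, o in zip(prev_changes, inputs)]


# One pattern presentation for a single output unit; the last input is the bias.
inputs = [1.0, 0.0, 1.0]
weights = [0.0, 0.0, 0.0]
out = neuron_output(weights, inputs)
delta = output_delta(1.0, out)
changes = weight_changes([0.0, 0.0, 0.0], delta, inputs)
weights = [w + dw for w, dw in zip(weights, changes)]
new_out = neuron_output(weights, inputs)
```

Because the weights are changed immediately after the pattern is presented, the second evaluation of the same pattern already moves toward the target, which is the per-pattern (rather than per-epoch) update described in the text.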
Figure 4 - Basic neuron element illustrating the dependence on the weighted sum of the output from a prior layer.

The momentum term, α, provides a smoothing of the weight modification over an entire epoch. Note that the weights are actually modified after the presentation of each pattern, p. The learning rate, η, is used to determine the amount that each weight is modified. For this problem, it was determined that

α = .7 (7)

and

η = .5 (8)

were reasonable starting values. The momentum term and learning rate were both reduced as the squared error appeared to approach a minimum.

... the order of the patterns can sometimes ... learning. In this case, a sequential order ... approach an asymptote, one or two random updates would be used. This perturbation to the network would briefly increase the squared error but would allow the network to rapidly settle to a lower squared error than that initially observed.

Convergence Criteria
Convergence of the network is monitored in two terms. The mean squared error is monitored to determine that no change, above a given threshold, has occurred during an epoch. The second term monitored is the Proportion of Variance. ... begin to decrease steadily over several epochs, while at the same time the mean square error is still decreasing. ... targets are used to develop the Proportion of Variance value for the network at that point. ... expressed as a percentage ... in proportion ... of the variance ...

POV = Proportion of Variance = Coefficient of Determination * 100 (9)

As the term proportion of variance would suggest, this metric is a measure of that portion of the variance from the mean of the target signal that is explained by the output of the neural network. This metric is also available, for the results of the traditional statistical ..., through the correlation coefficient,

[Equation (10): coefficient of determination]
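The Proportion of Variance metric can be illustrated with a short calculation. The sketch below assumes the usual definition of the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares about the target mean, scaled to a percentage. That specific formula is an assumption, since the paper's own equation is not legible here, and the function name is hypothetical.

```python
def proportion_of_variance(targets, outputs):
    """Percentage of target variance about the mean explained by the outputs.

    Assumed form: 100 * (1 - SS_residual / SS_total).
    """
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 100.0 * (1.0 - ss_res / ss_tot)


targets = [1.0, 2.0, 3.0, 4.0]
pov_perfect = proportion_of_variance(targets, targets)
pov_mean_only = proportion_of_variance(targets, [2.5, 2.5, 2.5, 2.5])
pov_partial = proportion_of_variance(targets, [1.5, 2.0, 3.0, 3.5])
```

A predictor that always returns the mean of the targets scores zero under this definition, so any positive value indicates that the network explains some variance beyond the mean.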
Figure 5 - Neural Network Prediction ...
Figure 6 - Neural Network ...
Figure 7 - Neural Network Prediction ... Observation, POV = 81.2%
Figure 8 - Neural Network ... Data.
Although much of the noise had already been removed by the filter discussed in an earlier section, a decision was made to observe the performance of the network when prior observations of the defect rate were utilized as input. These past observations were in addition to those inputs already in use by the network. Trials with from one to five of the most recent observations of defect rates were utilized as inputs. The highest POV was obtained by conducting the training in two stages. First, the training of the network described in the body of the paper is completed. The outputs of this network are then utilized, along with the prior observed defects, as inputs to a smaller network. The results of this network variation are included in Table III for completeness with some discussion of their implication in the following paragraphs.
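The two-stage arrangement amounts to building a new input pattern from the first network's output plus a window of the most recent observed defect values. A minimal sketch of that bookkeeping follows; the function name `second_stage_inputs` and the list layout are hypothetical, and the ordering of prior observations (oldest first) is an assumption.

```python
def second_stage_inputs(first_stage_outputs, observed_defects, n_prior):
    """Build (inputs, target) patterns for the smaller second network.

    Each pattern holds the first network's output for sample k followed by
    the n_prior defect observations that precede sample k (oldest first).
    """
    patterns = []
    for k in range(n_prior, len(observed_defects)):
        prior = observed_defects[k - n_prior:k]
        patterns.append(([first_stage_outputs[k]] + prior, observed_defects[k]))
    return patterns


stage_one = [0.1, 0.2, 0.3, 0.4]  # outputs of the already trained network
defects = [1.0, 2.0, 3.0, 4.0]    # observed defect measurements
patterns = second_stage_inputs(stage_one, defects, n_prior=2)
```

The first `n_prior` samples produce no pattern because they lack a full window of earlier observations.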
RESULTS
The neural network produced predictions of the defect rate for the chemical process that are ten to twenty times better than traditional techniques. This estimate is based upon the metric of the POV or the coefficient of determination. Independent estimates of the POV by traditional techniques had only been performed for the training data set and were not performed ... The developed results are portrayed in table II.

Table II - Results: Proportion of Variance
                             Training Data    Test Data
Traditional                  2=0 ... 10
Artificial Neural Network    47.6%            27.2%, 43.2%

Generalization in the network is apparent by the high POV values attained on the test data. The test data first evaluated yielded the higher of the values listed in the table. This result was so high that another section of data was selected for testing. That data is the lower listed.

Table III - Proportion of Variance Using Prior Observations
                             Training Data
One Prior Observation        8~%
Two Prior Observations       82.1%
Three Prior Observations     83.6%
Four Prior Observations      86.7%
Five Prior Observations      97.6%

The results with the network utilizing the prior observed ... predictions ... The use of a single prior observation ... as this value, in the absence of noise and any other inputs, will usually be closer to the target value than the mean.

It is interesting to note that, while the networks with one through four prior defect observations yielded POVs in the eighty percentile range, the network which utilized five prior observations achieved a dramatic additional ten percent increase. ... that some relationship had been discovered by the network with the addition of the fifth observation. Further investigation ... associated ... that input would ...

CAUSAL INFERENCE
The predictive results are significant but perhaps even more profound is the result of the efforts to determine causality in the process. A review of the relative weights in the ... importance ... by manual inspection of the weight structure at given output levels from the network. While this is tedious and subject to error, it was immediately apparent that approximately seventy-five percent of the impact on the prediction was the result of about one quarter of the inputs. Keeping in mind that only about twenty-five to fifty percent of the variance from the mean is explained in the artificial neural network model, the dominant portion of cause is attributed to human factors, i.e. personnel, shift, day of shift, etc.

Considerations
Why do traditional techniques not achieve the same results? Several factors are apparent in the data. The first is the high degree of non-linearity that exists within the system. Examination of the weight structure for the bias and input connections, as a function over the input range, suggests that the non-linear range of the sigmoidal function is utilized. The nature of the solution space ... This can be observed in the case of a shift team scheduled in proximity to another specific team, but not to any other. This is, in essence, analogous to the exclusive OR problem.

Can the results with the artificial neural network be improved? There are a number of improvements that can be noted immediately by improving data quality and integrity.
An ... of recording, and replacement of the averaging process, will all improve the quality of the input data. If a filter is to be used on the sensor data, it should be carefully selected. Calibrations of sensors and quality control procedures may also eliminate many of the fluctuations which occur between various personnel combinations.

The design of the neural network architecture can be improved by the use of demonstrated relationships to reduce the components (neurons, connections, and weights) in the structure. The improved data quality ... the data without filtering the source data. Techniques such as those which utilize prior defect observations may provide additional benefits.

CONCLUSION
Artificial Neural Networks can be a powerful tool in the identification of salient features in processes where traditional techniques do not perform well on their own. The artificial neural network is not a rejection of those techniques but rather an enhancement of the tool set available to the investigator, whether in the industrial or the biotechnology arena.

Bavarian, ... to Neural Networks for ... IEEE Control Systems Magazine ...
Guez, Allen, Eilbert, ... Architec-
... S., and Farber, R. M., Nonlinear signal processing using neural networks: prediction and system modeling, TR LA-UR-87-2662, Los Alamos National Laboratory, 1987.
Le Cun, Y., Une procedure d'apprentissage ... seuil assymetrique, in Proc. Cognitiva ..., June 1985, pp. 599-604.
Parker, D. B., Learning-logic, MIT ...
Rumelhart, D., Hinton, G., ... Learning internal representations by error propagation, in Rumelhart, D. and McClelland, eds., Parallel Distributed Processing, Vol 1, Cambridge, MA: MIT Press, 1986.
Weigend, A. S., Huberman, ... D. E., Predicting ... Approach, submitted to the International ... of Neural Systems, April ...
Werbos, P. J., Beyond regression: New tools for prediction and analysis in the behavioral sciences, Ph.D dissertation, Comm. on Appl. Math., Harvard University, Cambridge, MA, Nov 1974.