
Proceedings of the Institution of Mechanical Engineers, Part E: Journal of Process Mechanical Engineering
http://pie.sagepub.com/

Plant machinery working life prediction method utilizing reliability and condition-monitoring data
K B Goode, J Moore and B J Roylance
Proceedings of the Institution of Mechanical Engineers, Part E: Journal of Process Mechanical Engineering 2000 214: 109
DOI: 10.1243/0954408001530146

The online version of this article can be found at:


http://pie.sagepub.com/content/214/2/109

Published by:

http://www.sagepublications.com

On behalf of:

Institution of Mechanical Engineers


>> Version of Record - May 1, 2000


Downloaded from pie.sagepub.com at INDIAN INSTITUTE OF TECHNOLOG on December 2, 2014



Plant machinery working life prediction method utilizing reliability and condition-monitoring data
K B Goode¹, J Moore¹ and B J Roylance²*
¹ British Steel Strip Products, Port Talbot, Wales, UK
² Department of Mechanical Engineering, University of Wales Swansea, UK

Abstract: A recently developed condition-based maintenance model is described which utilizes reliability data combined with condition-monitoring measurements to predict the remaining useful life of critical components in a hot strip steel mill. The results obtained from case studies are presented which indicate how the model can be used as part of a condition-based maintenance strategy.

Keywords: condition-based maintenance model, condition monitoring, hot strip mill, steelworks,
failure prediction, life prediction, reliability, statistical process control

The MS was received on 12 May 1999 and was accepted after revision for publication on 30 March 2000.
*Corresponding author: Department of Mechanical Engineering, University of Wales Swansea, Singleton Park, Swansea SA2 8PP, UK.
E00899 IMechE 2000 Proc Instn Mech Engrs Vol 214 Part E

1 INTRODUCTION

In a highly competitive industry, steelworks management has to focus continually on achieving increased product performance, quality and efficiency in order to maintain a fair share of the available market and to improve its customer base. In an integrated steelworks complex, the hot strip mill (HSM) is constantly a crucial area of operation in which unscheduled failure or breakdown of machinery can critically affect production through downtime and the associated risk of a reduction in finished goods quality.

For several years, steel companies in the UK have practised condition-based maintenance in strategically vital areas such as the HSM. The methods of monitoring employed cover a wide spectrum of activity which includes vibration analysis, oil and wear debris analysis, and performance monitoring using numerous techniques to measure parameters such as electric motor current and temperature. The present utilization of these methods enables plant maintenance personnel to detect and also, very often, to diagnose pending failure of equipment. What they are unable to do with a high degree of certainty is to predict the remaining useful life of failing components.

The predictive model was initially developed using artificially generated failures and has been described in reference [1], with early trials using real failure data described in reference [2]. This paper brings together these two pieces of work to provide a more detailed account of the methods used to establish the prediction model and how it functions in a condition-based maintenance environment involving plant machinery.

The model has been developed on the basis that the failure pattern can be divided into two distinct phases, namely stable and unstable phases, which can be distinguished from each other by using statistical process control methods. Depending on the way in which the machinery progresses to failure, one of two methods is employed to predict the remaining machine life. The first relies entirely on a reliability model, while the second method uses a novel combination of reliability and condition-monitoring measurements to narrow down the time to a failure window. After describing the methodology used to generate the predictive model, the results of a case study will be presented to show how the model can be utilized as part of a condition-based maintenance strategy within the HSM.

2 DEVELOPMENT OF A PREDICTION MODEL THEORY

2.1 Some basic aspects

For the purpose of identifying whether a potential failure problem exists in the HSM, normal alarm limits are utilized on which the levels are periodically adjusted according to factors such as operational experience, machine supplier recommendations, previous failure data, or national and international standards. The problem imposed by reliance on these methods is that, if the alarm limits are set too high, the machine may fail without sufficient advance warning. If the limits are set too low, the machine will generate false alarms that can obscure a true warning until it is too late. Experienced machine operators and maintenance personnel learn, from experience, how to distinguish between false and true alarms. However, such people are not always present and, hence, a need exists to try to mimic such experience. If this is attempted through the development of a failure prediction model, a number of difficulties are encountered.

Some of these difficulties may be addressed by employing a statistical process control (SPC) approach. SPC theory assumes that operating measurements obtained from a correctly functioning machine will normally vary around an average value [3]. If the machine malfunctions, however, this natural measurement pattern may change, indicating the source of the problem. By setting suitable alarm limits, SPC can be utilized to distinguish such measurements in terms of stable or unstable regions [4]. However, the setting of such limits is both machine and process dependent and needs to be conducted for each individual situation. It is, therefore, a time-consuming and failure data-intensive activity which may be alleviated to some extent by comparing measurements from a group of otherwise similar machines, thereby providing a larger population of failures from which to extract data and establish realistic alarm limits.

In predicting the remaining useful life of a machine, previously developed models were based on the extensive use of reliability data coupled to a number of simplifying assumptions [5-7]. Although the prediction obtained from such models is seldom precise enough for predicting the remaining life of individual machinery, they have been found to be useful for optimizing maintenance strategies.

A commonly encountered reliability model applied to repairable systems is the renewal process. It assumes that, when a machine fails, it is repaired perfectly, i.e. as good as new, and that times between failure are independent and identically distributed. When these assumptions hold true, the process is said to be stationary. A special case of the renewal process is when the times between failures are independently and exponentially distributed with a constant failure rate. This is known as the homogeneous Poisson process. It is known that the probability of some arbitrary number of failures exhibits a Poisson distribution. From such reliability-based models, hazard rate functions can be constructed and used to optimize scheduled maintenance periods. However, in practice, their construction is generally a function of many variables, including design, operating conditions, environment and quality of repairs [8].

By conducting a Weibull analysis on the failure data, an insight into the dominant failure regime can often be gained [9]. Whether this regime is identified as infant mortality, random failure or wear-out, corrective maintenance actions can be implemented to improve machine reliability further.

Proportional hazard modelling (PHM), developed by Cox [10], combines a baseline hazard rate, stated as a function of time, and a hazard function based on the machine condition variables. The PHM technique was applied, by Jardine [11, 12], to aircraft and marine engine failures in which the metal concentration measurements of the engine oil were used as the condition variables. Jardine was able to show that the PHM approach was superior to the equivalent time-dependent hazard rate modelling and, through estimating failure rates, more effective maintenance decisions could be made.

In practice, a regular inspection strategy is often employed to monitor critical machinery. Christer and Waller [13, 14] developed a technique called delay time analysis for modelling the consequences of such a policy. The delay time is defined as the time period from the point at which a defect is noticeable to the point at which the defect causes a failure. A repair is therefore possible at any time within this period. Christer and Waller were able to obtain a subjective estimate, through the use of questionnaires, of the probability density function f(h) of the delay time. Once f(h) is established, it is possible to develop subsequent models describing the relationships between time and other relevant variables such as expected downtime and operating costs.

Using historical reliability data to predict the future performance of similar machines requires an assumption that the historical and current performance are highly correlated. In reality, this is rarely the case and, instead, individual assessments of a machine's health are more effective in identifying when a problem exists. This can be achieved through the use of strategic machinery health-monitoring techniques and the intelligent use of predetermined alarm limits. Fitch [15] highlighted four methodologies in setting such alarm limits: goal based, ageing, rate of change and statistical limits.

Moubray [16] examined condition-monitoring failure patterns and highlighted the use of the PF interval. A PF interval is the time taken for a machine to reach functional failure F from the point P at which condition monitoring could have identified a potential problem with the machine. Provided that the condition-monitoring interval is no greater than the PF interval, a functional failure should never occur without warning. In practice, however, the PF interval is difficult to quantify and, for critical machinery with a short PF interval, continuous monitoring may be justified. From the above review of some relevant developments, it is evident that two models are required: one to describe when the component deteriorates, and the other to relate the degree of deterioration to the condition of the equipment by the use of monitoring procedures.
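
The link between a Weibull analysis and the failure regimes named in Section 2.1 can be illustrated with a short Python sketch. It assumes a two-parameter Weibull distribution, for which a shape parameter below one gives a decreasing hazard (infant mortality), a value near one a roughly constant hazard (random failure) and a value above one an increasing hazard (wear-out). The function names and the tolerance `tol` are hypothetical choices for illustration only.

```python
import math


def weibull_cdf(t, eta, beta):
    """Two-parameter Weibull cumulative probability F(t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_hazard(t, eta, beta):
    """Instantaneous hazard rate h(t) = (beta / eta) * (t / eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def regime(beta, tol=0.05):
    """Map a fitted shape parameter to the usual failure regime label.

    tol is an assumed band around beta = 1 treated as 'random failure'.
    """
    if beta < 1.0 - tol:
        return "infant mortality (decreasing hazard)"
    if beta > 1.0 + tol:
        return "wear-out (increasing hazard)"
    return "random failure (roughly constant hazard)"


for beta in (0.5, 1.0, 3.0):
    print(beta, regime(beta))
```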

2.2 Prediction model theory

A theoretical solution is presented in the following section to three problems commonly encountered while practising condition monitoring: setting appropriate limits, identifying when next to monitor the machinery, and predicting when to change or repair the machine after a problem has been identified. The theory is based on the assumption that the life of a machine can be divided into two distinct regions, namely a stable zone and a failure zone, which can be distinguished from the observed condition-monitored measurements.

Figure 1 shows typical vibration analysis data as a function of time, taken from a hot strip mill condition-monitoring programme for a hydraulic pump system which subsequently failed. Failure in this context represents a change in the machine's state such that it can no longer be relied upon to perform the function for which it was originally intended.

Fig. 1 Typical failures

A general failure condition is self-evident, which may be represented schematically as shown in Fig. 2. In the stable zone, the machinery is functioning correctly and condition-monitoring measurements, assuming that they reflect the true health of the machine, are varying randomly about an average value. The variance is probably due to process changes between successive measurements and/or condition measurement error. When the condition measurements start to deviate significantly from these values, it soon becomes apparent that a problem has occurred, and it can be said that the machine has entered the failure zone. The function of the alarm limits is to identify when this deviation becomes significant and, hence, when the failure zone is entered. Observations made by the present authors on numerous sets of failure data indicate that, while in the failure zone, it is reasonable to assume that the machine's health degrades as an exponential function. It should be noted, however, that not all machines exhibit this characteristic but, in the first instance, the model will be developed on the basis of this assumption.

Fig. 2 General failure pattern

2.2.1 Setting alarm limits

The first part of the model is concerned with ensuring the earliest identification of a problem through setting appropriate alarm limits. As stated previously, current condition-monitoring programmes rely mainly on alarm limits being pre-set using manufacturers' recommendations, British Standards, e.g. BS 7854 [17], and operators' personal experience. Unfortunately, based on observations arising from the current research programme, it is evident that these alarm limits are often set either too high or too low. This results in either too frequent alarm reports, which ultimately are ignored due to complacency, or an even worse situation whereby a machine fails without any warning, because the alarm limit was not exceeded. Clearly, there is a balance between the number of false alarms and the earliest detection of a problem which is also influenced by the amount of time required to maintain the malfunctioning machine in relation to the rate of machine deterioration.

Based on these observations, it became clear that the problem of setting appropriate alarm limits may be better treated through the use of SPC. Provided that sufficient stable zone data are available, and that the measurements may be assumed to follow a normal distribution, the average and standard deviation of the stable zone condition measurements can be calculated.

When alarm limits are defined at three standard deviations either side of the average stable zone value, SPC theory states that 99.73 per cent of subsequent measurements


should fall within this band. If the condition measurements lie outside these limits, then the measurement has become unstable and it may be assumed that the machine condition has entered the failure zone.

2.2.2 Predicting the time to failure

In order to develop any form of prediction model, a machine life model must be described. It is assumed that a machine begins its life at the installation point I and is in a good functional state. Given that there is sufficient time, during which the machine's condition is relatively constant, the machine eventually experiences a problem and begins to deteriorate until functional failure point F is reached, the point at which the machine can be said no longer to fulfil its specified requirements.

If it is assumed that the condition-monitoring measurements reflect the machine's health, then, when the machine is operating correctly, they are equal to an average measurement, defined as the lower limit (LL). A functional failure is defined by the condition measurements reaching a specific value, the upper limit (UL), above which the machine no longer operates. Once the alarm limit (AL), the point at which the condition-monitoring measurements identify a problem, has been defined, the potential failure point P can also be introduced to the model, as illustrated in Fig. 3. This description of machine deterioration is similar to that used by Moubray [16]. However, the plots have been developed to resemble more closely the overall vibration monitoring deterioration profiles commonly found in the steel industry.

Fig. 3 Machine life model

The potential failure point is used to define the transition from the stable to the failure zone. The stable zone time is defined as IP, the time from machine installation to the potential failure point. This contains largely condition-monitoring measurements which are randomly varying around the lower limit. When the condition measurements exceed the alarm limit, it is assumed that the machine has entered the failure zone and will deteriorate, at an exponential rate, towards a functional failure after time PF. For comparison, the PF notation is similar to that used by Moubray [16] in describing the time from the point of a problem being detected to failure.

If the IP and PF time intervals are shown to be constants, then the total time to failure (TTF) of the machine, at any time t, is given by

$$\mathrm{TTF} = \mathrm{IP} + \mathrm{PF} - t \tag{1}$$

However, it is generally recognized that estimating the IP and PF intervals is difficult since they are often not time dependent and rarely accurately approximated by a constant value. Instead, it is more realistic to represent these intervals using a suitable distribution. The following Weibull distribution offers the most benefits due to its flexibility in describing many types of distribution, and its mathematical simplicity:

$$F(t) = 1 - \exp\left[-\left(\frac{\mathrm{Int}}{\eta}\right)^{\beta}\right] \tag{2}$$

where

F(t) = cumulative probability
t = elapsed time since machine installation
β = shape parameter
γ = location parameter
η = characteristic life parameter
Int = interval time (either IP or PF)

β, γ and η are constants derived from an analysis of the machine's historical failures.

Rearranging equation (2) with respect to the interval time, the following expression is obtained:

$$\mathrm{Int} = \eta\,\{-\ln[1 - F(t)]\}^{1/\beta} \tag{3}$$

By using this expression to define the time intervals IP and PF, TTF now becomes

$$\mathrm{TTF} = \underbrace{\eta_{IP}\,\{-\ln[1 - F(t)]\}^{1/\beta_{IP}}}_{\mathrm{IP}} + \underbrace{\eta_{PF}\,\{-\ln[1 - F(t)]\}^{1/\beta_{PF}}}_{\mathrm{PF}} \tag{4}$$

The solution to equation (4) can be obtained using a Monte Carlo approach, with TTF being predicted as a cumulative probability distribution, similar to Fig. 4. However, this formula does not take into account the survival of the machine to the current time and is, hence, only strictly true for t = 0 and t = IP. Equation (5) represents the condition when t = 0:

$$\mathrm{TTF} = \underbrace{\eta_{IP}\,\{-\ln[1 - F(t)]\}^{1/\beta_{IP}}}_{\mathrm{IP}} + \underbrace{\eta_{PF}\,\{-\ln[1 - F(t)]\}^{1/\beta_{PF}}}_{\mathrm{PF}} \tag{5}$$


and, if t = IP, the following equation is derived:

$$\mathrm{TTF} = \underbrace{\eta_{PF}\,\{-\ln[1 - F(t)]\}^{1/\beta_{PF}}}_{\mathrm{PF}} \tag{6}$$

Fig. 4 Cumulative probability of reaching functional failure

The condition-monitoring measurements may be thought of as a switch, which moves the TTF prediction from the stable to the failure zone calculations. It is argued that this switch approach would give a better indication of the remaining life of the machine than relying solely on a reliability analysis of functional failure times.

However, it is possible to obtain an even better prediction by using more of the available data. In the stable zone the machine age has yet to be incorporated. Similarly, the condition-monitored measurements have so far only been utilized to trigger a switch. The development of a failure zone model could result in a condition-based prediction model which employs the condition measurement data more efficiently. In the next two sections the means whereby these additional data can be incorporated to achieve a better prediction of the failure time are described.

2.2.3 Failure prediction in the stable zone

In the stable zone, condition-monitoring measurements provide little information except to confirm that the machine's health is fine. Therefore, the only data on which to base a prediction of the time to failure are the machine's historical reliability data. Equation (5) gave one solution, which predicted the ultimate functional failure from the installation point I. However, it did not account for the machine surviving to the current time. In this section, another approach will be developed which predicts the likelihood of failure over a period of time, given any time in the stable zone.

While in the stable zone the chance of a functional failure, in a specified time period from (n - 1) dt to n dt, is simply a function of the probability of reaching the potential failure point, P(reaching potential point in time Y), and the probability of reaching the functional failure point in the remaining time, P(reaching functional failure in time Z), as illustrated in Fig. 5. Therefore, to calculate the probability of failure in the increment, all individual failure combination probabilities are summed, as follows:

$$P(\text{functional failure}) = \sum P(\text{potential failure in } Y) \times P(\text{functional failure in } Z) \tag{7}$$

Fig. 5 Probability of functional failure in increment

In theory, there are an infinite number of possible failure combinations and hence, to reduce this number, it


is assumed that it takes a small amount of time, dt, to move from the stable to failure zone. The failure combination probabilities for the first four time increments are shown in Fig. 6.

Fig. 6 Illustration of the multiple failure combinations

It is noticed that, for each time increment, another failure combination is possible and, due to the assumption that it takes one increment to move from a stable to failure zone, there is no chance of a failure in the first increment. Hence, to calculate the probability of functional failure in the interval, the probability of functional failure during each increment is calculated and summed as follows:

$$P(\text{functional failure in interval}) = \sum_{n=2}^{n=\text{interval}/dt} P(\text{functional failure in increment}) \tag{8}$$

To find the probability of reaching a potential failure point, the Weibull distribution hazard rate of the IP interval is used. The hazard rate h(t) is defined as the probability of failure over the next time increment dt, assuming that no failure has occurred up to current time t. This is expressed as

$$h_{IP}(t) = \frac{F_{IP}(t + dt) - F_{IP}(t)}{1 - F_{IP}(t)} \tag{9}$$

To calculate the probability of functional failure in the time remaining from the potential failure point, the cumulative density function for the PF Weibull distribution is used:

$$P(\text{functional failure in } Z) = F(z\,dt) - F[(z - 1)\,dt] \tag{10}$$

Hence, the probability of an overall functional failure between 0 and 1 dt is zero.

The probability of functional failure between 1 dt and 2 dt is

$$\frac{F_{IP}(t + dt) - F_{IP}(t)}{1 - F_{IP}(t)}\,F_{PF}(dt)$$

The probability of functional failure between 2 dt and 3 dt is

$$\frac{F_{IP}(t + dt) - F_{IP}(t)}{1 - F_{IP}(t)}\,\{F_{PF}(2\,dt) - F_{PF}(dt)\} + \frac{F_{IP}(t + 2\,dt) - F_{IP}(t + dt)}{1 - F_{IP}(t)}\,F_{PF}(dt)$$

The probability of functional failure between 3 dt and 4 dt


is

$$\begin{aligned}
&\frac{F_{IP}(t + dt) - F_{IP}(t)}{1 - F_{IP}(t)}\,\{F_{PF}(3\,dt) - F_{PF}(2\,dt)\} \\
&+ \frac{F_{IP}(t + 2\,dt) - F_{IP}(t + dt)}{1 - F_{IP}(t)}\,\{F_{PF}(2\,dt) - F_{PF}(dt)\} \\
&+ \frac{F_{IP}(t + 3\,dt) - F_{IP}(t + 2\,dt)}{1 - F_{IP}(t)}\,F_{PF}(dt)
\end{aligned}$$

The probability of functional failure between (n - 1) dt and n dt is

$$\begin{aligned}
&\frac{F_{IP}(t + dt) - F_{IP}(t)}{1 - F_{IP}(t)}\,\{F_{PF}[(n - 1)\,dt] - F_{PF}[(n - 2)\,dt]\} \\
&+ \frac{F_{IP}(t + 2\,dt) - F_{IP}(t + dt)}{1 - F_{IP}(t)}\,\{F_{PF}[(n - 2)\,dt] - F_{PF}[(n - 3)\,dt]\} \\
&+ \frac{F_{IP}(t + 3\,dt) - F_{IP}(t + 2\,dt)}{1 - F_{IP}(t)}\,\{F_{PF}[(n - 3)\,dt] - F_{PF}[(n - 4)\,dt]\} \\
&+ \cdots
\end{aligned}$$

until there are n - 1 expressions.

Once the probability of functional failure for each individual increment has been calculated, they are cumulatively plotted against time to indicate the probability of failure in any time interval, using equation (8). Hence, by using such plots it is possible to predict the likelihood of a functional failure, given any time interval over which the machine is required to run. For each new condition measurement indicating that the machine is operating correctly, a new plot must be produced. However, if the condition-monitoring measurements indicate a problem, i.e. the potential failure point has been reached, a new approach must be used which makes use of the condition-monitoring information.

2.2.4 Failure prediction in the failure zone

When the condition-monitoring measurements have exceeded the alarm limit, it is assumed that a problem exists with the machine, the failure zone has been entered and a functional failure is approaching. If the approximate potential failure point is known, an analysis of the reliability data may be undertaken which uses the hazard rate to predict the time to functional failure. Given any time t from the potential failure point, the probability of failure in the next interval dt is

$$\frac{F_{PF}(t + dt) - F_{PF}(t)}{1 - F_{PF}(t)}$$

and the probability of failure in the nth interval dt is

$$\frac{F_{PF}(t + n\,dt) - F_{PF}[t + (n - 1)\,dt]}{1 - F_{PF}(t)}$$

Again, these probabilities can be summed and plotted against time to indicate the probability of functional failure in a manner similar to the stable zone prediction. Since the use of condition monitoring has eliminated the use of the IP interval, it is argued that, while in the failure zone, a more accurate prediction of the failure time will be achieved. However, there are two drawbacks to using this approach. First, an accurate estimate of the potential failure point is needed and, second, the condition-monitoring measurements are still only being used to trigger an alarm.

An improved model may possibly be achieved by making further use of the condition-monitoring information to track the progression of the problem until functional failure occurs. Such a condition-based prediction model eliminates the need to acquire an accurate estimate of the potential failure point and incorporates condition-monitoring measurement data into the functional failure prediction. In order to achieve these improvements, a model of the failure zone pattern must first be developed. Figure 7 shows an idealized failure zone pattern.

Fig. 7 Failure zone model

The failure begins at the lower limit LL, the average condition measurement value within the stable zone. The condition-monitored measurement X(t) increases until it is detected passing through the alarm limit AL, where the potential failure point is identified. Eventually the condition becomes sufficiently serious that the upper limit UL is reached and a functional failure occurs. It is assumed that the failure pattern is approximated by an exponential curve and hence, a model of the failure zone is defined as

$$X(t) = \mathrm{LL} + (\mathrm{AL} - \mathrm{LL})\,\exp\left\{\frac{\ln[(\mathrm{UL} - \mathrm{LL})/(\mathrm{AL} - \mathrm{LL})]}{\mathrm{PF}}\,t\right\} \tag{11}$$
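
The exponential failure zone curve of equation (11) can be checked numerically with a short Python sketch. It evaluates X(t) with t measured from the potential failure point and also applies the algebraic inverse of the same curve, so that a measured level can be mapped back to an elapsed time. The function names, the limit values and the 30 day PF interval are hypothetical and are chosen only to show that the curve passes through AL at t = 0 and UL at t = PF.

```python
import math


def failure_zone_level(t, ll, al, ul, pf):
    """Condition level X(t) at time t after the potential failure point.

    Follows the exponential form of equation (11): X(0) = AL, X(PF) = UL.
    """
    rate = math.log((ul - ll) / (al - ll)) / pf
    return ll + (al - ll) * math.exp(rate * t)


def elapsed_since_potential_failure(x, ll, al, ul, pf):
    """Invert the curve: time since AL was crossed, given a level x."""
    return pf * math.log((x - ll) / (al - ll)) / math.log((ul - ll) / (al - ll))


# Hypothetical limits (mm/s) and a hypothetical PF interval of 30 days.
LL, AL, UL, PF = 2.0, 2.4, 8.0, 30.0

for day in (0, 10, 20, 30):
    print(day, round(failure_zone_level(day, LL, AL, UL, PF), 3))
```

Because PF is treated here as a single number, the sketch is deterministic; in the model PF is itself described by a Weibull distribution, so each sampled PF value would give a different curve.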
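
The incremental sums of Section 2.2.3, equations (8) to (10), amount to a discrete convolution of the conditional probability of reaching the potential failure point with the probability of then failing a given number of increments later. The Python sketch below is one way to compute them under stated assumptions: a two-parameter Weibull distribution for both intervals, and hypothetical parameter values and function names that do not come from the case studies.

```python
import math


def weibull_cdf(t, eta, beta):
    """Two-parameter Weibull cumulative probability; zero for t <= 0."""
    return 1.0 - math.exp(-((t / eta) ** beta)) if t > 0 else 0.0


def increment_failure_probs(t, dt, n_max, ip, pf):
    """Probability of functional failure in each increment (n - 1)dt to n dt.

    t  : current age of a machine that is still in the stable zone
    ip : (eta, beta) of the IP interval
    pf : (eta, beta) of the PF interval
    Returns a list p where p[n - 1] is the probability for increment n.
    """
    survive = 1.0 - weibull_cdf(t, *ip)
    probs = []
    for n in range(1, n_max + 1):
        total = 0.0
        # k is the increment in which the potential failure point is reached
        for k in range(1, n):
            reach_p = (weibull_cdf(t + k * dt, *ip)
                       - weibull_cdf(t + (k - 1) * dt, *ip)) / survive
            fail_f = (weibull_cdf((n - k) * dt, *pf)
                      - weibull_cdf((n - k - 1) * dt, *pf))
            total += reach_p * fail_f
        probs.append(total)
    return probs


# Hypothetical parameters: times in days, machine aged 100 days.
probs = increment_failure_probs(t=100.0, dt=1.0, n_max=200,
                                ip=(300.0, 2.0), pf=(30.0, 3.0))
cumulative = sum(probs)  # probability of functional failure within 200 days
print(round(cumulative, 4))
```

The first entry of `probs` is zero, matching the statement that no functional failure is possible in the first increment, and a running sum of the list gives the cumulative curve that equation (8) describes.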

Values of LL and AL may be obtained from statistical modelling of the stable zone. The estimate of UL, the point of functional failure, can be set using a number of parameters, including British Standards, manufacturers' recommendations or the operator's past experience. The value of the PF interval must also be estimated. This is again best achieved using the Weibull distribution function. If equation (11) is rearranged, an expression for t with respect to the measured condition X(t) is obtained as

$$t = \mathrm{PF}\,\frac{\ln\{[X(t) - \mathrm{LL}]/(\mathrm{AL} - \mathrm{LL})\}}{\ln[(\mathrm{UL} - \mathrm{LL})/(\mathrm{AL} - \mathrm{LL})]} \tag{12}$$

This t is an estimate of the elapsed time since the potential failure point. Hence, to predict the remaining time to functional failure, estimates of t and PF are substituted into

$$\mathrm{TTF} = \mathrm{PF} - t \tag{13}$$

to give

$$\mathrm{TTF} = \underbrace{\eta_{PF}\,\{-\ln[1 - F(t)]\}^{1/\beta_{PF}}}_{\mathrm{PF}} \times \left(1 - \frac{\ln\{[X(t) - \mathrm{LL}]/(\mathrm{AL} - \mathrm{LL})\}}{\ln[(\mathrm{UL} - \mathrm{LL})/(\mathrm{AL} - \mathrm{LL})]}\right) \tag{14}$$

The form of the predicted time to functional failure will be a distribution, in this case a probability density function, which can be easily converted into a cumulative density function. It should be noted that, by using this method to predict the functional failure, an accurate identification of the potential failure point is not necessary.

2.2.5 Prediction model output and interpretation

So far, the model formulates a time to failure prediction in the form of a cumulative density function. Although this function provides a complete description of the probability of failure against time, producing it is a time-consuming activity and successive function curves look cumbersome and could easily be misinterpreted by an inexperienced person. Hence, a method is needed to convert the model output into a simpler form which still conveys the necessary information. The use of a high-low chart is seen as a tool to achieve this, as shown in Fig. 8.

Fig. 8 High-low plot generation

Using the high-low plots, a quick and simple interpretation of the prediction can be gained; within the broad dark time band there exists an 80 per cent chance of failure and the average TTF is located by the peak. The 10, 50 and 90 per cent limits are subjective and could be altered depending on the accuracy required in identifying the potential risk of failure.

2.2.6 Monitoring interval times

It was stated in the previous section that current condition-monitoring interval times are based largely on fixed times, developed through the use of British Standards, equipment manufacturers' recommendations and operator experience. The major benefit of using such a fixed time interval is the ease of maintenance implementation. However, this compromises the efficiency of the data collection, especially when a clear time-dependent failure pattern exists.

In the previous sections, various models were developed which predict the time to functional failure through the use of cumulative distribution functions. Provided that an acceptable risk of failure between successive condition measurements can be established, the same plots may also be employed to predict the condition-monitoring interval times, as shown in Fig. 9. Clearly, the acceptable risk of functional failure will vary for individual machines, critical machines having a smaller acceptable risk than non-critical machines.

Fig. 9 Graphic calculation of the condition-monitoring (Con. Mon.) interval

To calculate the condition-monitoring interval time while in the stable zone, the risk must be compared with the previously calculated cumulative incremental probability of functional failure. However, in the failure zone, equation (14) is used with F(t) being equal to the acceptable risk:

$$\text{Interval} = \underbrace{\eta_{PF}\,[-\ln(1 - \text{Risk})]^{1/\beta_{PF}}}_{\mathrm{PF}} \times \left(1 - \frac{\ln\{[X(t) - \mathrm{LL}]/(\mathrm{AL} - \mathrm{LL})\}}{\ln[(\mathrm{UL} - \mathrm{LL})/(\mathrm{AL} - \mathrm{LL})]}\right) \tag{15}$$

Alternatively, given a pre-set condition-monitoring interval, equation (15) can be rearranged to indicate the probability of functional failure before the next monitoring time is reached. This could provide useful information in making a decision either to maintain a machine immediately, or to allow it to run until the next planned maintenance period.

3 PREDICTION MODEL IMPLEMENTATION

The implementation of the maintenance model is represented schematically in Fig. 10. Condition-monitoring measurements are examined to verify whether or not they are within the alarm limits. If they are within the limits, the machine is considered to be stable and the TTF is predicted using the equations derived in Section 2.2.3. If, however, the measurements exceed the alarm limits, the machine has entered the failure zone and a functional failure is imminent. In this scenario the TTF is predicted using the equations derived in Section 2.2.4. If these predictions indicate it to be necessary, the machine will be maintained. However, if no maintenance is required, then the next monitoring time will be calculated, at which point the next condition-monitoring measurement will be obtained and a further assessment of the machine's health conducted.

Fig. 10 Schematic diagram of prediction model implementation

4 PREDICTION MODEL TRIALS

To test and validate the maintenance model the following studies were undertaken:

(a) a simulated assessment of the model where the data inputs are always known;
(b) specific case studies of real machines used within British Steel, to examine the performance of the model with historical failures.

4.1 Simulated data trial

To illustrate the way in which the model is designed to function, a computer program was written to simulate typical machine failure patterns of the type observed to occur frequently in the hot strip mill. The simulated machine failure pattern, shown in Fig. 11, is one such instance.

The prediction model results for each measurement point are given in Fig. 12. In the stable zone a wide prediction distribution is observed, reflecting the model's dependence on reliability data. When the condition measurements exceed the alarm limits and the failure zone is entered, the prediction model is able to identify a potential problem. This triggers the use of the combined condition-monitoring and reliability failure zone model, resulting in the subsequent failure forecasts becoming



more focused and eventually identifying the failure time
as being at 89 days with a very high degree of certainty.

From examining the predicted probability of functional
failure during the monitoring intervals, shown in
Fig. 13, it is clear that, had the machine continued
operation beyond the last measurement, a significant
chance of failure during the monitoring interval existed
(about 82 per cent). Such information would be very
useful within a maintenance strategy for deciding when
to replace a machine. This is especially relevant when
restricted windows of maintenance exist.

Fig. 11 Simulated condition-monitoring measurements

Fig. 12 Artificial data prediction monitoring results

4.2 Case studies

A number of hydraulic pumps located on the hot strip
mill, and subjected to regular condition monitoring using
vibration analysis, were selected for a case study. The
present condition-monitoring methods and strategies
used on the mill are generally very effective for
identifying pumps which require attention before catastrophic
failure occurs. However, no method currently
exists for predicting the remaining useful life of the
pumps while they are still in operation. It is possible,
therefore, that the pumps are not achieving their optimum
operational life.

The machine group selected for the initial assessment
comprises three double-vane pumps, each delivering
320 l/min of hydraulic fluid at 160 bar pressure. The pumps
are each driven by a 120 kW electric motor at
1485 r/min. The system supplies the hydraulic requirements
to critical machinery, including the reversing
rougher and vertical and horizontal scale breakers. An
overall condition measurement greater than 20 mm/s is
used by the condition-monitoring department as the
criterion for determining that a pump has functionally
failed.

SPC analysis of the stable zone condition measurements
resulted in an average measurement value of
6.5 mm/s and an alarm level of 9.5 mm/s. Subsequent
Weibull analysis, on the individual stable and failure
zone times, resulted in the zone parameters given in
Table 1.

Two pump replacement cases are discussed here to
highlight the capability, and also some of the difficulties,
associated with the prediction model when dealing with
real condition-monitoring information. During operation
of pump 2, an adjustment was made after a high
condition-monitoring measurement which had the effect
of reducing the subsequent measurements. In pump 3, an
abnormally high condition measurement was recorded
which fell above the functional failure criterion limit.
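With the limits quoted for these pumps (an alarm level of 9.5 mm/s and a functional failure criterion of 20 mm/s), the zone logic of Section 3 can be sketched as follows. This is an illustrative outline only: the function names and the example readings are hypothetical, and the sketch merely selects which model would apply rather than performing the TTF prediction itself.

```python
ALARM_LEVEL = 9.5          # mm/s, alarm level from the SPC analysis
FAILURE_CRITERION = 20.0   # mm/s, functional failure criterion


def classify_measurement(value_mm_s):
    """Assign one overall vibration reading to a zone of the model."""
    if value_mm_s > FAILURE_CRITERION:
        return "functional failure"
    if value_mm_s > ALARM_LEVEL:
        return "failure zone"
    return "stable zone"


def select_model(zone):
    """Name the prediction route implied by Section 3 for a given zone."""
    routes = {
        "stable zone": "reliability-based model (Section 2.2.3)",
        "failure zone": "combined reliability and condition model (Section 2.2.4)",
        "functional failure": "no prediction: machine treated as failed",
    }
    return routes[zone]


# Hypothetical readings (mm/s), including one abnormal spike above 20 mm/s.
readings = [6.2, 7.1, 10.4, 21.3, 8.0]
for reading in readings:
    zone = classify_measurement(reading)
    print(f"{reading:5.1f} mm/s -> {zone}: {select_model(zone)}")
```

A threshold test of this kind cannot distinguish a genuine functional failure from a spurious reading, so an isolated spike is labelled as a failure unless it is screened separately.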

Table 1 Stable and failure zone parameters

Parameter    Stable zone    Failure zone
β            2.91           1.03
η            526 days       222 days
γ            0 days         0 days
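The failure zone parameters of Table 1 can be used to illustrate the monitoring interval calculation of equation (15) and its rearranged form. The sketch below assumes that equation (15) is the product of a Weibull time at the acceptable risk and a factor that falls from 1 at the alarm limit (AL) to 0 at the upper limit (UL). Several inputs are assumptions made purely for illustration: UL is taken as the 20 mm/s functional failure criterion, the lower limit LL is set to 0 mm/s, and the 14 mm/s reading is hypothetical, as are the function names.

```python
import math


def remaining_fraction(x, ll, al, ul):
    """Fraction of the failure zone still ahead of the machine:
    1 at the alarm limit AL, falling to 0 at the upper limit UL."""
    if not ll < al < ul:
        raise ValueError("limits must satisfy LL < AL < UL")
    if x < al:
        raise ValueError("measurement is below the alarm limit (stable zone)")
    if x >= ul:
        return 0.0
    return 1.0 - math.log((x - ll) / (al - ll)) / math.log((ul - ll) / (al - ll))


def monitoring_interval(x, risk, eta_pf, beta_pf, ll, al, ul):
    """Interval (days) at which the chance of functional failure reaches `risk`."""
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must lie strictly between 0 and 1")
    weibull_time = eta_pf * (-math.log(1.0 - risk)) ** (1.0 / beta_pf)
    return weibull_time * remaining_fraction(x, ll, al, ul)


def failure_probability(interval, x, eta_pf, beta_pf, ll, al, ul):
    """Rearranged form: chance of functional failure within a pre-set interval."""
    fraction = remaining_fraction(x, ll, al, ul)
    if fraction == 0.0:
        return 1.0
    return 1.0 - math.exp(-((interval / (eta_pf * fraction)) ** beta_pf))


# Failure zone values from Table 1; AL from the SPC analysis.
# UL = 20 mm/s and LL = 0 mm/s are assumptions, not values given in the text.
params = dict(eta_pf=222.0, beta_pf=1.03, ll=0.0, al=9.5, ul=20.0)

interval = monitoring_interval(x=14.0, risk=0.10, **params)
risk_14_days = failure_probability(interval=14.0, x=14.0, **params)
print(f"interval for 10 per cent risk: {interval:.1f} days")
print(f"risk over a 14 day interval:   {100 * risk_14_days:.1f} per cent")
```

The two functions are inverses of one another, so either the acceptable risk or the monitoring interval can be fixed and the other derived.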
Fig. 13 Probability of failure between monitoring intervals

Subsequent measurements were far lower but no
maintenance action was reported.

4.2.1 Hydraulic system 3, pump 2

In studying the condition-monitoring measurements of
pump 2, shown in Fig. 14, it is observed that a
maintenance action, conducted when the machine was
480 days old, improved the health of the pump and
thereby prolonged its life. Clearly, this type of operational
maintenance has not been incorporated into the
prediction model. To do this would be very complex and
require substantial quantities of maintenance data.
However, even with these restrictions, the model is still
able to adapt and perform reasonably well in predicting
the estimated pump demise at 532 days. Examining the
results presented in Fig. 15, the familiar broad stable zone
predictions are clearly distinguishable from the more
focused failure zone predictions. The very narrow
prediction at 430 days is due to the high recorded
measurement.

Fig. 14 Pump 2 condition-monitoring measurements

Fig. 15 Pump 2 prediction model results

It could be argued that pump 2 was replaced too early
and could have continued operation for a little while
longer, possibly with additional condition measurements
to track the impending failure. Figure 16 shows that the
last condition measurement gave a predicted 12 per cent
chance of functional failure during the next monitoring
interval of 14 days. An 88 per cent chance of survival
to the next monitoring time is arguably a worthwhile risk.
However, since these pumps are critical to the HSM



operation, a 12 per cent chance of operational disruption
is undesirable. It may also be the case that maintenance
scheduling played an important role in deciding when
pump 2 was changed. Clearly, the prediction model
enables a more scientific approach to be used in
assessing the risk of machine failure. However, the final
maintenance action decision will also be strongly
influenced by other external factors such as scheduling,
criticality, cost, spares, environmental and safety impact.
Therefore, the need for a maintenance planner will still
continue.

Fig. 16 Pump 2 chance of functional failure between monitoring intervals

4.2.2 Hydraulic system 3, pump 3

In examining the condition-monitoring measurements of
pump 3, in Fig. 17, it is noticed that a number of
abnormalities exist. A significant time period with no
recorded measurements is observed at machine age
1036 to 1216 days, a consequence of the fact that the
pump is on standby. A high measurement, in excess of
20 mm/s, is present, which would normally indicate that
the machine had functionally failed. However, as the
later measurements indicate, this high measurement was
not normal and probably occurred as a result of process
and/or measurement error.

Fig. 17 Pump 3 condition-monitoring measurements

The abnormally high reading cannot be processed by
the prediction model, except to indicate that the pump has
functionally failed and, hence, does not appear in the
results given in Fig. 18. The predicted time range of
functional failure only narrows when the condition



measurements are relatively high. Even when the alarm
limit is exceeded, at 992 and 1036 days, the prediction is
still quite broad, thereby indicating no requirement for
immediate maintenance action.

Fig. 18 Pump 3 prediction model results

These results show how the predictions focused on
the estimated failure point of 1259 days, prior to this
point being reached. Once again, the broad predictions in
the stable zone are clearly distinguished from the more
narrowly defined band of failure zone predictions.

A study of the predicted chance of failure between the
monitoring intervals, shown in Fig. 19, indicates an
increasing probability of functional failure. Following the
last condition measurement, the model predicts a 31 per
cent chance of pump failure before the next condition
measurement is taken and, hence, the decision to
maintain the pump at this point is appropriate.

Fig. 19 Pump 3 chance of functional failure between monitoring intervals

5 CONCLUSIONS

Currently, in industry, condition monitoring can identify
when machine problems are occurring and, given enough
experience, pinpoint the exact cause. However, it is more
difficult to predict the remaining life of the machine once
the problem has been identified and, therefore, when to
change or maintain the machine.

Current literature on remaining life prediction has been
focused predominantly on reliability-based or mathematically
complex models. There is clearly a need for a
simple systematic prediction model readily applicable to
the industrial situation.

This paper describes the development of a model
designed to achieve such a method. Condition-monitored
measurements have been divided into two regions: a



stable zone and failure zone. While in the stable zone,
condition measurements are normal and, hence, a
reliability-based model is employed. When condition
measurements indicate the existence of a problem, both
reliability and condition-monitoring information are
combined to predict the remaining machine life.

Both simulated and real case studies were investigated
to test the model's performance and highlight some of its
implementation difficulties. Arising from these studies it
is evident that the prediction model is dependent on the
quality and accuracy of the condition-monitored
measurements.

It is anticipated that the model will enable a more
systematic approach to assessing the risk of machine
failure and be applicable to most condition-monitored
situations, in which the failure lead time is sufficient and
the condition-monitoring measurements reflect the
machine's true health. However, the final maintenance
action decision will inevitably depend on other external
factors such as scheduling, criticality, cost, spares,
environmental and safety impact.

ACKNOWLEDGEMENTS

The authors would like to thank Dr B. J. Hewitt, Director,
Technical, and Mr E. F. Walker, Manager, Technical
Co-ordinator, Welsh Technology Centre, British Steel Strip
Products, for permission to publish this paper and
acknowledge the support of the Engineering and Physical
Sciences Research Council. Thanks are also due to Port
Talbot hot strip mill PCM department and the Llanwern
FMMS department for their help and contribution to this
project.

REFERENCES

1 Goode, K. B., Roylance, B. J. and Moore, J. Development of predictive model for monitoring of hot strip mill. Iron Steelmaking, 1998, 25(1), 42–47.
2 Goode, K. B., Roylance, B. J. and Moore, J. The development of a predictive model for condition-based maintenance in a steel works hot strip mill. In Proceedings of the JOAP International Condition Monitoring Conference, Mobile, Alabama, 1998, pp. 203–218.
3 Wheeler, D. J. and Chambers, D. S. Understanding Statistical Process Control, 1990, pp. 577 (SPC Press).
4 Weatherill, G. B. and Brown, D. W. Statistical Process and Control: Theory and Practice, 1990 (Chapman and Hall, London).
5 Asher, H. and Feingold, H. Repairable Systems Reliability, 1984 (Marcel Dekker, New York).
6 Van Alven, W. H. Reliability Engineering, 1964 (Prentice-Hall, Englewood Cliffs, New Jersey).
7 Davidson, J. F. Reliability of Mechanical Systems, IMechE Guides for the Process Industry, 1988 (Mechanical Engineering Publications, London).
8 Sherwin, D. J. Improved schedules by using data collection under preventative maintenance. IEEE Trans. Reliability, 1984, R-33(4), 315–320.
9 Bloch, H. P. and Geitner, F. K. An Introduction to Machinery Reliability Assessment, 1990, pp. 33–34 (Van Nostrand Reinhold, New York).
10 Cox, D. R. and Lewis, P. A. W. The Statistical Analysis of Series of Events, 1966 (John Wiley, New York).
11 Jardine, A. K. S. and Anderson, P. M. Use of concomitant variables for reliability estimation. Maintenance Managmt Int., 1985, 5, 135–140.
12 Jardine, A. K. S., Anderson, P. M. and Mann, D. S. Application of the Weibull proportional hazard model to aircraft and marine engine failure data. Qual. Reliability Engng Int., 1987, 3, 77–82.
13 Christer, A. H. and Waller, W. M. Delay time models of industrial inspection maintenance problems. J. Opl. Res. Soc., 1984, 35(5), 401–406.
14 Christer, A. H. and Waller, W. M. Reducing production downtime using delay time analysis. J. Opl. Res. Soc., 1984, 35(6), 499–512.
15 Fitch, J. C. Proactive and predictive strategies for setting oil analysis alarms and limits. In Proceedings of the JOAP International Condition Monitoring Conference, Mobile, Alabama, 1998, pp. 370–378.
16 Moubray, J. RCM II, 1991 (Butterworth-Heinemann, Oxford).
17 BS 7854 Mechanical Vibration: Evaluation of Measurements on Non-rotating Parts (British Standards Institution, London).
